RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode

2016-12-22 Thread maowenan


> -Original Message-
> From: Jeff Kirsher [mailto:jeffrey.t.kirs...@intel.com]
> Sent: Friday, December 23, 2016 9:07 AM
> To: maowenan; Alexander Duyck
> Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering 
> mode
> 
> On Fri, 2016-12-23 at 00:40 +, maowenan wrote:
> > > -Original Message-
> > > From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> > > Sent: Thursday, December 22, 2016 11:54 PM
> > > To: maowenan
> > > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > > com;
> > > weiyongjun (A); Dingtianhong
> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > ordering mode
> > >
> > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan 
> > > wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > > To: maowenan
> > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com
> > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > > > > relax ordering mode
> > > > >
> > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan
> > > > >  wrote:
> > > > >
> > > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > > relax ordering mode, which can be set by ethtool.
> > > > > > Relax ordering is one mode of the 82599 NIC; enabling this mode
> > > > > > can enhance performance on some CPU architectures.
> > > > >
> > > > > Then it should be done by CPU architecture specific quirks
> > > > > (preferably in PCI
> > > > > layer) so that all users get the option without having to do
> > > > > manual
> > >
> > > intervention.
> > > > >
> > > > > > example:
> > > > > > ethtool -s enp1s0f0 relaxorder off
> > > > > > ethtool -s enp1s0f0 relaxorder on
> > > > >
> > > > > Doing it via ethtool is a developer API (for testing) not
> > > > > something that makes sense in production.
> > > >
> > > >
> > > > This feature is not mandatory for all users; actually the default
> > > > relax-ordering configuration of the 82599 is 'disabled', so this
> > > > patch gives one way to
> > >
> > > enable relax ordering to be selected in some performance condition.
> > >
> > > That isn't quite true.  The default for Sparc systems is to have it
> > > enabled.
> > >
> > > Really this is something that is platform specific.  I agree with
> > > Stephen that it would work better if this was handled as a series of
> > > platform specific quirks handled at something like the PCI layer
> > > rather than be a switch the user can toggle on and off.
> > >
> > > With that being said there are changes being made that should help
> > > to improve the situation.  Specifically I am looking at adding
> > > support for the DMA_ATTR_WEAK_ORDERING which may also allow us to
> > > identify cases where you might be able to specify the DMA behavior
> > > via the DMA mapping instead of having to make the final decision in
> > > the device itself.
> > >
> > > - Alex
> >
> > Yes, Sparc is a special case. From the NIC driver point of view, there
> > is no need for some ARCHs to do particular operations and compile-time
> > branches; ethtool is a flexible method for the user to decide whether
> > to turn this feature on or off.
> > I think Jeff as maintainer of 82599 has some comments about this.
> 
> My original comment/objection was that you attempted to do this change as a
> module parameter to the ixgbe driver, where I directed you to use ethtool so
> that other drivers could benefit from the ability to enable/disable relaxed
> ordering.  As for whether it gets implemented in ethtool or the PCI layer,
> it makes little difference to me; I only had issues with the driver-specific
> module parameter implementation, which is not acceptable.


Thank you Jeff and Alex.
I have since gone through the mail thread about "i40e: enable PCIe relax 
ordering for SPARC". That change only works for SPARC; any other ARCH that 
wants to enable the DMA_ATTR_WEAK_ORDERING feature has to define a new macro 
and recompile the driver module.

For these reasons, we implemented this in ethtool to give the end user a 
convenient way to turn the feature on or off: there is no new macro to 
define, it is easy to extend with new features, and it also benefits other 
drivers, as Jeff noted.



Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Phil Reid

G'day Joao,
On 23/12/2016 01:06, Joao Pinto wrote:

At 4:57 PM on 12/22/2016, Phil Reid wrote:

On 22/12/2016 23:47, Joao Pinto wrote:


Hello Phil,

At 3:42 PM on 12/22/2016, Phil Reid wrote:

G'day Joao,

On 22/12/2016 20:38, Joao Pinto wrote:

When testing stmmac with my QoS reference design I found a problem in the
CSR clock configuration that was preventing PHY discovery, since
every read operation returned 0x. This patch fixes the issue.

Signed-off-by: Joao Pinto 
---
changes v1->v2 (David Miller)
- DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
to make sense

 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b21d03f..94223c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
*ioaddr, int mcbins,
 mac->mii.reg_shift = 6;
 mac->mii.reg_mask = 0x07C0;
 mac->mii.clk_csr_shift = 2;
-mac->mii.clk_csr_mask = 0xF;
+mac->mii.clk_csr_mask = GENMASK(4, 2);


Should this not be GENMASK(5,2)


According to Universal MAC databook (valid for MAC100 and MAC1000), we have:

Bits: 4:2
000 60-100 MHz clk_csr_i/42
001 100-150 MHz clk_csr_i/62
010 20-35 MHz clk_csr_i/16
011 35-60 MHz clk_csr_i/26
100 150-250 MHz clk_csr_i/102
101 250-300 MHz clk_csr_i/124
110, 111 Reserved

So only bits 2, 3 and 4 should be masked.

Thanks.


But this is a change in behaviour from what was there, isn't it?
The previous mask was 4 bits; now it's 3.

And, for example, the Altera SOCFPGA implementation in the Cyclone V has
valid values 0x8-0xf, using bits 5:2.


According to the databook, bit 5 is reserved and RO. When reserved, it is
possible to customize. Is that the case? If there is hardware using the 5th
bit, we can make it GENMASK(5, 2) with no problem.


I've also checked the Arria 10 documentation and bit 5 is also RW.
The following options are documented and supported
1000: CSR clock/4
1001: CSR clock/6
1010: CSR clock/8
1011: CSR clock/10
1100: CSR clock/12
1101: CSR clock/14
1110: CSR clock/16
1111: CSR clock/18
They do mention that these values will probably be outside the IEEE 802.3 
specified range.

But there's at least a couple of cores out there where GENMASK(5,2) is valid.
Can't say if anyone is using that setting, though.


--
Regards
Phil Reid



Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode

2016-12-22 Thread Jeff Kirsher
On Fri, 2016-12-23 at 00:40 +, maowenan wrote:
> > -Original Message-
> > From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> > Sent: Thursday, December 22, 2016 11:54 PM
> > To: maowenan
> > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > com;
> > weiyongjun (A); Dingtianhong
> > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > ordering mode
> > 
> > On Wed, Dec 21, 2016 at 5:39 PM, maowenan 
> > wrote:
> > > 
> > > 
> > > > -Original Message-
> > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > To: maowenan
> > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com
> > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > > ordering mode
> > > > 
> > > > On Thu, 8 Dec 2016 14:51:38 +0800
> > > > Mao Wenan  wrote:
> > > > 
> > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > relax
> > > > > ordering mode, which can be set by ethtool.
> > > > > Relax ordering is one mode of the 82599 NIC; enabling this mode can
> > > > > enhance performance on some CPU architectures.
> > > > 
> > > > Then it should be done by CPU architecture specific quirks
> > > > (preferably in PCI
> > > > layer) so that all users get the option without having to do manual
> > 
> > intervention.
> > > > 
> > > > > example:
> > > > > ethtool -s enp1s0f0 relaxorder off
> > > > > ethtool -s enp1s0f0 relaxorder on
> > > > 
> > > > Doing it via ethtool is a developer API (for testing) not something
> > > > that makes sense in production.
> > > 
> > > 
> > > This feature is not mandatory for all users; actually the default
> > > relax-ordering configuration of the 82599 is 'disabled', so this patch
> > > gives one way to
> > 
> > enable relax ordering to be selected in some performance condition.
> > 
> > That isn't quite true.  The default for Sparc systems is to have it
> > enabled.
> > 
> > Really this is something that is platform specific.  I agree with
> > Stephen that it
> > would work better if this was handled as a series of platform specific
> > quirks
> > handled at something like the PCI layer rather than be a switch the
> > user can
> > toggle on and off.
> > 
> > With that being said there are changes being made that should help to
> > improve
> > the situation.  Specifically I am looking at adding support for the
> > DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> > you might be able to specify the DMA behavior via the DMA mapping
> > instead of
> > having to make the final decision in the device itself.
> > 
> > - Alex
> 
> Yes, Sparc is a special case. From the NIC driver point of view, there is
> no need for some ARCHs to do particular operations and compile-time
> branches; ethtool is a flexible method for the user to decide whether to
> turn this feature on or off.
> I think Jeff as maintainer of 82599 has some comments about this.

My original comment/objection was that you attempted to do this change as a
module parameter to the ixgbe driver, where I directed you to use ethtool 
so that other drivers could benefit from the ability to enable/disable
relaxed ordering.  As for whether it gets implemented in ethtool or the PCI
layer, it makes little difference to me; I only had issues with the
driver-specific module parameter implementation, which is not acceptable.



[PATCH] brcmfmac: fix spelling mistakes on "Ivalid"

2016-12-22 Thread Colin King
From: Colin Ian King 

Trivial fixes to spelling mistake "Ivalid" to "Invalid" in
brcmf_err error messages.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 7ffc4ab..15eaf72 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -3971,7 +3971,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
pval |= AES_ENABLED;
break;
default:
-   brcmf_err("Ivalid unicast security info\n");
+   brcmf_err("Invalid unicast security info\n");
}
offset++;
}
@@ -4015,7 +4015,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
wpa_auth |= WPA2_AUTH_1X_SHA256;
break;
default:
-   brcmf_err("Ivalid key mgmt info\n");
+   brcmf_err("Invalid key mgmt info\n");
}
offset++;
}
-- 
2.10.2



RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode

2016-12-22 Thread maowenan


> -Original Message-
> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> Sent: Thursday, December 22, 2016 11:54 PM
> To: maowenan
> Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com;
> weiyongjun (A); Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering 
> mode
> 
> On Wed, Dec 21, 2016 at 5:39 PM, maowenan 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> >> Sent: Thursday, December 22, 2016 9:28 AM
> >> To: maowenan
> >> Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com
> >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> >> ordering mode
> >>
> >> On Thu, 8 Dec 2016 14:51:38 +0800
> >> Mao Wenan  wrote:
> >>
> >> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
> >> > ordering mode, which can be set by ethtool.
> >> > Relax ordering is one mode of the 82599 NIC; enabling this mode can
> >> > enhance performance on some CPU architectures.
> >>
> >> Then it should be done by CPU architecture specific quirks
> >> (preferably in PCI
> >> layer) so that all users get the option without having to do manual
> intervention.
> >>
> >> > example:
> >> > ethtool -s enp1s0f0 relaxorder off
> >> > ethtool -s enp1s0f0 relaxorder on
> >>
> >> Doing it via ethtool is a developer API (for testing) not something
> >> that makes sense in production.
> >
> >
> > This feature is not mandatory for all users; actually the default
> > relax-ordering configuration of the 82599 is 'disabled', so this patch gives one way to
> enable relax ordering to be selected in some performance condition.
> 
> That isn't quite true.  The default for Sparc systems is to have it enabled.
> 
> Really this is something that is platform specific.  I agree with Stephen 
> that it
> would work better if this was handled as a series of platform specific quirks
> handled at something like the PCI layer rather than be a switch the user can
> toggle on and off.
> 
> With that being said there are changes being made that should help to improve
> the situation.  Specifically I am looking at adding support for the
> DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> you might be able to specify the DMA behavior via the DMA mapping instead of
> having to make the final decision in the device itself.
> 
> - Alex

Yes, Sparc is a special case. From the NIC driver point of view, there is
no need for some ARCHs to do particular operations and compile-time
branches; ethtool is a flexible method for the user to decide whether to
turn this feature on or off.
I think Jeff as maintainer of 82599 has some comments about this.
-Wenan



Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread Daniel Borkmann

On 12/22/2016 08:05 PM, Cong Wang wrote:

On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann  wrote:


Ok, you mean for net. In that case I prefer the smaller sized fix to be
honest. It also covers everything from the point where we fetch the chain
via cops->tcf_chain() to the end of the function, which is where most of
the complexity resides, and only the two mentioned commits do the relock,


I really wish the problem were only about relocking, but look at the code,
the deeper reason why we have this bug is the complexity of the logic
inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
idempotent; 2) the request logic itself is hard, because of tc filter design
and implementation.

This is why I worry about more than just the relocking.


But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
me your argument is more about fear of complexity on tc framework itself.
I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
would be good to reduce its complexity into smaller pieces. But it's not
really related to the fix itself, reducing complexity requires significantly
more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
to try to simplify it, sure, but I don't get why we have to discuss so much
on this matter in this context, really.


so as a fix I think it's fine as-is. As mentioned, if there's need to
refactor tc_ctl_tfilter() net-next would be better, imho.


Refactor is a too strong word, just moving the replay out is not a refactor.
The end result will be still smaller than your commit d936377414fadbafb4,
which is already backported to stable.


Commit d936377414fa is a whole different thing, and not related to the
topic at all. The two-line changes with the initialization fix is quite
straight forward and fixes a bug in the logic. It's also less complex in
terms of lines but also in terms of complexity than to refactor or move the
replay out. Moving it out contextually means that you also wouldn't rule
out that things like nlmsg_parse(), or in fact *any* of the __tc_ctl_tfilter()
return paths could potentially return an error that suddenly would require
a replay of the request. So, while moving it out fixes the bug, too, it also
adds more potential points where you would go and replay the request where
you didn't do so before and therefore also adds more complexity to the code
that needs review/audit when fixing bugs since you don't have these hard/direct
return paths anymore. Therefore I don't think it's better to go that way
for the fix.

Thanks,
Daniel


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)

Adding might_lock() annotations will improve coverage a lot.

> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't satisfy a request with one batched entropy block and have
> to consume randomness from two.

The bit granularity is also for the callers' convenience, so they don't
have to mask again.  Whether get_random_bits rounds up to byte boundaries
internally or not is something else.

When the current batch runs low, I was actually thinking of throwing
away the remaining bits and computing a new batch of 512.  But it's
whatever works best at implementation time.

>>> It could only mix the output back in every two calls, in which case
>>> you can backtrack up to one call but you need to do 2^128 work to
>>> backtrack farther.  But yes, this is getting excessively complicated.
> 
>> No, if you're willing to accept limited backtrack, this is a perfectly
>> acceptable solution, and not too complicated.  You could do it phase-less
>> if you like; store the previous output, then after generating the new
>> one, mix in both.  Then overwrite the previous output.  (But doing two
>> rounds of a crypto primitive to avoid one conditional jump is stupid,
>> so forget that.)

> Can you quickly explain why we lose the backtracking capability?

Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
backtracking is to compute output[i], or better yet state[i-1], given
state[i].

For example, consider an OFB or CTR mode generator.  The state is a key
and an IV, and you encrypt the IV with the key to produce output, then
either replace the IV with the output, or increment it.  Either way,
since you still have the key, you can invert the transformation and
recover the previous IV.

The standard way around this is to use the Davies-Meyer construction:

IV[i] = IV[i-1] + E(IV[i-1], key)

This is the standard way to make a non-invertible random function
out of an invertible random permutation.

From the sum, there's no easy way to find the ciphertext *or* the
plaintext that was encrypted.  Assuming the encryption is secure,
the only way to reverse it is brute force: guess IV[i-1] and run the
operation forward to see if the resultant IV[i] matches.

There are a variety of ways to organize this computation, since the
guess gives you both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
running E forward, backward, or starting from both ends to see if you
meet in the middle.

The way you add the encryption output to the IV is not very important.
It can be addition, xor, or some more complex invertible transformation.
In the case of SipHash, the "encryption" output is smaller than the
input, so we have to get a bit more creative, but it's still basically
the same thing.

The problem is that the output which is combined with the IV is too small.
With only 64 bits, trying all possible values is practical.  (The world's
Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
times per second.)

By basically doing two iterations at once and mixing in 128 bits of
output, the guessing attack is rendered impractical.  The only downside
is that you need to remember and store one result between when it's
computed and last used.  This is part of the state, so an attack can
find output[i-1], but not anything farther back.

> ChaCha as a block cipher gives a "perfect" permutation from the output
> of either the CRNG or the CPRNG, which actually itself has backtracking
> protection.

I'm not quite understanding.  The /dev/random implementation uses some
of the ChaCha output as a new ChaCha key (that's another way to mix output
back into the state) to prevent backtracking.  But this slows it down, and
again if you want to be efficient, you're generating large batches
of entropy and storing them in the RNG state.


Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread Daniel Borkmann

On 12/22/2016 06:50 PM, John Fastabend wrote:

On 16-12-22 08:53 AM, David Miller wrote:

From: Daniel Borkmann 
Date: Wed, 21 Dec 2016 22:07:48 +0100


Ok, you mean for net. In that case I prefer the smaller sized fix to
be honest. It also covers everything from the point where we fetch
the chain via cops->tcf_chain() to the end of the function, which is
where most of the complexity resides, and only the two mentioned
commits do the relock, so as a fix I think it's fine as-is. As
mentioned, if there's need to refactor tc_ctl_tfilter() net-next
would be better, imho.


Please, can you two work towards an agreement as to what fix should
go in at this time?

Thanks.


Thanks for fixing this!

I have a slight preference to push this patch into net as its been
tested already by Shahar and is not particularly invasive. Then for
net-next do a larger cleanup to get rid of some of the complexity per
Cong's suggestion.


My preference as well, thanks.


Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread Daniel Borkmann

On 12/22/2016 02:16 PM, Shahar Klein wrote:

On 12/21/2016 7:04 PM, Daniel Borkmann wrote:

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and 
avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein 
Signed-off-by: Daniel Borkmann 
Cc: Cong Wang 

[...]

I've tested this specific patch (without Cong's patch and without the many 
prints) many times and in many test permutations today, and it didn't lock up.

Thanks again Daniel!


Just catching up with email (traveling whole day), thanks a bunch for
your effort, Shahar! In that case lets then add:

Tested-by: Shahar Klein 


[PATCH net] inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets

2016-12-22 Thread Willem de Bruijn
From: Willem de Bruijn 

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar 
Signed-off-by: Willem de Bruijn 
---
 net/ipv4/ip_sockglue.c | 2 +-
 net/ipv6/datagram.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 8b13881..9760734 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -148,7 +148,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct 
sk_buff *skb)
const struct iphdr *iph = ip_hdr(skb);
__be16 *ports = (__be16 *)skb_transport_header(skb);
 
-   if (skb_transport_offset(skb) + 4 > skb->len)
+   if (skb_transport_offset(skb) + 4 > (int)skb->len)
return;
 
/* All current transport protocols have the port numbers in the
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 0489e19..1407426 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -701,7 +701,7 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct 
msghdr *msg,
struct sockaddr_in6 sin6;
__be16 *ports = (__be16 *) skb_transport_header(skb);
 
-   if (skb_transport_offset(skb) + 4 <= skb->len) {
+   if (skb_transport_offset(skb) + 4 <= (int)skb->len) {
/* All current transport protocols have the port 
numbers in the
 * first four bytes of the transport header and this 
function is
 * written with this assumption in mind.
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH] Fixed status entry in m_can documentation

2016-12-22 Thread Rob Herring
On Thu, Dec 22, 2016 at 11:45:21AM +0100, Vyacheslav V. Yurkov wrote:
> Use valid value for 'enabled' in status field
> 
> Signed-off-by: Vyacheslav V. Yurkov 
> ---
>  Documentation/devicetree/bindings/net/can/m_can.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt 
> b/Documentation/devicetree/bindings/net/can/m_can.txt
> index 9e33177..5facaf5 100644
> --- a/Documentation/devicetree/bindings/net/can/m_can.txt
> +++ b/Documentation/devicetree/bindings/net/can/m_can.txt
> @@ -63,5 +63,5 @@ Board dts:
>  &m_can1 {
>   pinctrl-names = "default";
>   pinctrl-0 = <&pinctrl_m_can1>;
> - status = "enabled";
> + status = "okay";

Examples don't need to have status prop. Just remove.

Rob


Re: [PATCH v3 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-22 Thread Rob Herring
On Wed, Dec 21, 2016 at 11:18:33PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options for supporting hardware designs
> with 1.8 Volt or 3.3 Volt IO.   This commit adds a device tree option,
> using a fixed regulator binding, for setting the io voltage to match
> the hardware configuration. If no option is supplied it defaults to
> 3.3 volt configuration.
> 
> Signed-off-by: Geoff Lansberry 
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt   |  2 ++

Acked-by: Rob Herring 

>  drivers/nfc/trf7970a.c | 26 
> +-
>  2 files changed, 27 insertions(+), 1 deletion(-)


Re: [PATCH v3 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-22 Thread Rob Herring
On Wed, Dec 21, 2016 at 11:18:32PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options to support hardware designs
> which use a 27.12MHz clock. This commit adds a device tree option
> 'clock-frequency' to support configuring this chip for the default
> 13.56MHz clock or the optional 27.12MHz clock.
> 
> Signed-off-by: Geoff Lansberry 
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt   |  2 +

Acked-by: Rob Herring 

>  drivers/nfc/trf7970a.c | 50 
> +-
>  2 files changed, 41 insertions(+), 11 deletions(-)


Re: [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk

2016-12-22 Thread Eric Dumazet
On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 
> and
> never set it again.  Which means that in the future if we end up adding a 
> bunch
> of reuseport sk's to that tb we'll have to do the expensive scan every time.
> Instead add a sock_common to the tb so we know what reuseport sk succeeded 
> last.
> Once one sk has made it onto the list we know that there are no potential bind
> conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
> common into our tb->fastsock and set tb->fastreuseport to FASTREUSEPORT_STRICT 
> so
> we know we have to do an extra check for subsequent reuseport sockets and skip
> the expensive bind conflict check.
> 
> Signed-off-by: Josef Bacik 
> ---
>  include/net/inet_hashtables.h   |  4 
>  net/ipv4/inet_connection_sock.c | 53 
> +
>  2 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 756ed16..4ccc18f 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -74,12 +74,16 @@ struct inet_ehash_bucket {
>   * users logged onto your box, isn't it nice to know that new data
>   * ports are created in O(1) time?  I thought so. ;-)-DaveM
>   */
> > +#define FASTREUSEPORT_ANY 1
> +#define FASTREUSEPORT_STRICT 2
> +
>  struct inet_bind_bucket {
>   possible_net_t  ib_net;
>   unsigned short  port;
>   signed char fastreuse;
>   signed char fastreuseport;
>   kuid_t  fastuid;
> + struct sock_common  fastsock;
>   int num_owners;
>   struct hlist_node   node;
>   struct hlist_head   owners;

Please place this fat field at the end of inet_bind_bucket

Many sockets do not use SO_REUSEPORT and should not use this field,
while tb->owners need to be touched.




Re: [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port

2016-12-22 Thread Eric Dumazet
On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> In inet_csk_get_port we seem to be using smallest_port to figure out the best
> place to look for a SO_REUSEPORT sk that matches an existing set of
> SO_REUSEPORT sockets.  However if we get to the logic
> 
> if (smallest_size != -1) {
>   port = smallest_port;
>   goto have_port;
> }
> 
> we will do a useless search, because we would have already done the
> inet_csk_bind_conflict for that port and it would have returned 1; otherwise
> we would have gone to found_tb and succeeded.  Since this logic makes us do
> yet another trip through inet_csk_bind_conflict for a port we know won't
> work, just delete this code and save us the time.
> 
> Signed-off-by: Josef Bacik 

Please remove tb->need_owners ;)






Re: [PATCH v4 2/3] Bluetooth: btusb: Add out-of-band wakeup support

2016-12-22 Thread Rob Herring
On Wed, Dec 21, 2016 at 12:33:51PM -0800, Rajat Jain wrote:
> Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
> can be connected to a gpio on the CPU side, and can be used to wakeup
> the host out-of-band. This can be useful in situations where the
> in-band wakeup is not possible or not preferable (e.g. the in-band
> wakeup may require the USB host controller to remain active, and
> hence consuming more system power during system sleep).
> 
> The OOB GPIO interrupt to be used for wakeup on the CPU side is read
> from the device tree node (using standard interrupt descriptors). A
> device tree binding document is also added for the driver. The
> compatible string is in compliance with
> Documentation/devicetree/bindings/usb/usb-device.txt
> 
> Signed-off-by: Rajat Jain 
> Reviewed-by: Brian Norris 
> ---
> v4: Move the set_bit(BTUSB_OOB_WAKE_DISABLED,..) call to the beginning of
> btusb_config_oob_wake() - caught by Brian.
> v3: Add Brian's "Reviewed-by"
> v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
> * Leave it on device tree to specify IRQ flags (level /edge triggered)
> * Mark the device as non wakeable on exit.
> 
>  Documentation/devicetree/bindings/net/btusb.txt | 40 

Acked-by: Rob Herring 

>  drivers/bluetooth/btusb.c   | 84 +
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/btusb.txt


Re: [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status

2016-12-22 Thread Rob Herring
On Tue, Dec 20, 2016 at 05:09:45PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok 
> 
> The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides
> functionality of a physical coding sublayer (PCS) on data being
> transferred between a demuxed XGMII and SerDes supporting a 16
> or 32 bit interface.  From the driver's point of view, whether
> an ethernet link is up or not also depends on the status of the
> block-lock bit of the PCSR.  This patch adds the checking of that
> bit in order to determine the link status.

I would think this would be a common thing and the phy driver should 
provide the status, rather than trying to give the ethernet driver 
direct access to the phy registers. Is the PCSR the serdes phy or 
registers in addition to that?

Rob


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 22:11, George Spelvin wrote:
>> I do tend to like Ted's version in which we use batched
>> get_random_bytes() output.  If it's fast enough, it's simpler and lets
>> us get the full strength of a CSPRNG.
> 
> With the ChaCha20 generator, that's fine, although note that this abandons
> anti-backtracking entirely.
> 
> It also takes locks, something the previous get_random_int code
> path avoided.  Do we need to audit the call sites to ensure that's safe?

We have spin_lock_irq* locks on the way. Of course they can hurt when
contended. The situation should be the same as with get_random_bytes,
callable from every possible situation in the kernel without fear, so
this should also work for get_random_int. A lockdep test should still be
done. ;)

> And there is the issue that the existing callers assume that there's a
> fixed cost per word.  A good half of get_random_long calls are followed by
> "& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
> - 1)" to extract the low 28.  If we have a buffer we're going to have to
> pay to refill, it would be nice to use less than 8 bytes to satisfy those.
> 
> But that can be a followup patch.  I'm thinking
> 
> unsigned long get_random_bits(unsigned bits)
>   E.g. get_random_bits(PAGE_SHIFT),
>get_random_bits(mmap_rnd_bits),
>   u32 imm_rnd = get_random_bits(32)
> 
> unsigned get_random_mod(unsigned modulus)
>   E.g. get_random_mod(hole) & ~(alignment - 1);
>get_random_mod(port_scan_backoff)
>   (Although probably drivers/s390/scsi/zfcp_fc.c should be changed
>   to prandom.)
> 
> with, until the audit is completed:
> #define get_random_int() get_random_bits(32)
> #define get_random_long() get_random_bits(BITS_PER_LONG)

Yes, that does look nice indeed. Accounting for bits instead of bytes
shouldn't be a huge problem either. Maybe it gets a bit more verbose in
case you can't satisfy a request with one batched entropy block and have
to consume randomness from two.

>> It could only mix the output back in every two calls, in which case
>> you can backtrack up to one call but you need to do 2^128 work to
>> backtrack farther.  But yes, this is getting excessively complicated.
> 
> No, if you're willing to accept limited backtrack, this is a perfectly
> acceptable solution, and not too complicated.  You could do it phase-less
> if you like; store the previous output, then after generating the new
> one, mix in both.  Then overwrite the previous output.  (But doing two
> rounds of a crypto primitive to avoid one conditional jump is stupid,
> so forget that.)

Can you quickly explain why we lose the backtracking capability?

ChaCha as a block cipher gives a "perfect" permutation from the output
of either the CRNG or the CPRNG, which actually itself has backtracking
protection.

Thanks for explaining,
Hannes
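The "phase-less" limited-backtracking scheme George describes (store the previous output, then after generating the new one mix both back in and overwrite the saved copy) can be modeled in a few lines. This is a toy: `step()` is a plain xorshift64 stand-in with no cryptographic strength, and the struct layout is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct toy_rng {
    uint64_t state;
    uint64_t prev_out;  /* last output, pending feedback */
};

static uint64_t step(uint64_t x)
{
    /* xorshift64: illustrative only, NOT a crypto primitive */
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return x;
}

static uint64_t toy_next(struct toy_rng *r)
{
    uint64_t out = step(r->state);

    /* fold both the previous and the current output back into the
     * state, then remember the current output for the next call;
     * capturing the state reveals at most the most recent output */
    r->state = step(r->state ^ r->prev_out ^ out);
    r->prev_out = out;
    return out;
}
```

An attacker who snapshots `state` can reconstruct the one output still pending feedback, but earlier outputs were already folded in and overwritten.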




[PATCH 4/6 net-next] inet: don't check for bind conflicts twice when searching for a port

2016-12-22 Thread Josef Bacik
This is just wasted time, we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock so scanning again isn't going to give
us a different answer.  Instead move the tb->reuse setting logic outside of the
found_tb path and put it in the success: path.  Then make it so that we don't
goto again if we find a bind conflict in the found_tb path as we won't reach
this anymore when we are scanning for an ephemeral port.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d352366..f6a34bc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -164,7 +164,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 {
bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-   int ret = 1, attempts = 5, port = snum;
+   int ret = 1, port = snum;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
@@ -183,7 +183,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
goto tb_not_found;
}
-again:
attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
inet_get_local_port_range(net, &low, &high);
@@ -220,8 +219,10 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
spin_lock_bh(&head->lock);
inet_bind_bucket_for_each(tb, &head->chain)
if (net_eq(ib_net(tb), net) && tb->port == port) {
+   if (hlist_empty(&tb->owners))
+   goto success;
if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
-   goto tb_found;
+   goto success;
goto next_port;
}
goto tb_not_found;
@@ -256,23 +257,11 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
  sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
-   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
-   if ((reuse ||
-(tb->fastreuseport > 0 &&
- sk->sk_reuseport &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) && !snum &&
-   --attempts >= 0) {
-   spin_unlock_bh(&head->lock);
-   goto again;
-   }
+   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
goto fail_unlock;
-   }
-   if (!reuse)
-   tb->fastreuse = 0;
-   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
-   tb->fastreuseport = 0;
-   } else {
+   }
+success:
+   if (!hlist_empty(&tb->owners)) {
tb->fastreuse = reuse;
if (sk->sk_reuseport) {
tb->fastreuseport = 1;
@@ -280,8 +269,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
} else {
tb->fastreuseport = 0;
}
+   } else {
+   if (!reuse)
+   tb->fastreuse = 0;
+   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+   tb->fastreuseport = 0;
}
-success:
if (!inet_csk(sk)->icsk_bind_hash)
inet_bind_hash(sk, tb, port);
WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
-- 
2.5.5



[PATCH 3/6 net-next] inet: kill smallest_size and smallest_port

2016-12-22 Thread Josef Bacik
In inet_csk_get_port we seem to be using smallest_port to figure out the best
place to look for a SO_REUSEPORT sk that matches an existing set of
SO_REUSEPORT sockets.  However if we get to the logic

if (smallest_size != -1) {
port = smallest_port;
goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1; otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work, just
delete this code and save us the time.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a1c9055..d352366 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -165,7 +165,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
int ret = 1, attempts = 5, port = snum;
-   int smallest_size = -1, smallest_port;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
@@ -175,7 +174,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
bool reuseport_ok = !!snum;
 
if (port) {
-have_port:
head = &hinfo->bhash[inet_bhashfn(net, port,
  hinfo->bhash_size)];
spin_lock_bh(&head->lock);
@@ -209,8 +207,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 * We do the opposite to not pollute connect() users.
 */
offset |= 1U;
-   smallest_size = -1;
-   smallest_port = low; /* avoid compiler warning */
 
 other_parity_scan:
port = low + offset;
@@ -224,15 +220,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
spin_lock_bh(&head->lock);
inet_bind_bucket_for_each(tb, &head->chain)
if (net_eq(ib_net(tb), net) && tb->port == port) {
-   if (((tb->fastreuse > 0 && reuse) ||
-(tb->fastreuseport > 0 &&
- sk->sk_reuseport &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) &&
-   (tb->num_owners < smallest_size || smallest_size == -1)) {
-   smallest_size = tb->num_owners;
-   smallest_port = port;
-   }
if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
goto tb_found;
goto next_port;
@@ -243,10 +230,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
cond_resched();
}
 
-   if (smallest_size != -1) {
-   port = smallest_port;
-   goto have_port;
-   }
offset--;
if (!(offset & 1))
goto other_parity_scan;
@@ -268,19 +251,18 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
if (sk->sk_reuse == SK_FORCE_REUSE)
goto success;
 
-   if (((tb->fastreuse > 0 && reuse) ||
+   if ((tb->fastreuse > 0 && reuse) ||
 (tb->fastreuseport > 0 &&
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
- sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
-   smallest_size == -1)
+ sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
if ((reuse ||
 (tb->fastreuseport > 0 &&
  sk->sk_reuseport &&
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
- uid_eq(tb->fastuid, uid))) &&
-   !snum && smallest_size != -1 && --attempts >= 0) {
+ uid_eq(tb->fastuid, uid))) && !snum &&
+   --attempts >= 0) {
spin_unlock_bh(&head->lock);
goto again;
}
-- 
2.5.5



[PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one

2016-12-22 Thread Josef Bacik
We pass these per-protocol equal functions around in various places, but
we can just have one function that checks the sk->sk_family and then do
the right comparison function.  I've also changed the ipv4 version to
not cast to inet_sock since it is unneeded.

Signed-off-by: Josef Bacik 
---
 include/net/addrconf.h   |  4 +--
 include/net/inet_hashtables.h|  5 +--
 include/net/udp.h|  1 -
 net/ipv4/inet_connection_sock.c  | 72 
 net/ipv4/inet_hashtables.c   | 16 +++--
 net/ipv4/udp.c   | 58 +++-
 net/ipv6/inet6_connection_sock.c |  4 +--
 net/ipv6/inet6_hashtables.c  | 46 +
 net/ipv6/udp.c   |  2 +-
 9 files changed, 95 insertions(+), 113 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8f998af..17c6fd8 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr,
  u32 banned_flags);
 int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
u32 banned_flags);
-int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
-bool match_wildcard);
-int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
 bool match_wildcard);
 void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
 void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr);
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 0574493..756ed16 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h);
 
 bool inet_ehash_insert(struct sock *sk, struct sock *osk);
 bool inet_ehash_nolisten(struct sock *sk, struct sock *osk);
-int __inet_hash(struct sock *sk, struct sock *osk,
-   int (*saddr_same)(const struct sock *sk1,
- const struct sock *sk2,
- bool match_wildcard));
+int __inet_hash(struct sock *sk, struct sock *osk);
 int inet_hash(struct sock *sk);
 void inet_unhash(struct sock *sk);
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 1661791..c9d8b8e 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout)
 }
 
 int udp_lib_get_port(struct sock *sk, unsigned short snum,
-int (*)(const struct sock *, const struct sock *, bool),
 unsigned int hash2_nulladdr);
 
 u32 udp_flow_hashrnd(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 19ea045..ba597cb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
 EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 #endif
 
+#if IS_ENABLED(CONFIG_IPV6)
+/* match_wildcard == true:  IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6
+ *  only, and any IPv4 addresses if not IPv6 only
+ * match_wildcard == false: addresses must be exactly the same, i.e.
+ *  IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY,
+ *  and 0.0.0.0 equals to 0.0.0.0 only
+ */
+static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+   bool match_wildcard)
+{
+   const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
+   int sk2_ipv6only = inet_v6_ipv6only(sk2);
+   int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
+   int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED;
+
+   /* if both are mapped, treat as IPv4 */
+   if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) {
+   if (!sk2_ipv6only) {
+   if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr)
+   return 1;
+   if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr)
+   return match_wildcard;
+   }
+   return 0;
+   }
+
+   if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY)
+   return 1;
+
+   if (addr_type2 == IPV6_ADDR_ANY && match_wildcard &&
+   !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))
+   return 1;
+
+   if (addr_type == IPV6_ADDR_ANY && match_wildcard &&
+   !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED))
+   return 1;
+
+   if (sk2_rcv_saddr6 &&
+   ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6))
+   return 1;
+
+   return 0;
+}
+#endif

[PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk

2016-12-22 Thread Josef Bacik
If we have non-reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again, which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add a sock_common to the tb so we know which reuseport sk succeeded
last.  Once one sk has made it onto the list we know that there are no
potential bind conflicts on the owners list that match that sk's rcv_saddr.  So
copy the sk's common into our tb->fastsock and set tb->fastreuseport to
FASTREUSEPORT_STRICT so we know we have to do an extra check for subsequent
reuseport sockets and skip the expensive bind conflict check.

Signed-off-by: Josef Bacik 
---
 include/net/inet_hashtables.h   |  4 
 net/ipv4/inet_connection_sock.c | 53 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 756ed16..4ccc18f 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so. ;-)  -DaveM
  */
+#define FASTREUSEPORT_ANY  1
+#define FASTREUSEPORT_STRICT   2
+
 struct inet_bind_bucket {
possible_net_t  ib_net;
unsigned short  port;
signed char fastreuse;
signed char fastreuseport;
kuid_t  fastuid;
+   struct sock_common  fastsock;
int num_owners;
struct hlist_node   node;
struct hlist_head   owners;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 923bd60..814382f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -236,6 +236,32 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
return head;
 }
 
+static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
+struct sock *sk)
+{
+   struct sock *sk2 = (struct sock *)&tb->fastsock;
+   kuid_t uid = sock_i_uid(sk);
+
+   if (tb->fastreuseport <= 0)
+   return 0;
+   if (!sk->sk_reuseport)
+   return 0;
+   if (rcu_access_pointer(sk->sk_reuseport_cb))
+   return 0;
+   if (!uid_eq(tb->fastuid, uid))
+   return 0;
+   /* We only need to check the rcv_saddr if this tb was once marked
+* without fastreuseport and then was reset, as we can only know that
+* the fastsock has no potential bind conflicts with the rest of the
+* possible socks on the owners list.
+*/
+   if (tb->fastreuseport == FASTREUSEPORT_ANY)
+   return 1;
+   if (!inet_rcv_saddr_equal(sk, sk2, true))
+   return 0;
+   return 1;
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -275,9 +301,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
goto success;
 
if ((tb->fastreuse > 0 && reuse) ||
-(tb->fastreuseport > 0 &&
- !rcu_access_pointer(sk->sk_reuseport_cb) &&
- sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
+   sk_reuseport_match(tb, sk))
goto success;
if (inet_csk_bind_conflict(sk, tb, true, true))
goto fail_unlock;
@@ -288,14 +312,35 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
if (sk->sk_reuseport) {
tb->fastreuseport = 1;
tb->fastuid = uid;
+   memcpy(&tb->fastsock, &sk->__sk_common,
+  sizeof(struct sock_common));
} else {
tb->fastreuseport = 0;
}
} else {
if (!reuse)
tb->fastreuse = 0;
-   if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+   if (sk->sk_reuseport) {
+   /* We didn't match or we don't have fastreuseport set on
+* the tb, but we have sk_reuseport set on this socket
+* and we know that there are no bind conflicts with
+* this socket in this tb, so reset our tb's reuseport
+* settings so that any subsequent sockets that match
+* our current socket will be put on the fast path.
+*
+* If we reset we need to set FASTREUSEPORT_STRICT so we
+* do extra checking for all subsequent sk_reuseport
+

[PATCH 2/6 net-next] inet: drop ->bind_conflict

2016-12-22 Thread Josef Bacik
The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict
is how they check the rcv_saddr, so delete this callback and simply
change inet_csk_bind_conflict to call inet_rcv_saddr_equal.

Signed-off-by: Josef Bacik 
---
 include/net/inet6_connection_sock.h |  5 -
 include/net/inet_connection_sock.h  |  6 --
 net/dccp/ipv4.c |  2 +-
 net/dccp/ipv6.c |  2 --
 net/ipv4/inet_connection_sock.c | 22 +++-
 net/ipv4/tcp_ipv4.c |  2 +-
 net/ipv6/inet6_connection_sock.c| 40 -
 net/ipv6/tcp_ipv6.c |  2 --
 8 files changed, 9 insertions(+), 72 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 3212b39..8ec87b6 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -15,16 +15,11 @@
 
 #include 
 
-struct inet_bind_bucket;
 struct request_sock;
 struct sk_buff;
 struct sock;
 struct sockaddr;
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-   const struct inet_bind_bucket *tb, bool relax,
-   bool soreuseport_ok);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6,
  const struct request_sock *req, u8 proto);
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 85ee387..add75c7 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -62,9 +62,6 @@ struct inet_connection_sock_af_ops {
char __user *optval, int __user *optlen);
 #endif
void(*addr2sockaddr)(struct sock *sk, struct sockaddr *);
-   int (*bind_conflict)(const struct sock *sk,
-const struct inet_bind_bucket *tb,
-bool relax, bool soreuseport_ok);
void(*mtu_reduced)(struct sock *sk);
 };
 
@@ -261,9 +258,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
 
 struct sock *inet_csk_accept(struct sock *sk, int flags, int *err);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-  const struct inet_bind_bucket *tb, bool relax,
-  bool soreuseport_ok);
 int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d859a5c..ed6f99b 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -904,7 +905,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = {
.getsockopt= ip_getsockopt,
.addr2sockaddr = inet_csk_addr2sockaddr,
.sockaddr_len  = sizeof(struct sockaddr_in),
-   .bind_conflict = inet_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_ip_setsockopt,
.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index adfc790..08bcdc3 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -937,7 +937,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = {
.getsockopt= ipv6_getsockopt,
.addr2sockaddr = inet6_csk_addr2sockaddr,
.sockaddr_len  = sizeof(struct sockaddr_in6),
-   .bind_conflict = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_ipv6_setsockopt,
.compat_getsockopt = compat_ipv6_getsockopt,
@@ -958,7 +957,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = {
.getsockopt= ipv6_getsockopt,
.addr2sockaddr = inet6_csk_addr2sockaddr,
.sockaddr_len  = sizeof(struct sockaddr_in6),
-   .bind_conflict = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_ipv6_setsockopt,
.compat_getsockopt = compat_ipv6_getsockopt,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index ba597cb..a1c9055 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -116,9 +116,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high)
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-  const struct inet_bind_bucket *tb, bool relax,
-  bool reuseport_ok)
+static int inet_csk_bind_conflict(const struct sock *sk,
+ const struct inet_bind_bucket *tb,
+ bool relax, bool reuseport_ok)
 {
struct sock *sk2;
bool reuse = sk->sk_reuse;
@@ -134,7 +134,6 @@ int inet_csk_bind_conflict(const struct sock *sk,
 
sk_for_each_bound(sk

[PATCH 5/6 net-next] inet: split inet_csk_get_port into two functions

2016-12-22 Thread Josef Bacik
inet_csk_get_port does two different things: it either scans for an open port,
or it tries to see if the specified port is available for use.  Since these two
operations have different rules and are basically independent, let's split them
into two different functions to make them both more readable.

Signed-off-by: Josef Bacik 
---
 net/ipv4/inet_connection_sock.c | 66 +++--
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f6a34bc..923bd60 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -156,33 +156,21 @@ static int inet_csk_bind_conflict(const struct sock *sk,
return sk2 != NULL;
 }
 
-/* Obtain a reference to a local port for the given sock,
- * if snum is zero it means select any available local port.
- * We try to allocate an odd port (and leave even ports for connect())
+/*
+ * Find an open port number for the socket.  Returns with the
+ * inet_bind_hashbucket lock held.
  */
-int inet_csk_get_port(struct sock *sk, unsigned short snum)
+static struct inet_bind_hashbucket *
inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret)
 {
-   bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-   int ret = 1, port = snum;
+   int port = 0;
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
int i, low, high, attempt_half;
struct inet_bind_bucket *tb;
-   kuid_t uid = sock_i_uid(sk);
u32 remaining, offset;
-   bool reuseport_ok = !!snum;
 
-   if (port) {
-   head = &hinfo->bhash[inet_bhashfn(net, port,
- hinfo->bhash_size)];
-   spin_lock_bh(&head->lock);
-   inet_bind_bucket_for_each(tb, &head->chain)
-   if (net_eq(ib_net(tb), net) && tb->port == port)
-   goto tb_found;
-
-   goto tb_not_found;
-   }
attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
inet_get_local_port_range(net, &low, &high);
@@ -221,11 +209,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
if (net_eq(ib_net(tb), net) && tb->port == port) {
if (hlist_empty(&tb->owners))
goto success;
-   if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
+   if (!inet_csk_bind_conflict(sk, tb, false, false))
goto success;
goto next_port;
}
-   goto tb_not_found;
+   tb = NULL;
+   goto success;
 next_port:
spin_unlock_bh(&head->lock);
cond_resched();
@@ -240,8 +229,41 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
attempt_half = 2;
goto other_half_scan;
}
-   return ret;
+   return NULL;
+success:
+   *port_ret = port;
+   *tb_ret = tb;
+   return head;
+}
+
+/* Obtain a reference to a local port for the given sock,
+ * if snum is zero it means select any available local port.
+ * We try to allocate an odd port (and leave even ports for connect())
+ */
+int inet_csk_get_port(struct sock *sk, unsigned short snum)
+{
+   bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+   struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
+   int ret = 1, port = snum;
+   struct inet_bind_hashbucket *head;
+   struct net *net = sock_net(sk);
+   struct inet_bind_bucket *tb = NULL;
+   kuid_t uid = sock_i_uid(sk);
 
+   if (!port) {
+   head = inet_csk_find_open_port(sk, &tb, &port);
+   if (!head)
+   return ret;
+   if (!tb)
+   goto tb_not_found;
+   goto success;
+   }
+   head = &hinfo->bhash[inet_bhashfn(net, port,
+ hinfo->bhash_size)];
+   spin_lock_bh(&head->lock);
+   inet_bind_bucket_for_each(tb, &head->chain)
+   if (net_eq(ib_net(tb), net) && tb->port == port)
+   goto tb_found;
 tb_not_found:
tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
 net, head, port);
@@ -257,7 +279,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
  !rcu_access_pointer(sk->sk_reuseport_cb) &&
  sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
goto success;
-   if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
+   if (inet_csk_bind_conflict(sk, tb, true, true))
   

[PATCH 0/6 net-next][V2] Rework inet_csk_get_port

2016-12-22 Thread Josef Bacik
V1->V2:
-Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one'
 at Hannes' suggestion.
-Dropped ->bind_conflict and just use the new helper.
-Fixed a compile bug from the original ->bind_conflict patch.

The original description of the series follows

=

At some point recently the guys working on our load balancer added the ability
to use SO_REUSEPORT.  When they restarted their app with this option enabled
they immediately hit a softlockup on what appeared to be the
inet_bind_bucket->lock.  Eventually what all of our debugging and discussion led
us to was the fact that the application comes up without SO_REUSEPORT, shuts
down which creates around 100k twsk's, and then comes up and tries to open a
bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket
owners list under the lock.  Since this lock is needed for dealing with the
twsk's and basically anything else related to connections we would softlockup,
and sometimes not ever recover.

To solve this problem I did what you see in Patch 5/5.  Once we have a
SO_REUSEPORT socket on the tb->owners list we know that the socket has no
conflicts with any of the other sockets on that list.  So we can add a copy of
the sock_common (really all we need is the rcv_saddr, but it seemed ugly to copy
just the ipv6, ipv4, and the flag to indicate if we were ipv6-only, so I've
copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
they match the previous one then we can skip the expensive
inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
were seeing.
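The fast path described above can be illustrated with a toy userspace model (not kernel code): once one SO_REUSEPORT socket is known to bind on a bucket without conflict, cache its identity on the bucket, and let later matching sockets skip the O(n) walk of tb->owners. All types and field names here are invented for the sketch; only `fastsock` echoes the real field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_sock {
    uint32_t uid;
    uint32_t rcv_saddr;
    bool     reuseport;
};

struct toy_bucket {
    bool            have_fast;
    struct toy_sock fastsock;   /* plays the role of tb->fastsock */
};

/* O(1) fast-path check that stands in for the expensive
 * walk of the bucket's owners list */
static bool toy_reuseport_match(const struct toy_bucket *tb,
                                const struct toy_sock *sk)
{
    if (!tb->have_fast || !sk->reuseport)
        return false;
    return tb->fastsock.uid == sk->uid &&
           tb->fastsock.rcv_saddr == sk->rcv_saddr;
}

/* called after a socket passes the full bind-conflict check */
static void toy_cache_fastsock(struct toy_bucket *tb,
                               const struct toy_sock *sk)
{
    tb->fastsock  = *sk;
    tb->have_fast = true;
}
```

With 100k sockets on the owners list, the difference between this one comparison and a full scan under the bucket lock is what separates a fast bind from the softlockup described above.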

Patches 1-4 are cleanups and re-workings.  For instance, when we specify port ==
0 we need to find an open port, but we would do two passes through
inet_csk_bind_conflict every time we found a possible port.  We would also keep
track of the smallest_port value in order to try and use it if we found no
port on our first run through.  This however made no sense, as it would have had
to fail the first pass through inet_csk_bind_conflict and so would not pass
the second pass through either.  Finally I split the function into two functions
in order to make it easier to read and to distinguish between the two behaviors.

I have tested this on one of our load balancing boxes during peak traffic and it
hasn't fallen over.  But this is not my area, so obviously feel free to point
out where I'm being stupid and I'll get it fixed up and retested.  Thanks,

Josef



Re: [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap

2016-12-22 Thread Rob Herring
On Tue, Dec 20, 2016 at 05:09:44PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok 
> 
> The 10gbe phy driver needs to access the 10gbe subsystem control
> register during phy initialization. To facilitate the shared
> access of the subsystem register region between the 10gbe Ethernet
> driver and the phy driver, this patch adds support for the
> subsystem register region defined by a syscon node in the dts.
> 
> Although there is no shared access to the gbe subsystem register
> region, using syscon for that is for the sake of consistency.
> 
> This change is backward compatible with previously released gbe
> devicetree bindings.
> 
> Signed-off-by: WingMan Kwok 
> Signed-off-by: Murali Karicheri 
> Signed-off-by: Sekhar Nori 
> ---
>  .../devicetree/bindings/net/keystone-netcp.txt |  16 ++-
>  drivers/net/ethernet/ti/netcp_ethss.c  | 140 +
>  2 files changed, 127 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt 
> b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> index 04ba1dc..0854a73 100644
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -72,20 +72,24 @@ Required properties:
>   "ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
>   "ti,netcp-xgbe" for 10 GbE
>  
> +- syscon-subsys: phandle to syscon node of the switch
> + subsystem registers.
> +
>  - reg:   register location and the size for the following 
> register
>   regions in the specified order.
>   - switch subsystem registers
> + - sgmii module registers

This needs to go on the end of the list. Otherwise, it is not backwards 
compatible.

>   - sgmii port3/4 module registers (only for NetCP 1.4)
>   - switch module registers
>   - serdes registers (only for 10G)
>  
>   NetCP 1.4 ethss, here is the order
> - index #0 - switch subsystem registers
> + index #0 - sgmii module registers
>   index #1 - sgmii port3/4 module registers
>   index #2 - switch module registers
>  
>   NetCP 1.5 ethss 9 port, 5 port and 2 port
> - index #0 - switch subsystem registers
> + index #0 - sgmii module registers
>   index #1 - switch module registers
>   index #2 - serdes registers
>  
> @@ -145,6 +149,11 @@ Optional properties:
>  
>  Example binding:
>  
> +gbe_subsys: subsys@209 {
> + compatible = "syscon";
> + reg = <0x0209 0x100>;
> +};
> +
>  netcp: netcp@200 {
>   reg = <0x2620110 0x8>;
>   reg-names = "efuse";
> @@ -163,7 +172,8 @@ netcp: netcp@200 {
>   ranges;
>   gbe@9 {
>   label = "netcp-gbe";
> - reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>;
> + syscon-subsys = <&gbe_subsys>;
> + reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
>   /* enable-ale; */
>   tx-queue = <648>;
>   tx-channel = <8>;


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> I do tend to like Ted's version in which we use batched
> get_random_bytes() output.  If it's fast enough, it's simpler and lets
> us get the full strength of a CSPRNG.

With the ChaCha20 generator, that's fine, although note that this abandons
anti-backtracking entirely.

It also takes locks, something the previous get_random_int code
path avoided.  Do we need to audit the call sites to ensure that's safe?

And there is the issue that the existing callers assume that there's a
fixed cost per word.  A good half of get_random_long calls are followed by
"& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
- 1)" to extract the low 28.  If we have a buffer we're going to have to
pay to refill, it would be nice to use less than 8 bytes to satisfy those.

But that can be a followup patch.  I'm thinking

unsigned long get_random_bits(unsigned bits)
E.g. get_random_bits(PAGE_SHIFT),
 get_random_bits(mmap_rnd_bits),
u32 imm_rnd = get_random_bits(32)

unsigned get_random_mod(unsigned modulus)
E.g. get_random_mod(hole) & ~(alignment - 1);
 get_random_mod(port_scan_backoff)
(Although probably drivers/s390/scsi/zfcp_fc.c should be changed
to prandom.)

with, until the audit is completed:
#define get_random_int() get_random_bits(32)
#define get_random_long() get_random_bits(BITS_PER_LONG)

> It could only mix the output back in every two calls, in which case
> you can backtrack up to one call but you need to do 2^128 work to
> backtrack farther.  But yes, this is getting excessively complicated.

No, if you're willing to accept limited backtrack, this is a perfectly
acceptable solution, and not too complicated.  You could do it phase-less
if you like; store the previous output, then after generating the new
one, mix in both.  Then overwrite the previous output.  (But doing two
rounds of a crypto primitive to avoid one conditional jump is stupid,
so forget that.)

>> Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
>> directly and be done with it.  It won't be a bottleneck.

Isn't that what you already suggested?

I don't mind fewer primitives; I got a bit fixated on "Replace MD5 with
SipHash".  It's just the locking that I want to check isn't a problem.


Re: [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat

2016-12-22 Thread Roopa Prabhu
On 12/22/16, 8:23 AM, Nogah Frankel wrote:
> Add support for extended statistics of SW only type, for counting only the
> packets that went via the cpu. (useful for systems with forward
> offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
> and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.
>
> It is under the name 'software'
> (or any shorten of it as 'soft' or simply 's')
>
> For example:
> ifstat -x s
>
>
Nogah, can we keep the option names closer to the attribute names ?
That would avoid some confusion and help with the follow-up stats.

ifstat -x offload cpu
or
ifstat -x cpu

for others it would be:

ifstat -x link [vlan|igmp]
ifstat -x vlan
ifstat -x igmp
ifstat -x lacp

and so on...

thanks!


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 20:56, Andy Lutomirski wrote:
> It's also not quite clear to me why userspace needs to be able to
> calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
> command that takes a BPF program as input and hashes it would seem to
> serve the same purpose, and that would allow the kernel to key the
> digest and change the algorithm down the road without breaking things.

I think that people expect digests of BPF programs to be stable over
time and reboots.



Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal

2016-12-22 Thread Hannes Frederic Sowa
On 21.12.2016 16:16, Josef Bacik wrote:
> On Wed, Dec 21, 2016 at 10:06 AM, Hannes Frederic Sowa
>  wrote:
>> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>>  The only difference between inet6_csk_bind_conflict and
>>> inet_csk_bind_conflict
>>>  is how they check the rcv_saddr.  Since we want to be able to check
>>> the saddr in
>>>  other places just drop the protocol specific ->bind_conflict and
>>> replace it with
>>>  ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true
>>> bind conflict
>>>  function.
>>>
>>>  Signed-off-by: Josef Bacik 
>>>
>>
>>
>>
>>>  ---
>>>   include/net/inet6_connection_sock.h |  5 -
>>>   include/net/inet_connection_sock.h  |  9 +++--
>>>   net/dccp/ipv4.c |  3 ++-
>>>   net/dccp/ipv6.c |  2 +-
>>>   net/ipv4/inet_connection_sock.c | 22 +++-
>>>   net/ipv4/tcp_ipv4.c |  3 ++-
>>>   net/ipv4/udp.c  |  1 +
>>>   net/ipv6/inet6_connection_sock.c| 40 -
>>>   net/ipv6/tcp_ipv6.c |  4 ++--
>>>   9 files changed, 18 insertions(+), 71 deletions(-)
>>>
>>>  diff --git a/include/net/inet6_connection_sock.h
>>> b/include/net/inet6_connection_sock.h
>>>  index 3212b39..8ec87b6 100644
>>>  --- a/include/net/inet6_connection_sock.h
>>>  +++ b/include/net/inet6_connection_sock.h
>>>  @@ -15,16 +15,11 @@
>>>
>>>   #include 
>>>
>>>  -struct inet_bind_bucket;
>>>   struct request_sock;
>>>   struct sk_buff;
>>>   struct sock;
>>>   struct sockaddr;
>>>
>>>  -int inet6_csk_bind_conflict(const struct sock *sk,
>>>  -const struct inet_bind_bucket *tb, bool relax,
>>>  -bool soreuseport_ok);
>>>  -
>>>   struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct
>>> flowi6 *fl6,
>>> const struct request_sock *req, u8 proto);
>>>
>>>  diff --git a/include/net/inet_connection_sock.h
>>> b/include/net/inet_connection_sock.h
>>>  index ec0479a..9cd43c5 100644
>>>  --- a/include/net/inet_connection_sock.h
>>>  +++ b/include/net/inet_connection_sock.h
>>>  @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
>>>   char __user *optval, int __user *optlen);
>>>   #endif
>>>   void(*addr2sockaddr)(struct sock *sk, struct sockaddr *);
>>>  -int(*bind_conflict)(const struct sock *sk,
>>>  - const struct inet_bind_bucket *tb,
>>>  - bool relax, bool soreuseport_ok);
>>>  +int (*rcv_saddr_equal)(const struct sock *sk1,
>>>  +   const struct sock *sk2,
>>>  +   bool match_wildcard);
>>>   void(*mtu_reduced)(struct sock *sk);
>>>   };
>>>
>>>
>>
>> The patch looks like a nice code cleanup already!
>>
>> Have you looked if we can simply have one rcv_saddr_equal for both ipv4
>> and ipv6 that e.g. uses sk->sk_family instead of function pointers?
>> This could give us even more possibilities to remove some indirect
>> function calls and thus might save some cycles?
> 
> I was going to do that but I'm not familiar enough with how sockets work
> to be comfortable.  My main concern is we have the ipv6_only() check on
> a socket, which seems to indicate to me that you can have a socket that
> can do both ipv4/ipv6, so what if we're specifically going through the
> ipv6 code, but we aren't ipv6_only() and we end up doing the ipv4
> address compare when we really need to do the ipv6 address compare?  If
> this can't happen (and honestly as I type it out it sounds crazier than
> it did in my head) then yeah I'll totally do that as well and we can
> just have a global function without the protocol specific callbacks, but
> I need you or somebody to tell me I'm crazy and that it would be ok to
> have it all in one function.  Thanks,

IPv6 sockets can do IPv4 via mapped addresses. The other way around
doesn't work, there are no IPv4 sockets that can speak IPv6. The
ipv6_only flags in IPv4 sockets should always stay 0 but they need to be
evaluated from their side for possible port conflicts.

My idea is to use sk->sk_family, which is in sync with
icsk->icsk_af_ops, which you use for the function pointer lookup, to
switch between those functions. Looking through a lot of callbacks, I
don't see this assumption violated so far.

This would also solve the problem which David described, your search
would be free of those indirect jumps.

Bye,
Hannes



Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov
 wrote:
> On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski  wrote:
>> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>>  wrote:
>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> We don't prevent ebpf programs being loaded based on the digest; it is
>>> just there to uniquely identify loaded programs from user space and
>>> match them up with their source.
>>
>> The commit log talks about using the hash to see if the program has
>> already been compiled and JITted.  If that's done, then a collision
>> will directly cause the kernel to malfunction.
>
> Andy, please read the code.
> we could have used jhash there just as well.
> Collisions are fine.

There's nothing relevant in the code to read yet, AFAICS.  The code exports it
via fdinfo, and userspace is expected to do something with it.  The
commit message says:

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded.

I assume this means that a userspace component is expected to compare
the digest of a loaded program to a digest of a program it wants to
load and to use the result of the comparison to decide whether the
programs are the same.  If that's indeed the case (and it sure sounds
like it, and I fully expect CRIU to do very similar things when
support is added), then malicious collisions do matter.

It's also not quite clear to me why userspace needs to be able to
calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
command that takes a BPF program as input and hashes it would seem to
serve the same purpose, and that would allow the kernel to key the
digest and change the algorithm down the road without breaking things.

Regardless, adding a new hash algorithm that is almost-but-not-quite
SHA-1 and making it a stable interface to userspace is not a good
thing.

--Andy


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 07:08:37PM +0100, Hannes Frederic Sowa wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).
> 
> In ext4, is it actually possible that a "disrupter" learns about the
> hashing secret in the way how the inodes are returned during getdents?

They'd have to be a local user, who can execute telldir(3) --- in
which case there are plenty of other denial of service attacks one
could carry out that would be far more devastating.

It might also be an issue if the file system is exposed via NFS, but
again, there are so many other ways an attacker could DoS an NFS server
that I don't think of it as much of a concern.

Keep in mind that the worst someone can do is cause directory inserts to
fail with an ENOSPC, and there are plenty of other ways of doing that
--- such as consuming all of the blocks and inodes in the file system,
for example.

So it's a threat, but not a high priority one as far as I'm concerned.
And if this was a problem in actual practice, users could switch to
the TEA based hash, which should be far harder to attack, and
available today.

- Ted


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin
 wrote:
>> Having slept on this, I like it less.  The problem is that a
>> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
>> secret || ...) -- they learn the internal state of the hash function
>> that generates that value.  This probably breaks any attempt to apply
>> security properties of the hash function.  For example, the internal
>> state could easily contain a whole bunch of prior outputs in it
>> verbatim.
>
> The problem is, anti-backtracing is in severe tension with your desire
> to use unmodified SipHash.
>
> First of all, I'd like to repeat that it isn't a design goal of the current
> generator and isn't necessary.

Agreed.

> Now, the main point:  it's not likely to be solvable.
>
> The problem with unmodified SipHash is that it has only 64 bits of
> output.  No mix-back mechanism can get around the fundamental problem
> that that's too small to prevent a brute-force guessing attack.  You need
> wider mix-back.  And getting more output from unmodified SipHash requires
> more finalization rounds, which is expensive.

It could only mix the output back in every two calls, in which case
you can backtrack up to one call but you need to do 2^128 work to
backtrack farther.  But yes, this is getting excessively complicated.

> Finally, your discomfort about an attacker learning the internal state...
> if an attacker knows the key and the input, they can construct the
> internal state.  Yes, we could discard the internal state and construct
> a fresh one on the next call to get_random_int, but what are you going
> to key it with?  What are you going to feed it?  What keeps *that*
> internal state any more secret from an attacker than one that's explicitly
> stored?

I do tend to like Ted's version in which we use batched
get_random_bytes() output.  If it's fast enough, it's simpler and lets
us get the full strength of a CSPRNG.

(Aside: some day I want to move all that code from drivers/ to lib/
and teach it to be buildable in userspace, too, so it's easy to play
with it, feed it test vectors, confirm that it's equivalent to a
reference implementation, write up the actual design and try to get
real cryptographers to analyze it, etc.)

> For example, clearly stating the concern over starting new processes
> with predictable layout, and the limits on the fresh entropy supply,
> has made me realize that there *is* a possible source: each exec()
> is passed 128 bits from get_random_bytes in the AT_RANDOM element of
> its auxv.  Since get_random_int() accesses "current" anyway, we could
> store some seed material there rather than using "pid".  While this is
> not fresh for each call to get_random_int, it *is* fresh for each new
> address space whose layout is being randomized.

Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
directly and be done with it.  It won't be a bottleneck.


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Alexei Starovoitov
On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski  wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>  wrote:
>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> We don't prevent ebpf programs being loaded based on the digest; it is
>> just there to uniquely identify loaded programs from user space and
>> match them up with their source.
>
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Andy, please read the code.
we could have used jhash there just as well.
Collisions are fine.


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> Having slept on this, I like it less.  The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value.  This probably breaks any attempt to apply
> security properties of the hash function.  For example, the internal
> state could easily contain a whole bunch of prior outputs in it
> verbatim.

The problem is, anti-backtracing is in severe tension with your desire
to use unmodified SipHash.

First of all, I'd like to repeat that it isn't a design goal of the current
generator and isn't necessary.

The current generator just returns hash[0] from the MD5 state, then
leaves the state stored.  The fact that it conceals earlier outputs is
an accident of the Davies-Meyer structure of md5_transform.

It isn't necessary, because every secret generated is stored unencrypted
for as long as it's of value.  A few values are used for retransmit
backoffs and random MAC addresses.  Those are revealed to the world as
soon as they're used.

Most values are used for ASLR.  These addresses are of interest to an
attacker trying to mount a buffer-overflow attack, but that only lasts
as long as the process is running to receive buffers.  After the process
exits, knowledge of its layout is worthless.

And this is stored as long as it's running in kernel-accessible data,
so storing a *second* copy in less conveniently kernel-accessible data
(the RNG state) doesn't make the situation any worse.


In addition to the above, if you're assuming a state capture, then
since we have (for necessary efficiency reasons) a negligible amount of
fresh entropy, an attacker has the secret key and can predict *future*
outputs very easily.

Given that information, an attacker doesn't need to learn the layout of
vulnerable server X.  If they have a buffer overflow, they can crash
the current instance and wait for a fresh image to be started (with a
known address space) to launch their attack at.


Kernel state capture attacks are a very unlikely attack, mostly because
it's a narrow target a hair's breadth away from the much more interesting
outright kernel compromise (attacker gains write access as well as read)
which renders all this fancy cryptanalysis moot.


Now, the main point:  it's not likely to be solvable.

The problem with unmodified SipHash is that it has only 64 bits of
output.  No mix-back mechanism can get around the fundamental problem
that that's too small to prevent a brute-force guessing attack.  You need
wider mix-back.  And getting more output from unmodified SipHash requires
more finalization rounds, which is expensive.

(Aside: 64 bits does have the advantage that it can't be brute-forced on
the attacked machine.  It must be exfiltrated to the attacker, and the
solution returned to the attack code.  But doing this is getting easier
all the time.)

Adding anti-backtracking to SipHash is trivial: just add a Davies-Meyer
structure around its internal state.  Remember the internal state before
hashing in the entropy and secret, generate the output, then add the
previous and final states together for storage.

This is a standard textbook construction, very cheap, and doesn't touch
the compression function which is the target of analysis and attacks,
but it requires poking around in SipHash's internal state.  (And people
who read the textbooks without understanding them will get upset because
the textbooks all talk about using this construction with block ciphers,
and SipHash's compression function is not advertised as a block cipher.)

Alternative designs exist; you could extract additional output from
earlier rounds of SipHash, using the duplex sponge construction you
mentioned earlier.  That output would be used for mixback purposes *only*,
so wouldn't affect the security proof of the "primary" output.
But this is also getting creative with SipHash's internals.


Now, you could use a completely *different* cryptographic primitive
to enforce one-way-ness, and apply SipHash as a strong output transform,
but that doesn't feel like good design, and is probably more expensive.


Finally, your discomfort about an attacker learning the internal state...
if an attacker knows the key and the input, they can construct the
internal state.  Yes, we could discard the internal state and construct
a fresh one on the next call to get_random_int, but what are you going
to key it with?  What are you going to feed it?  What keeps *that*
internal state any more secret from an attacker than one that's explicitly
stored?

Keeping the internal state around is a cacheing optimization, that's all.

*If* you're assuming a state capture, the only thing secret from the
attacker is any fresh entropy collected between the time of capture
and the time of generation.  Due to mandatory efficiency requirements,
this is very small.  

I really think you're wishing for the impossible here.


A final note: although I'm disagreeing with you, 

Re: [PATCH iproute2 0/2] Fix the usage of TC tunnel key

2016-12-22 Thread Stephen Hemminger
On Thu, 22 Dec 2016 10:14:39 +0200
Hadar Hen Zion  wrote:

> Add dest UDP port parameter to the usage of tc tunnel key action and
> classification.
> 
> 
> Hadar Hen Zion (2):
>   tc/cls_flower: Add to the usage encapsulation dest UDP port
>   tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
> 
>  tc/f_flower.c | 5 +++--
>  tc/m_tunnel_key.c | 2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Looks good, both applied


Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread Cong Wang
On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann  wrote:
>
> Ok, you mean for net. In that case I prefer the smaller sized fix to be
> honest. It also covers everything from the point where we fetch the chain
> via cops->tcf_chain() to the end of the function, which is where most of
> the complexity resides, and only the two mentioned commits do the relock,

I really wish the problem is only about relocking, but look at the code,
the deeper reason why we have this bug is the complexity of the logic
inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
idempotent; 2) the request logic itself is hard, because of tc filter design
and implementation.

This is why I worry about more than just relocking.

> so as a fix I think it's fine as-is. As mentioned, if there's need to
> refactor tc_ctl_tfilter() net-next would be better, imho.

Refactor is too strong a word; just moving the replay out is not a refactor.
The end result will be still smaller than your commit d936377414fadbafb4,
which is already backported to stable.


Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat

2016-12-22 Thread Stephen Hemminger
On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel  wrote:

>  }
> @@ -691,18 +804,22 @@ static const struct option longopts[] = {
>   { "interval", 1, 0, 't' },
>   { "version", 0, 0, 'V' },
>   { "zeros", 0, 0, 'z' },
> + { "extended", 1, 0, 'x'},
>   { 0 }
>  };
>  
> +
>  int main(int argc, char *argv[])

You let extra whitespace changes creep in.


> + case 'x':
> + is_extended = true;
> + memset(stats_type, 0, 64);
> + strncpy(stats_type, optarg, 63);
> + break;

This seems like doing it either the paranoid way or the hard way.
Why not:
const char *stats_type = NULL;
...

case 'x':
stats_type = optarg;
break;
...
if (stats_type)
snprintf(hist_name, sizeof(hist_name),
 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
 getuid());
else
snprintf(hist_name, sizeof(hist_name),
 "%s/.ifstat.u%d", P_tmpdir, getuid());


Since:
1) optarg points to area in argv that is persistent (avoid copy)
2) don't need is_extended flag value then

Please cleanup and resubmit.





[PATCH] tc: add missing limits.h header

2016-12-22 Thread Baruch Siach
This fixes under musl build issues like:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for 
each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
 ^

Signed-off-by: Baruch Siach 
---
 tc/tc_util.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/tc_util.h b/tc/tc_util.h
index f198a4ad5554..4db26c6d5e25 100644
--- a/tc/tc_util.h
+++ b/tc/tc_util.h
@@ -2,6 +2,7 @@
 #define _TC_UTIL_H_ 1
 
 #define MAX_MSG 16384
+#include 
 #include 
 #include 
 #include 
-- 
2.11.0



Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-22 Thread Rob Herring
On Mon, Dec 19, 2016 at 5:23 PM, Geoff Lansberry  wrote:
> I can make that change, however, I worry that it may be a bit
> misleading, since there are only two supported clock frequencies, but
> a number like that to me implies that it could be set to any number
> you want.   I'm new at this, and so I'll go ahead and change it as you
> request, but I'd like to hear your thoughts on my concern.

Then the binding doc just needs to state what the 2 valid frequencies are.

Rob


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa
 wrote:
> We don't prevent ebpf programs being loaded based on the digest; it is
> just there to uniquely identify loaded programs from user space and
> match them up with their source.

Okay, so in that case, a weak hashing function like SHA1 could result
in a real vulnerability. Therefore, this SHA1 stuff needs to be
reverted immediately, pending a different implementation. If this has
ever shipped in a kernel version, it could even deserve a CVE. No
SHA1!

> The hashing is not a proper sha1 either, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Jeepers creepers. So for some ungodly reason, LKML has invented yet
another homebrewed crypto primitive. This story really gets more
horrifying every day. No bueno.

So yea, let's revert and re-commit (repeal and replace? just
kidding...). Out with SHA-1, in with Blake2 or SHA2.

Jason


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 7:08 PM, Hannes Frederic Sowa
 wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).

AFAIK, this is a real vulnerability that needs to be addressed.

Judging by Ted's inquiry about my siphash testing suite, I assume he's
probably tinkering around with it as we speak. :)


Meanwhile I've separated things into several trees:

1. chacha20 rng, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=random-next

2. md5 cleanup, not yet submitted:
https://git.zx2c4.com/linux-dev/log/?h=md5-cleanup

3. md4 cleanup, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=ext4-next-md4-cleanup

4. siphash and networking, not yet submitted as a x/4 series:
https://git.zx2c4.com/linux-dev/log/?h=net-next-siphash

I'll submit (4) in a couple of days, waiting for any comments on the
existing patch-set.

Jason


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 16:54, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>>  wrote:
>>> following up on what appears to be a random subject: ;)
>>>
>>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>>> in the htree. siphash seems to fit this use case pretty good.
>>
>> I saw this too. I'll try to address it in v8 of this series.
> 
> This is a separate issue, and this series is getting a bit too
> complex.  So I'd suggest pushing this off to a separate change.
> 
> Changing the htree hash algorithm is an on-disk format change, and so
> we couldn't roll it out until e2fsprogs gets updated and rolled out
> pretty broadley.  In fact George sent me patches to add siphash as a
> hash algorithm for htree a while back (for both the kernel and
> e2fsprogs), but I never got around to testing and applying them,
> mainly because while it's technically faster, I had other higher
> priority issues to work on --- and see previous comments regarding
> pixel peeping.  Improving the hash algorithm by tens or even hundreds
> of nanoseconds isn't really going to matter since we only do a htree
> lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
> times will dominate.  And from the power perspective, saving
> microwatts of CPU power isn't going to matter if you're going to be
> spinning up the storage device

I wasn't concerned about performance but more about DoS resilience. I
wonder how safe half md4 actually is in terms of allowing users to
generate long hash chains in the filesystem (in terms of length
extension attacks against half_md4).

In ext4, is it actually possible that a "disrupter" learns about the
hashing secret in the way how the inodes are returned during getdents?

Thanks,
Hannes



Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf

2016-12-22 Thread Alexander Duyck
Yes, that is much more helpful.  So looking at it, things are being
padded, but the last 4 bytes always have this extra data in them.  I've
been trying to recreate the issue on an 82599 with an SR-IOV VF and I
haven't been having much luck reproducing the problem.

In your test environment is the 82599 connected directly to the
Windows machine or are there any
switches/routers/gateways/tunnels/vlans in between?  I've tried
several iterations but with the 82599 connected directly to another
NIC I have I am not able to get it to produce the garbage padding you
are seeing. It makes me wonder if there might be something that is
manipulating the packets in between the two systems.  For example if
there was a VLAN being associated with the VF that is later stripped
and then the packet handed raw to the test system it might explain
what is introducing the extra padding and reason for pulling in the
CRC, and your patch would mask the issue since it would push the
minimum frame size with a VLAN to 68 instead of 64.

- Alex

On Wed, Dec 21, 2016 at 6:00 PM, Kefeng Wang  wrote:
>
>
> On 2016/12/21 10:20, Alexander Duyck wrote:
>> I find it curious that only the last 4 bytes have data in them.  I'm
>> wondering if the NIC/driver in the Windows/Nessus system is
>> interpreting the 4 byte CRC on the end of the frame as padding instead
>> of stripping it.
>>
>> Is there any chance you could capture the entire frame instead of just
>> the padding?  Maybe you could run something like wireshark without
>> enabling promiscuous mode on the VF and capture the frames it is
>> trying to send and receive.  What I want to verify is what the actual
>> amount of padding is that is needed to get to 60 bytes and where the
>> CRC should start.
>>
>> - Alex
>
> Here is the verbose output, is this useful?
> Otherwise we will try according to your advice, thanks,
>
> D:\Program Files\Tenable\Nessus>nasl.exe -aX -t 192.169.0.151 etherleak.nasl
> --
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 81 00 00 40 01 D7 F3 C0 A9 00 97E... ...@...
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 98 E4 75 DF  u.
> --
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 85 00 00 40 01 D7 EF C0 A9 00 97E... ...@...
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 FB DA F8 13  ..
> ---[ ether1 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75...u
> 0x10:  DF .
> ---[ ether2 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8
> 0x10:  13 .
>
> Padding observed in one frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75...u
>   0x10:  DF .
>
> Padding observed in another frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8
>   0x10:  13
>
>


Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread John Fastabend
On 16-12-22 08:53 AM, David Miller wrote:
> From: Daniel Borkmann 
> Date: Wed, 21 Dec 2016 22:07:48 +0100
> 
>> Ok, you mean for net. In that case I prefer the smaller sized fix to
>> be honest. It also covers everything from the point where we fetch
>> the chain via cops->tcf_chain() to the end of the function, which is
>> where most of the complexity resides, and only the two mentioned
>> commits do the relock, so as a fix I think it's fine as-is. As
>> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
>> would be better, imho.
> 
> Please, can you two work towards an agreement as to what fix should
> go in at this time?
> 
> Thanks.
> 

Thanks for fixing this!

I have a slight preference to push this patch into net as it's been
tested already by Shahar and is not particularly invasive. Then for
net-next do a larger cleanup to get rid of some of the complexity per
Cong's suggestion.

.John


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>  wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann 
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > > bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs being loaded based on the digest but
> > just to uniquely identify loaded programs from user space and match up
> > with their source.
> 
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Yeah, it still shouldn't crash the kernel, but it could cause
malfunctions when user-space assumptions are not met, and thus it
could act in strange ways:

My biggest personal concern is that users of this API will at some
point in time assume this digest is unique (e.g. as a key in a hash
table), while it actually is not (and is not enforced to be by the
kernel). If you can get an unprivileged ebpf program with the same weak
hash inserted into the kernel, a controller daemon could pick it up and
bind it to another ebpf hook, probably outside of the unprivileged
realm the user was in before. Only the ordering would distinguish them,
which might be unstable and is not guaranteed by most hash-table-based
data structures.

The API seems flawed to me.

> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.

To add to this, I am very much in favor of that. Right now it doesn't
have a name because it is a custom algorithm. ;)

> > > 
> > > Also:
> > > 
> > > +   result = (__force __be32 *)fp->digest;
> > > +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +   result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not a proper sha1 either, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Putting on crypto hat:
> 
> NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
> 2016 when people know better and is going to handle it by porting that
> new primitive to userspace" is not a particularly good argument.
> 
> Okay, crypto hack back off.
> 
> > 
> > I wondered if bpf program loading should have used the module loading
> > infrastructure from the beginning...
> 
> That would be way too complicated and would be nasty for the unprivileged 
> cases.

I was more or less just thinking about using the syscalls and user
space representation, not the generic infrastructure, as it is anyway
too much concerned with real kernel modules (would probably also solve
your cgroup-ebpf thread, as it can be passed by unique name or name and
hash ;) ). Anyway...

> > > At the very least, there should be a separate function that calculates
> > > the hash of a buffer and that function should explicitly run itself
> > > against test vectors of various lengths to make sure that it's
> > > calculating what it claims to be calculating.  And it doesn't look
> > > like the userspace code has landed, so, if this thing isn't
> > > calculating SHA1 correctly, it's plausible that no one has noticed.
> > 
> > I hope this was known from the beginning; this is not sha1, unfortunately.
> > 
> > But ebpf elf programs also need preprocessing to get rid of some
> > embedded load-depending data, so maybe it was considered to be just
> > enough?
> 
> I suspect it was actually an accident.

Maybe, I don't know.

Bye,
Hannes



Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
 wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>>  wrote:
>> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> > > Hi Hannes,
>> > >
>> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> > >  wrote:
>> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > > > You don't want to give people new IPv6 addresses with the same stable
>> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > > > connectivity then and it is extra work?
>> > >
>> > > Ahh, too bad. So it goes.
>> >
>> > If no other users survive we can put it into the ipv6 module.
>> >
>> > > > The bpf hash stuff can be changed during this merge window, as it is
>> > > > not yet in a released kernel. Albeit I would probably have preferred
>> > > > something like sha256 here, which can be easily replicated by user
>> > > > space tools (minus the problem of patching out references to not
>> > > > hashable data, which must be zeroed).
>> > >
>> > > Oh, interesting, so time is of the essence then. Do you want to handle
>> > > changing the new eBPF code to something not-SHA1 before it's too late,
>> > > as part of a new patchset that can fast track itself to David? And
>> > > then I can preserve my large series for the next merge window.
>> >
>> > This algorithm should be a non-seeded algorithm, because the hashes
>> > should be stable and verifiable by user space tooling. Thus this would
>> > need a hashing algorithm that is hardened against pre-image
>> > attacks/collision resistance, which siphash is not. I would prefer some
>> > higher order SHA algorithm for that actually.
>> >
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann 
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>> bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

The commit log talks about using the hash to see if the program has
already been compiled and JITted.  If that's done, then a collision
will directly cause the kernel to malfunction.

>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +   result = (__force __be32 *)fp->digest;
>> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +   result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 either, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Putting on crypto hat:

NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
2016 when people know better and is going to handle it by porting that
new primitive to userspace" is not a particularly good argument.

Okay, crypto hack back off.

>
> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...

That would be way too complicated and would be nasty for the unprivileged cases.

>
>> At the very least, there should be a separate function that calculates
>> the hash of a buffer and that function should explicitly run itself
>> against test vectors of various lengths to make sure that it's
>> calculating what it claims to be calculating.  And it doesn't look
>> like the userspace code has landed, so, if this thing isn't
>> calculating SHA1 correctly, it's plausible that no one has noticed.
>
> I hope this was known from the beginning; this is not sha1, unfortunately.
>
> But ebpf elf programs also need preprocessing to get rid of some
> embedded load-depending data, so maybe it was considered to be just
> enough?

I suspect it was actually an accident.


Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto
Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
> On 22/12/2016 23:47, Joao Pinto wrote:
>>
>> Hello Phil,
>>
>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>> G'day Joao,
>>>
>>> On 22/12/2016 20:38, Joao Pinto wrote:
 When testing stmmac with my QoS reference design I found a problem in the
 CSR clock configuration that was preventing PHY discovery, since
 every read operation returned 0x. This patch fixes the issue.

 Signed-off-by: Joao Pinto 
 ---
 changes v1->v2 (David Miller)
 - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the 
 patch
 to make sense

  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
  3 files changed, 6 insertions(+), 6 deletions(-)

 diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
 b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
 index b21d03f..94223c8 100644
 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
 +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
 @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
 *ioaddr, int mcbins,
  mac->mii.reg_shift = 6;
  mac->mii.reg_mask = 0x07C0;
  mac->mii.clk_csr_shift = 2;
 -mac->mii.clk_csr_mask = 0xF;
 +mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>
>>> Should this not be GENMASK(5,2)
>>
>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>
>> Bits: 4:2
>> 000 60-100 MHz clk_csr_i/42
>> 001 100-150 MHz clk_csr_i/62
>> 010 20-35 MHz clk_csr_i/16
>> 011 35-60 MHz clk_csr_i/26
>> 100 150-250 MHz clk_csr_i/102
>> 101 250-300 MHz clk_csr_i/124
>> 110, 111 Reserved
>>
>> So only bits 2, 3 and 4 should be masked.
>>
>> Thanks.
>>
> But this is a change in behaviour from what was there, isn't it?
> The previous mask was 4 bits; now it's 3.
> 
> And, for example, the Altera SoCFPGA implementation in the Cyclone V has
> valid values 0x8-0xf, using bits 5:2.

According to the databook, bit 5 is reserved and read-only. When a bit is
reserved it is sometimes possible to customize it; is that the case here? If
there is hardware using the 5th bit, we can make it GENMASK(5, 2) with no
problem.




Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread David Miller
From: Daniel Borkmann 
Date: Wed, 21 Dec 2016 22:07:48 +0100

> Ok, you mean for net. In that case I prefer the smaller sized fix to
> be honest. It also covers everything from the point where we fetch
> the chain via cops->tcf_chain() to the end of the function, which is
> where most of the complexity resides, and only the two mentioned
> commits do the relock, so as a fix I think it's fine as-is. As
> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
> would be better, imho.

Please, can you two work towards an agreement as to what fix should
go in at this time?

Thanks.


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>  wrote:
> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > Hi Hannes,
> > > 
> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > >  wrote:
> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > > > You don't want to give people new IPv6 addresses with the same stable
> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > connectivity then and it is extra work?
> > > 
> > > Ahh, too bad. So it goes.
> > 
> > If no other users survive we can put it into the ipv6 module.
> > 
> > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > something like sha256 here, which can be easily replicated by user
> > > > space tools (minus the problem of patching out references to not
> > > > hashable data, which must be zeroed).
> > > 
> > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > as part of a new patchset that can fast track itself to David? And
> > > then I can preserve my large series for the next merge window.
> > 
> > This algorithm should be a non-seeded algorithm, because the hashes
> > should be stable and verifiable by user space tooling. Thus this would
> > need a hashing algorithm that is hardened against pre-image
> > attacks/collision resistance, which siphash is not. I would prefer some
> > higher order SHA algorithm for that actually.
> > 
> 
> You mean:
> 
> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> Author: Daniel Borkmann 
> Date:   Sun Dec 4 23:19:41 2016 +0100
> 
> bpf: add prog_digest and expose it via fdinfo/netlink
> 
> 
> Yes, please!  This actually matters for security -- imagine a
> malicious program brute-forcing a collision so that it gets loaded
> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> (preferably the latter).  Speed basically doesn't matter here and
> Blake2 is both less stable (didn't they slightly change it recently?)
> and much less well studied.

We don't prevent ebpf programs being loaded based on the digest but
just to uniquely identify loaded programs from user space and match up
with their source.

There have been talks about signing bpf programs, so this would
probably need another digest algorithm in addition to that one,
probably wasting instructions. Going in the direction of PKCS#7 might
be the thing to do (which leads to the problem of making PKCS#7
attackable by ordinary unpriv users, hmpf).

> My inclination would have been to seed them with something that isn't
> exposed to userspace for the precise reason that it would prevent user
> code from making assumptions about what's in the hash.  But if there's
> a use case for why user code needs to be able to calculate the hash on
> its own, then that's fine.  But perhaps the actual fdinfo string
> should be "sha256:abcd1234..." to give some flexibility down the road.
> 
> Also:
> 
> +   result = (__force __be32 *)fp->digest;
> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> +   result[i] = cpu_to_be32(fp->digest[i]);
> 
> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very
> brief glance it looks wrong to me.  If you're doing this to avoid
> depending on crypto, then fix crypto so you can pull in the algorithm
> without pulling in the whole crypto core.

The hashing is not a proper sha1 either, unfortunately. I think that
is why it will have a custom implementation in iproute2?

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

I hope this was known from the beginning; this is not sha1, unfortunately.

But ebpf elf programs also need preprocessing to get rid of some
embedded load-depending data, so maybe it was considered to be just
enough?

Bye,
Hannes




Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Phil Reid

On 22/12/2016 23:47, Joao Pinto wrote:


Hello Phil,

Às 3:42 PM de 12/22/2016, Phil Reid escreveu:

G'day Joao,

On 22/12/2016 20:38, Joao Pinto wrote:

When testing stmmac with my QoS reference design I found a problem in the
CSR clock configuration that was preventing PHY discovery, since
every read operation returned 0x. This patch fixes the issue.

Signed-off-by: Joao Pinto 
---
changes v1->v2 (David Miller)
- DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
to make sense

 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b21d03f..94223c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
*ioaddr, int mcbins,
 mac->mii.reg_shift = 6;
 mac->mii.reg_mask = 0x07C0;
 mac->mii.clk_csr_shift = 2;
-mac->mii.clk_csr_mask = 0xF;
+mac->mii.clk_csr_mask = GENMASK(4, 2);


Should this not be GENMASK(5,2)


According to Universal MAC databook (valid for MAC100 and MAC1000), we have:

Bits: 4:2
000 60-100 MHz clk_csr_i/42
001 100-150 MHz clk_csr_i/62
010 20-35 MHz clk_csr_i/16
011 35-60 MHz clk_csr_i/26
100 150-250 MHz clk_csr_i/102
101 250-300 MHz clk_csr_i/124
110, 111 Reserved

So only bits 2, 3 and 4 should be masked.

Thanks.


But this is a change in behaviour from what was there, isn't it?
The previous mask was 4 bits; now it's 3.

And, for example, the Altera SoCFPGA implementation in the Cyclone V has
valid values 0x8-0xf, using bits 5:2.





--
Regards
Phil Reid






Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld  wrote:
> Hi all,
>
> I don't know what your design requirements are for this. It looks like
> you're generating some kind of crypto digest of a program, and you
> need to avoid collisions. If you'd like to go with a PRF (keyed hash
> function) that uses some kernel secret key, then I'd strongly suggest
> using Keyed-Blake2. Alternatively, if you need for userspace to be
> able to calculate the same hash, and don't want to use some kernel
> secret, then I'd still suggest using Blake2, which will be fast and
> secure.
>
> If you can wait until January, I'll work on a commit adding the
> primitive to the tree. I've already written it and I just need to get
> things cleaned up.
>
>> Blake2 is both less stable (didn't they slightly change it recently?)
>
> No, Blake2 is very stable. It's also extremely secure and has been
> extensively studied. Not to mention it's faster than SHA2. And if you
> want to use it as a PRF, it's obvious better suited and faster to use
> Blake2's keyed PRF mode than HMAC-SHA2.
>
> If you don't care about performance, and you don't want to use a PRF,
> then just use SHA2-256. If you're particularly concerned about certain
> types of attacks, you could go with SHA2-512 truncated to 256 bits,
> but somehow I doubt you need this.

I don't think this cares about performance.  (Well, it cares about
performance, but the verifier will likely dominate the cost by such a
large margin that the hash algo doesn't matter.)  And staying
FIPS-compliant-ish is worth a little bit, so I'd advocate for
something in the SHA2 family.

> If userspace hasn't landed, can we get away with changing this code
> after 4.10? Or should we just fix it before 4.10? Or should we revert
> it before 4.10? Development-policy-things like this I have zero clue
> about, so I defer to your guidance.

I think it should be fixed or reverted before 4.10.

--Andy


Re: [PATCH net] net, sched: fix soft lockup in tc_classify

2016-12-22 Thread Shahar Klein



On 12/21/2016 7:04 PM, Daniel Borkmann wrote:

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and 
avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein 
Signed-off-by: Daniel Borkmann 
Cc: Cong Wang 
---
 Shahar, you mentioned you wanted to run again later without the debug
 printk's. Once you do that and come to the same result again, please
 feel free to reply with a Tested-by or such. Thanks everyone!

 net/sched/cls_api.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3fbba79..1ecdf80 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n)
unsigned long cl;
unsigned long fh;
int err;
-   int tp_created = 0;
+   int tp_created;

if ((n->nlmsg_type != RTM_GETTFILTER) &&
!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;

 replay:
+   tp_created = 0;
+
err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
if (err < 0)
return err;



I've tested this specific patch (without Cong's patch and without the
many prints) many times and in many test permutations today, and it
didn't lock up.


Thanks again Daniel!

Shahar


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o  wrote:
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.

Ack.

>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?

Sorry, I meant to write this. Long day, little sleep. Yes, of course.
Batched entropy.

>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.

Alright, sure.

>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.

Okay, I'll submit it to you separately.

> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.

I've already written these. They're behind TEST_HASH. They currently
test every single line of code of all implementations of siphash. I
spent a long time on these. The test vectors themselves were taken
from the SipHash creators' reference publication. Check out
lib/test_siphash.c in my tree.

Jason

On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o  wrote:
> On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
>> Could you offer a bit of advice on how to manage dependencies between
>> patchsets during merge windows? I'm a bit new to the process.
>>
>> Specifically, we how have 4 parts:
>> 1. add siphash, and use it for some networking code. to: david miller's 
>> net-next
>
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.
>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?
>
>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.
>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.
>
> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.
>
>   - Ted



-- 
Jason A. Donenfeld
Deep Space Explorer
fr: +33 6 51 90 82 66
us: +1 513 476 1200
www.jasondonenfeld.com
www.zx2c4.com
zx2c4.com/keys/AB9942E6D4A4CFC3412620A749FC7012A5DE03AE.asc


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
> Could you offer a bit of advice on how to manage dependencies between
> patchsets during merge windows? I'm a bit new to the process.
> 
> Specifically, we now have 4 parts:
> 1. add siphash, and use it for some networking code. to: david miller's 
> net-next

I'd do this first, as one set.  Adding a new file to crypto is
unlikely to cause merge conflicts.

> 2. convert char/random to use siphash. to: ted ts'o's random-next

I'm confused, I thought you had agreed to the batched chacha20
approach?

> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
> inside of linux/cryptohash.h. to: ??'s ??-next

This is cleanup, so it doesn't matter that much when it happens.  md5
changes to crypto is also unlikely to cause conflicts, so we could do
this at the same time as (2), if Herbert (the crypto maintainer) agrees.

> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
> entry inside of linux/cryptohash.h. to: ted ts'o's ext-next

This is definitely separate.

One more thing.  Can you add some test cases to lib/siphash.h?
Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
some test inputs and known outputs?  I'm going to need to add a
version of siphash to e2fsprogs, and I want to make sure the userspace
version is implementing the same algorithm as the kernel siphash.

  - Ted


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Jason A. Donenfeld
Hi all,

I don't know what your design requirements are for this. It looks like
you're generating some kind of crypto digest of a program, and you
need to avoid collisions. If you'd like to go with a PRF (keyed hash
function) that uses some kernel secret key, then I'd strongly suggest
using Keyed-Blake2. Alternatively, if you need for userspace to be
able to calculate the same hash, and don't want to use some kernel
secret, then I'd still suggest using Blake2, which will be fast and
secure.

If you can wait until January, I'll work on a commit adding the
primitive to the tree. I've already written it and I just need to get
things cleaned up.

> Blake2 is both less stable (didn't they slightly change it recently?)

No, Blake2 is very stable. It's also extremely secure and has been
extensively studied. Not to mention it's faster than SHA2. And if you
want to use it as a PRF, it's obviously better suited and faster to use
Blake2's keyed PRF mode than HMAC-SHA2.

If you don't care about performance, and you don't want to use a PRF,
then just use SHA2-256. If you're particularly concerned about certain
types of attacks, you could go with SHA2-512 truncated to 256 bits,
but somehow I doubt you need this.

Anyway, the takeaway is: stop using SHA1. Its time has come.

> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very

I shuddered looking at this too. How did this possibly make it past
review? The attitude toward crypto here is generally *terrifying*.
Unless you're a professional cryptographer, the only correct attitude
to have is a careful and conservative one.

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

If userspace hasn't landed, can we get away with changing this code
after 4.10? Or should we just fix it before 4.10? Or should we revert
it before 4.10? Development-policy-things like this I have zero clue
about, so I defer to your guidance.

Jason


[PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat

2016-12-22 Thread Nogah Frankel
Extended stats are part of the RTM_GETSTATS method. This patch adds them
to ifstat.
While extended stats can come in many forms, we support only the
rtnl_link_stats64 struct for them (which is the 64 bits version of struct
rtnl_link_stats).
We support stats in the main nesting level, or one lower.
The extension can be called by its name or any shortened form of it. If
more than one option matches, the first one is picked.

To get the extended stats, the -x flag is used.

Signed-off-by: Nogah Frankel 
Reviewed-by: Jiri Pirko 
---
 misc/ifstat.c | 161 --
 1 file changed, 146 insertions(+), 15 deletions(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 5bcbcc8..ce666b3 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -34,6 +34,7 @@
 #include "libnetlink.h"
 #include "json_writer.h"
 #include "SNAPSHOT.h"
+#include "utils.h"
 
 int dump_zeros;
 int reset_history;
@@ -48,17 +49,21 @@ int pretty;
 double W;
 char **patterns;
 int npatterns;
+bool is_extended;
+int filter_type;
+int sub_type;
 
 char info_source[128];
 int source_mismatch;
 
 #define MAXS (sizeof(struct rtnl_link_stats)/sizeof(__u32))
+#define NO_SUB_TYPE 0x
 
 struct ifstat_ent {
struct ifstat_ent   *next;
char*name;
int ifindex;
-   unsigned long long  val[MAXS];
+   __u64   val[MAXS];
double  rate[MAXS];
__u32   ival[MAXS];
 };
@@ -106,6 +111,48 @@ static int match(const char *id)
return 0;
 }
 
+static int get_nlmsg_extended(const struct sockaddr_nl *who,
+ struct nlmsghdr *m, void *arg)
+{
+   struct if_stats_msg *ifsm = NLMSG_DATA(m);
+   struct rtattr *tb[IFLA_STATS_MAX+1];
+   int len = m->nlmsg_len;
+   struct ifstat_ent *n;
+
+   if (m->nlmsg_type != RTM_NEWSTATS)
+   return 0;
+
+   len -= NLMSG_LENGTH(sizeof(*ifsm));
+   if (len < 0)
+   return -1;
+
+   parse_rtattr(tb, IFLA_STATS_MAX, IFLA_STATS_RTA(ifsm), len);
+   if (tb[filter_type] == NULL)
+   return 0;
+
+   n = malloc(sizeof(*n));
+   if (!n)
+   abort();
+
+   n->ifindex = ifsm->ifindex;
+   n->name = strdup(ll_index_to_name(ifsm->ifindex));
+
+   if (sub_type == NO_SUB_TYPE) {
+   memcpy(&n->val, RTA_DATA(tb[filter_type]), sizeof(n->val));
+   } else {
+   struct rtattr *attr;
+
+   attr = parse_rtattr_one_nested(sub_type, tb[filter_type]);
+   if (attr == NULL)
+   return 0;
+   memcpy(&n->val, RTA_DATA(attr), sizeof(n->val));
+   }
+   memset(&n->rate, 0, sizeof(n->rate));
+   n->next = kern_db;
+   kern_db = n;
+   return 0;
+}
+
 static int get_nlmsg(const struct sockaddr_nl *who,
 struct nlmsghdr *m, void *arg)
 {
@@ -147,18 +194,34 @@ static void load_info(void)
 {
struct ifstat_ent *db, *n;
struct rtnl_handle rth;
+   __u32 filter_mask;
 
if (rtnl_open(&rth, 0) < 0)
exit(1);
 
-   if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) {
-   perror("Cannot send dump request");
-   exit(1);
-   }
+   if (is_extended) {
+   ll_init_map(&rth);
+   filter_mask = IFLA_STATS_FILTER_BIT(filter_type);
+   if (rtnl_wilddump_stats_req_filter(&rth, AF_UNSPEC, RTM_GETSTATS,
+  filter_mask) < 0) {
+   perror("Cannot send dump request");
+   exit(1);
+   }
 
-   if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) {
-   fprintf(stderr, "Dump terminated\n");
-   exit(1);
+   if (rtnl_dump_filter(&rth, get_nlmsg_extended, NULL) < 0) {
+   fprintf(stderr, "Dump terminated\n");
+   exit(1);
+   }
+   } else {
+   if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) {
+   perror("Cannot send dump request");
+   exit(1);
+   }
+
+   if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) {
+   fprintf(stderr, "Dump terminated\n");
+   exit(1);
+   }
}
 
rtnl_close(&rth);
@@ -553,10 +616,17 @@ static void update_db(int interval)
}
for (i = 0; i < MAXS; i++) {
double sample;
-   unsigned long incr = h1->ival[i] - n->ival[i];
+   __u64 incr;
+
+   if (is_extended) {
+   incr = h1->val[i] - n->val[i];
+   

[PATCH iproute2 v3 3/4] ifstat: Add 64 bits based stats to extended statistics

2016-12-22 Thread Nogah Frankel
The default stats for ifstat are 32 bits based.
The kernel supports 64 bits based stats. (They are returned in struct
rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in
which the "normal" stats are returned, but with fields of u64 instead of
u32). This patch adds them as an extended stats.

It is read with filter type IFLA_STATS_LINK_64 and no sub type.

It is under the name 64bits
(or any shortened form of it, such as "64").

For example:
ifstat -x 64bit

Signed-off-by: Nogah Frankel 
Reviewed-by: Jiri Pirko 
---
 misc/ifstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index ce666b3..8325ac7 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -729,7 +729,8 @@ static int verify_forging(int fd)
 static void xstat_usage(void)
 {
fprintf(stderr,
-"Usage: ifstat supported xstats:\n");
+"Usage: ifstat supported xstats:\n"
+"   64bits default stats, with 64 bits support\n");
 }
 
 struct extended_stats_options_t {
@@ -743,6 +744,7 @@ struct extended_stats_options_t {
  * Name length must be under 64 chars.
  */
 static const struct extended_stats_options_t extended_stats_options[] = {
+   {"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE},
 };
 
 static bool get_filter_type(char *name)
-- 
2.4.3



Re: [PATCH net] ipvlan: fix multicast processing

2016-12-22 Thread David Miller
From: Mahesh Bandewar 
Date: Wed, 21 Dec 2016 17:30:16 -0800

> From: Mahesh Bandewar 
> 
> In an IPvlan setup when master is set in loopback mode e.g.
> 
>   ethtool -K eth0 set loopback on
> 
>   where eth0 is master device for IPvlan setup.
> 
> The failure is caused by the faulty logic that determines if the
> packet is from TX-path vs. RX-path by just looking at the mac-
> addresses on the packet while processing multicast packets.
> 
> In the loopback-mode where this crash was happening, the packets
> that are sent out are reflected by the NIC and are processed on
> the RX path, but mac-address check tricks into thinking this
> packet is from TX path and falsely uses dev_forward_skb() to pass
> packets to the slave (virtual) devices.
> 
> This patch records the path while queueing packets and eliminates
> logic of looking at mac-addresses for the same decision.
 ...
> Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a 
> work-queue")
> Signed-off-by: Mahesh Bandewar 

This looks a lot better, applied, thanks.


Re: [PATCH net 1/1] tipc: don't send FIN message from connectionless socket

2016-12-22 Thread David Miller
From: Jon Maloy 
Date: Thu, 22 Dec 2016 07:22:29 -0500

> In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
> check for socket type is in the wrong place, causing a closing socket
> to always send out a FIN message even when the socket was never
> connected. This is normally harmless, since the destination node for
> such messages most often is zero, and the message will be dropped, but
> it is still a wrong and confusing behavior.
> 
> We fix this in this commit.
> 
> Reviewed-by: Parthasarathy Bhuvaragan 
> Signed-off-by: Jon Maloy 

Applied.


[PATCH iproute2 v3 1/4] ifstat: Includes reorder

2016-12-22 Thread Nogah Frankel
Reorder the includes order in misc/ifstat.c to match convention.

Signed-off-by: Nogah Frankel 
Reviewed-by: Jiri Pirko 
---
 misc/ifstat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 92d67b0..5bcbcc8 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -28,12 +28,12 @@
 #include 
 #include 
 
-#include 
-#include 
 #include 
 #include 
 
-#include 
+#include "libnetlink.h"
+#include "json_writer.h"
+#include "SNAPSHOT.h"
 
 int dump_zeros;
 int reset_history;
-- 
2.4.3



[PATCH iproute2 v3 0/4] update ifstat for new stats

2016-12-22 Thread Nogah Frankel
Previously, stats were obtained via RTM_GETLINK, which returns 32-bit based
statistics and supports only one type of stats.
Recently, a new method to get stats was added - RTM_GETSTATS. It supports
the ability to choose the stats type. The basic stats were changed from
32-bit based to 64-bit based.

This patchset adds to ifstat the ability to get extended stats by this
method. It adds two types of extended stats:
64bits - the same as the "normal" stats, but in a 64-bit based struct.
SW - for packets that hit the CPU.

---
v2->v3:
- patch 1/4:
 - add a new patch to reorder includes in misc/ifstat.c
- patch 2/4: (previously 1/3)
 - fix typos.
 - change error print to use fprintf.

v1->v2:
 - change from using RTM_GETSTATS always to using it only for extended
   stats.
 - Add 64bits extended stats type.

Nogah Frankel (4):
  ifstat: Includes reorder
  ifstat: Add extended statistics to ifstat
  ifstat: Add 64 bits based stats to extended statistics
  ifstat: Add "sw only" extended statistics to ifstat

 misc/ifstat.c | 171 +++---
 1 file changed, 153 insertions(+), 18 deletions(-)

-- 
2.4.3



[PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat

2016-12-22 Thread Nogah Frankel
Add support for extended statistics of the SW-only type, counting only the
packets that went via the CPU (useful for systems with forwarding
offload). It is read from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.

It is under the name 'software'
(or any shortened form of it, such as 'soft' or simply 's').

For example:
ifstat -x s

Signed-off-by: Nogah Frankel 
Reviewed-by: Jiri Pirko 
---
 misc/ifstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 8325ac7..79c0dba 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -730,7 +730,8 @@ static void xstat_usage(void)
 {
fprintf(stderr,
 "Usage: ifstat supported xstats:\n"
-"   64bits default stats, with 64 bits support\n");
+"   64bits default stats, with 64 bits support\n"
+"   software  SW stats. Counts only packets that went via the CPU\n");
 }
 
 struct extended_stats_options_t {
@@ -745,6 +746,7 @@ struct extended_stats_options_t {
  */
 static const struct extended_stats_options_t extended_stats_options[] = {
{"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE},
+   {"software",  IFLA_STATS_LINK_OFFLOAD_XSTATS, IFLA_OFFLOAD_XSTATS_CPU_HIT},
 };
 
 static bool get_filter_type(char *name)
-- 
2.4.3



Re: [PATCH net] net/mlx4_en: Fix user prio field in XDP forward

2016-12-22 Thread David Miller
From: Tariq Toukan 
Date: Thu, 22 Dec 2016 14:32:58 +0200

> The user prio field is wrong (and overflows) in the XDP forward
> flow.
> This is a result of a bad value for num_tx_rings_p_up, which should
> account all XDP TX rings, as they operate for the same user prio.
> 
> Signed-off-by: Tariq Toukan 
> Reported-by: Martin KaFai Lau 

Applied.


Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread David Miller
From: Joao Pinto 
Date: Thu, 22 Dec 2016 12:38:00 +

> When testing stmmac with my QoS reference design I found a problem in the
> CSR clock configuration that was preventing phy discovery, since
> every read operation returned 0x. This patch fixes the issue.
> 
> Signed-off-by: Joao Pinto 

Applied.


Re: [PATCH v2 net] ipvlan: fix various issues in ipvlan_process_multicast()

2016-12-22 Thread David Miller
From: Eric Dumazet 
Date: Wed, 21 Dec 2016 18:00:24 -0800

> From: Eric Dumazet 
> 
> 1) netif_rx() / dev_forward_skb() should not be called from process
> context.
> 
> 2) ipvlan_count_rx() should be called with preemption disabled.
> 
> 3) We should check if ipvlan->dev is up before feeding packets
> to netif_rx()
> 
> 4) We need to prevent device from disappearing if some packets
> are in the multicast backlog.
> 
> 5) One kfree_skb() should be a consume_skb() eventually
> 
> Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to
> a work-queue")
> Signed-off-by: Eric Dumazet 
> Cc: Mahesh Bandewar 

Applied.


ipv6: handle -EFAULT from skb_copy_bits

2016-12-22 Thread Dave Jones
By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:881f6c4a7c18  EFLAGS: 00010282
RAX: fff2 RBX: 881f6c681680 RCX: 0002
RDX: 881f6c4a7cf8 RSI: 0030 RDI: 881fed0f6a00
RBP: 881f6c4a7da8 R08:  R09: 0009
R10: 881fed0f6a00 R11: 0009 R12: 0030
R13: 881fed0f6a00 R14: 881fee39ba00 R15: 881fefa93a80

Call Trace:
 [] ? unmap_page_range+0x693/0x830
 [] inet_sendmsg+0x67/0xa0
 [] sock_sendmsg+0x38/0x50
 [] SYSC_sendto+0xef/0x170
 [] SyS_sendto+0xe/0x10
 [] do_syscall_64+0x50/0xa0
 [] entry_SYSCALL64_slow_path+0x25/0x25

Handle this by jumping to the failure path if skb_copy_bits returns -EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];

memset(buf, 0, LEN);

fd = socket(AF_INET6, SOCK_RAW, 7);

setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones 

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc260e70..ea89073c8247 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -591,7 +591,11 @@ static int rawv6_push_pending_frames(struct sock *sk, 
struct flowi6 *fl6,
}
 
offset += skb_transport_offset(skb);
-   BUG_ON(skb_copy_bits(skb, offset, &csum, 2));
+   err = skb_copy_bits(skb, offset, &csum, 2);
+   if (err < 0) {
+   ip6_flush_pending_frames(sk);
+   goto out;
+   }
 
/* in case cksum was not initialized */
if (unlikely(csum))


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
Hi Ted,

On Thu, Dec 22, 2016 at 4:58 PM, Theodore Ts'o  wrote:
> Can we do this as a separate series, please?  At this point, it's a
> completely separate change from a logical perspective, and we can take
> in the change through the random.git tree.
>
> Changes that touch files that are normally changed in several
> different git trees leads to the potential for merge conflicts during
> the linux-next integration and merge window processes.  Which is why
> it's generally best to try to isolate changes as much as possible.

Sure, I can separate things out.

Could you offer a bit of advice on how to manage dependencies between
patchsets during merge windows? I'm a bit new to the process.

Specifically, we now have 4 parts:
1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash. to: ted ts'o's random-next
3. move lib/md5.c to static function in crypto/md5.c, remove entry
inside of linux/cryptohash.h. to: ??'s ??-next
4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
entry inside of linux/cryptohash.h. to: ted ts'o's ext-next

Problem: 2 depends on 1, 3 depends on 1 & 2. But this can be
simplified into 3 parts:

1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash, move lib/md5.c to static
function in crypto/md5.c, remove entry inside of linux/cryptohash.h.
to: ted ts'o's random-next
3. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
entry inside of linux/cryptohash.h. to: ted ts'o's ext-next

Problem: 2 depends on 1. Is that okay with you?

Also, would you like me to merge (3) and (2) of the second list into
one series for you?

Jason


Re: pull-request: wireless-drivers 2016-12-22

2016-12-22 Thread David Miller
From: Kalle Valo 
Date: Thu, 22 Dec 2016 17:11:27 +0200

> before the holidays a really small pull request for 4.10. I just want to
> have the regressions in rtlwifi and ath9k fixed quickly so I send this
> earlier than I normally would.
> 
> Please let me know if there are any problems.

Pulled, thanks.


Re: [PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.

2016-12-22 Thread David Miller
From: Lorenzo Colitti 
Date: Fri, 23 Dec 2016 00:33:57 +0900

> Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
> protocols.") made ip_do_redirect call sock_net(sk) to determine
> the network namespace of the passed-in socket. This crashes if sk
> is NULL.
> 
> Fix this by getting the network namespace from the skb instead.
> 
> Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
> Signed-off-by: Lorenzo Colitti 

Applied, thanks Lorenzo.


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 6:07 PM, Andy Lutomirski  wrote:
> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
>  wrote:
>> As a separate message, to disentangle the threads, I'd like to
>> talk about get_random_long().
>>
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.
>
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?
>
>>
>> Here's my current straw man design for comment.  It's very similar to
>> the current MD5-based design, but feeds all the seed material in the
>> "correct" way, as opposed to XORing directly into the MD5 state.
>>
>> * Each CPU has a (Half)SipHash state vector,
>>   "unsigned long get_random_int_hash[4]".  Unlike the current
>>   MD5 code, we take care to initialize it to an asymmetric state.
>>
>> * There's a global 256-bit random_int_secret (which we could
>>   reseed periodically).
>>
>> To generate a random number:
>> * If get_random_int_hash is all-zero, seed it with a fresh half-sized
>>   SipHash key and the appropriate XOR constants.
>> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>>   (This is arbitrary seed material, copied from the current code.)
>> * Crank through that with (Half)SipHash-1-0.
>> * Crank through the random_int_secret with (Half)SipHash-1-0.
>> * Return v1 ^ v3.
>
> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:
>
> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Having slept on this, I like it less.  The problem is that a
backtracking attacker doesn't just learn H(random seed || entropy_0 ||
secret || ...) -- they learn the internal state of the hash function
that generates that value.  This probably breaks any attempt to apply
security properties of the hash function.  For example, the internal
state could easily contain a whole bunch of prior outputs in it
verbatim.

--Andy


BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
 wrote:
> On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> Hi Hannes,
>>
>> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>>  wrote:
>> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > You don't want to give people new IPv6 addresses with the same stable
>> > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > connectivity then and it is extra work?
>>
>> Ahh, too bad. So it goes.
>
> If no other users survive we can put it into the ipv6 module.
>
>> > The bpf hash stuff can be changed during this merge window, as it is
>> > not yet in a released kernel. Albeit I would probably have preferred
>> > something like sha256 here, which can be easily replicated by user
>> > space tools (minus the problem of patching out references to not
>> > hashable data, which must be zeroed).
>>
>> Oh, interesting, so time is of the essence then. Do you want to handle
>> changing the new eBPF code to something not-SHA1 before it's too late,
>> as part of a new patchset that can fast track itself to David? And
>> then I can preserve my large series for the next merge window.
>
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.
>

You mean:

commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
Author: Daniel Borkmann 
Date:   Sun Dec 4 23:19:41 2016 +0100

bpf: add prog_digest and expose it via fdinfo/netlink


Yes, please!  This actually matters for security -- imagine a
malicious program brute-forcing a collision so that it gets loaded
wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
(preferably the latter).  Speed basically doesn't matter here and
Blake2 is both less stable (didn't they slightly change it recently?)
and much less well studied.

My inclination would have been to seed them with something that isn't
exposed to userspace for the precise reason that it would prevent user
code from making assumptions about what's in the hash.  But if there's
a use case for why user code needs to be able to calculate the hash on
its own, then that's fine.  But perhaps the actual fdinfo string
should be "sha256:abcd1234..." to give some flexibility down the road.

Also:

+   result = (__force __be32 *)fp->digest;
+   for (i = 0; i < SHA_DIGEST_WORDS; i++)
+   result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives.
Is this and the code above it even correct?  It might be but on a very
brief glance it looks wrong to me.  If you're doing this to avoid
depending on crypto, then fix crypto so you can pull in the algorithm
without pulling in the whole crypto core.

At the very least, there should be a separate function that calculates
the hash of a buffer and that function should explicitly run itself
against test vectors of various lengths to make sure that it's
calculating what it claims to be calculating.  And it doesn't look
like the userspace code has landed, so, if this thing isn't
calculating SHA1 correctly, it's plausible that no one has noticed.

--Andy


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 07:03:29AM +0100, Jason A. Donenfeld wrote:
> I find this compelling. We'll have one csprng for both
> get_random_int/long and for /dev/urandom, and we'll be able to update
> that silly warning on the comment of get_random_int/long to read
> "gives output of either rdrand quality or of /dev/urandom quality",
> which makes it more useful for other things. It introduces less error
> prone code, and it lets the RNG analysis be spent on just one RNG, not
> two.
> 
> So, with your blessing, I'm going to move ahead with implementing a
> pretty version of this for v8.

Can we do this as a separate series, please?  At this point, it's a
completely separate change from a logical perspective, and we can take
in the change through the random.git tree.

Changes that touch files that are normally changed in several
different git trees leads to the potential for merge conflicts during
the linux-next integration and merge window processes.  Which is why
it's generally best to try to isolate changes as much as possible.

Cheers,

- Ted


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>  wrote:
> > following up on what appears to be a random subject: ;)
> >
> > IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> > in the htree. siphash seems to fit this use case pretty good.
> 
> I saw this too. I'll try to address it in v8 of this series.

This is a separate issue, and this series is getting a bit too
complex.  So I'd suggest pushing this off to a separate change.

Changing the htree hash algorithm is an on-disk format change, and so
we couldn't roll it out until e2fsprogs gets updated and rolled out
pretty broadley.  In fact George sent me patches to add siphash as a
hash algorithm for htree a while back (for both the kernel and
e2fsprogs), but I never got around to testing and applying them,
mainly because while it's technically faster, I had other higher
priority issues to work on --- and see previous comments regarding
pixel peeping.  Improving the hash algorithm by tens or even hundreds
of nanoseconds isn't really going to matter since we only do a htree
lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
times will dominate.  And from the power perspective, saving
microwatts of CPU power isn't going to matter if you're going to be
spinning up the storage device.

- Ted


Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode

2016-12-22 Thread Alexander Duyck
On Wed, Dec 21, 2016 at 5:39 PM, maowenan  wrote:
>
>
>> -Original Message-
>> From: Stephen Hemminger [mailto:step...@networkplumber.org]
>> Sent: Thursday, December 22, 2016 9:28 AM
>> To: maowenan
>> Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com
>> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering 
>> mode
>>
>> On Thu, 8 Dec 2016 14:51:38 +0800
>> Mao Wenan  wrote:
>>
>> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
>> > ordering mode, which can be set by ethtool.
>> > Relax ordering is one mode of the 82599 NIC; enabling this mode can
>> > enhance the performance for some CPU architectures.
>>
>> Then it should be done by CPU architecture specific quirks (preferably in PCI
>> layer) so that all users get the option without having to do manual 
>> intervention.
>>
>> > example:
>> > ethtool -s enp1s0f0 relaxorder off
>> > ethtool -s enp1s0f0 relaxorder on
>>
>> Doing it via ethtool is a developer API (for testing) not something that
>> makes sense in production.
>
>
> This feature is not mandatory for all users; actually, the default relax
> ordering configuration of the 82599 is 'disabled',
> so this patch gives one way for relax ordering to be enabled in some
> performance conditions.

That isn't quite true.  The default for Sparc systems is to have it enabled.

Really this is something that is platform specific.  I agree with
Stephen that it would work better if this was handled as a series of
platform specific quirks handled at something like the PCI layer
rather than be a switch the user can toggle on and off.

With that being said there are changes being made that should help to
improve the situation.  Specifically I am looking at adding support
for the DMA_ATTR_WEAK_ORDERING which may also allow us to identify
cases where you might be able to specify the DMA behavior via the DMA
mapping instead of having to make the final decision in the device
itself.

- Alex


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:51 PM, Hannes Frederic Sowa
 wrote:
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.

Right. SHA-256, SHA-512/256, Blake2s, or Blake2b would probably be
good candidates for this.


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>  wrote:
> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > You don't want to give people new IPv6 addresses with the same stable
> > secret (across reboots) after a kernel upgrade. Maybe they lose
> > connectivity then and it is extra work?
> 
> Ahh, too bad. So it goes.

If no other users survive we can put it into the ipv6 module.

> > The bpf hash stuff can be changed during this merge window, as it is
> > not yet in a released kernel. Albeit I would probably have preferred
> > something like sha256 here, which can be easily replicated by user
> > space tools (minus the problem of patching out references to not
> > hashable data, which must be zeroed).
> 
> Oh, interesting, so time is of the essence then. Do you want to handle
> changing the new eBPF code to something not-SHA1 before it's too late,
> as part of a new patchset that can fast track itself to David? And
> then I can preserve my large series for the next merge window.

This algorithm should be a non-seeded algorithm, because the hashes
should be stable and verifiable by user space tooling. Thus this would
need a hashing algorithm that is hardened against pre-image
attacks/collision resistance, which siphash is not. I would prefer some
higher order SHA algorithm for that actually.

Bye,
Hannes
 


Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Phil Reid

G'day Joao,

On 22/12/2016 20:38, Joao Pinto wrote:

When testing stmmac with my QoS reference design I found a problem in the
CSR clock configuration that was preventing phy discovery, since every read
operation returned 0x. This patch fixes the issue.

Signed-off-by: Joao Pinto 
---
changes v1->v2 (David Miller)
- DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
to make sense

 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b21d03f..94223c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem 
*ioaddr, int mcbins,
mac->mii.reg_shift = 6;
mac->mii.reg_mask = 0x07C0;
mac->mii.clk_csr_shift = 2;
-   mac->mii.clk_csr_mask = 0xF;
+   mac->mii.clk_csr_mask = GENMASK(4, 2);


Should this not be GENMASK(5, 2)?



/* Get and dump the chip ID */
*synopsys_id = stmmac_get_synopsys_id(hwid);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index a1d582f..8a40e69 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem 
*ioaddr, int *synopsys_id)
mac->mii.reg_shift = 6;
mac->mii.reg_mask = 0x07C0;
mac->mii.clk_csr_shift = 2;
-   mac->mii.clk_csr_mask = 0xF;
+   mac->mii.clk_csr_mask = GENMASK(4, 2);

same as above?



/* Synopsys Id is not available on old chips */
*synopsys_id = 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 23322fd..fda01f7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, 
int phyreg)
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
-   value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
-   << priv->hw->mii.clk_csr_shift;
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
if (priv->plat->has_gmac4)
value |= MII_GMAC4_READ;

@@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;

-   value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
-   << priv->hw->mii.clk_csr_shift);
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
if (priv->plat->has_gmac4)
value |= MII_GMAC4_WRITE;





--
Regards
Phil Reid



Re: [PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto

Hello Phil,

At 3:42 PM on 12/22/2016, Phil Reid wrote:
> G'day Joao,
> 
> On 22/12/2016 20:38, Joao Pinto wrote:
>> When testing stmmac with my QoS reference design I found a problem in the
>> CSR clock configuration that was preventing phy discovery, since every read
>> operation returned 0x. This patch fixes the issue.
>>
>> Signed-off-by: Joao Pinto 
>> ---
>> changes v1->v2 (David Miller)
>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>> to make sense
>>
>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> index b21d03f..94223c8 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>> *ioaddr, int mcbins,
>>  mac->mii.reg_shift = 6;
>>  mac->mii.reg_mask = 0x07C0;
>>  mac->mii.clk_csr_shift = 2;
>> -mac->mii.clk_csr_mask = 0xF;
>> +mac->mii.clk_csr_mask = GENMASK(4, 2);
> 
> Should this not be GENMASK(5, 2)?

According to Universal MAC databook (valid for MAC100 and MAC1000), we have:

Bits: 4:2
000 60-100 MHz clk_csr_i/42
001 100-150 MHz clk_csr_i/62
010 20-35 MHz clk_csr_i/16
011 35-60 MHz clk_csr_i/26
100 150-250 MHz clk_csr_i/102
101 250-300 MHz clk_csr_i/124
110, 111 Reserved

So only bits 2, 3 and 4 should be masked.

Thanks.

> 
>>
>>  /* Get and dump the chip ID */
>>  *synopsys_id = stmmac_get_synopsys_id(hwid);
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> index a1d582f..8a40e69 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem
>> *ioaddr, int *synopsys_id)
>>  mac->mii.reg_shift = 6;
>>  mac->mii.reg_mask = 0x07C0;
>>  mac->mii.clk_csr_shift = 2;
>> -mac->mii.clk_csr_mask = 0xF;
>> +mac->mii.clk_csr_mask = GENMASK(4, 2);
> same as above?
> 
>>
>>  /* Synopsys Id is not available on old chips */
>>  *synopsys_id = 0;
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> index 23322fd..fda01f7 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int
>> phyaddr, int phyreg)
>>  value |= (phyaddr << priv->hw->mii.addr_shift)
>>  & priv->hw->mii.addr_mask;
>>  value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
>> -value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
>> -<< priv->hw->mii.clk_csr_shift;
>> +value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>> +& priv->hw->mii.clk_csr_mask;
>>  if (priv->plat->has_gmac4)
>>  value |= MII_GMAC4_READ;
>>
>> @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int
>> phyaddr, int phyreg,
>>  & priv->hw->mii.addr_mask;
>>  value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
>>
>> -value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
>> -<< priv->hw->mii.clk_csr_shift);
>> +value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>> +& priv->hw->mii.clk_csr_mask;
>>  if (priv->plat->has_gmac4)
>>  value |= MII_GMAC4_WRITE;
>>
>>
> 
> 



Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
Hi Hannes,

On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
 wrote:
> IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> You don't want to give people new IPv6 addresses with the same stable
> secret (across reboots) after a kernel upgrade. Maybe they lose
> connectivity then and it is extra work?

Ahh, too bad. So it goes.

> The bpf hash stuff can be changed during this merge window, as it is
> not yet in a released kernel. Albeit I would probably have preferred
> something like sha256 here, which can be easily replicated by user
> space tools (minus the problem of patching out references to not
> hashable data, which must be zeroed).

Oh, interesting, so time is of the essence then. Do you want to handle
changing the new eBPF code to something not-SHA1 before it's too late,
as part of a new patchset that can fast track itself to David? And
then I can preserve my large series for the next merge window.

Jason


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:29 +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld  wrote:
> > As a first step, I'm considering adding a patch to move halfmd4.c
> > inside the ext4 domain, or at the very least, simply remove it from
> > linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> > usages to consider.
> 
> Specifically something like this:
> 
> https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=978213351f9633bd1e3d1fdc3f19d28e36eeac90
> 
> That only leaves two more uses of "cryptohash" to consider, but they
> require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
> That might be a straight-forward conversion to SipHash, but perhaps
> not; I need to look closely and think about it. The next is
> sha_transform in kernel/bpf/core.c. I really have no idea what's going
> on with the eBPF stuff, so that will take a bit longer to study. Maybe
> sha1 is fine in the end there? I'm not sure yet.

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
You don't want to give people new IPv6 addresses with the same stable
secret (across reboots) after a kernel upgrade. Maybe they lose
connectivity then and it is extra work?

The bpf hash stuff can be changed during this merge window, as it is
not yet in a released kernel. Albeit I would probably have preferred
something like sha256 here, which can be easily replicated by user
space tools (minus the problem of patching out references to not
hashable data, which must be zeroed).

Bye,
Hannes



[PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.

2016-12-22 Thread Lorenzo Colitti
Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made ip_do_redirect call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.

Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
Signed-off-by: Lorenzo Colitti 
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fa5c037227..9eabf49013 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -798,6 +798,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct 
sock *sk, struct sk_buf
struct rtable *rt;
struct flowi4 fl4;
const struct iphdr *iph = (const struct iphdr *) skb->data;
+   struct net *net = dev_net(skb->dev);
int oif = skb->dev->ifindex;
u8 tos = RT_TOS(iph->tos);
u8 prot = iph->protocol;
@@ -805,7 +806,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct 
sock *sk, struct sk_buf
 
rt = (struct rtable *) dst;
 
-   __build_flow_key(sock_net(sk), &fl4, sk, iph, oif, tos, prot, mark, 0);
+   __build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0);
__ip_do_redirect(rt, skb, &fl4, true);
 }
 
-- 
2.8.0.rc3.226.g39d4020



Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld  wrote:
> As a first step, I'm considering adding a patch to move halfmd4.c
> inside the ext4 domain, or at the very least, simply remove it from
> linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> usages to consider.

Specifically something like this:

https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=978213351f9633bd1e3d1fdc3f19d28e36eeac90

That only leaves two more uses of "cryptohash" to consider, but they
require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
That might be a straight-forward conversion to SipHash, but perhaps
not; I need to look closely and think about it. The next is
sha_transform in kernel/bpf/core.c. I really have no idea what's going
on with the eBPF stuff, so that will take a bit longer to study. Maybe
sha1 is fine in the end there? I'm not sure yet.


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:05 PM, Hannes Frederic Sowa
 wrote:
> This change would need a new version of the ext4 super block, because
> you should not change existing file systems.

Right.

As a first step, I'm considering adding a patch to move halfmd4.c
inside the ext4 domain, or at the very least, simply remove it from
linux/cryptohash.h. That'll then leave the handful of bizarre sha1
usages to consider.


pull-request: wireless-drivers 2016-12-22

2016-12-22 Thread Kalle Valo
Hi Dave,

before the holidays a really small pull request for 4.10. I just want to
have the regressions in rtlwifi and ath9k fixed quickly so I send this
earlier than I normally would.

Please let me know if there are any problems.

Kalle

The following changes since commit ad688cdbb076833ba17fc65591cd0fe01900a5cf:

  stmmac: fix memory barriers (2016-12-19 11:05:02 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
tags/wireless-drivers-for-davem-2016-12-22

for you to fetch changes up to 22b68b93ae2506bd56ee3bf232a51bc8ab955b56:

  rtlwifi: Fix kernel oops introduced with commit e49656147359 (2016-12-21 
16:34:16 +0200)


wireless-drivers fixes for 4.10

All small fixes this time, especially important are the regression
fixes for rtlwifi and ath9k.


Arend Van Spriel (2):
  brcmfmac: fix memory leak in brcmf_cfg80211_attach()
  brcmfmac: fix uninitialized field in scheduled scan ssid configuration

Ben Greear (1):
  ath10k: free host-mem with DMA_BIRECTIONAL flag

Larry Finger (1):
  rtlwifi: Fix kernel oops introduced with commit e49656147359

Tobias Klausmann (1):
  ath9k: do not return early to fix rcu unlocking

 drivers/net/wireless/ath/ath10k/wmi.c  |2 +-
 drivers/net/wireless/ath/ath9k/xmit.c  |2 +-
 .../broadcom/brcm80211/brcmfmac/cfg80211.c |7 +--
 .../net/wireless/broadcom/brcm80211/brcmfmac/pno.c |1 +
 drivers/net/wireless/realtek/rtlwifi/core.c|3 ++-
 5 files changed, 10 insertions(+), 5 deletions(-)


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 14:10, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>  wrote:
>> following up on what appears to be a random subject: ;)
>>
>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>> in the htree. siphash seems to fit this use case pretty good.
> 
> I saw this too. I'll try to address it in v8 of this series.

This change would need a new version of the ext4 super block, because
you should not change existing file systems.




Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
 wrote:
> following up on what appears to be a random subject: ;)
>
> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> in the htree. siphash seems to fit this use case pretty good.

I saw this too. I'll try to address it in v8 of this series.


RE: [PATCH net 1/1] tipc: revert use of copy_from_iter_full()

2016-12-22 Thread Jon Maloy


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Al Viro
> Sent: Wednesday, 21 December, 2016 22:08
> To: Jon Maloy 
> Cc: da...@davemloft.net; netdev@vger.kernel.org; Parthasarathy Bhuvaragan
> ; Ying Xue
> ; ma...@donjonn.com; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: [PATCH net 1/1] tipc: revert use of copy_from_iter_full()
> 
> On Thu, Dec 22, 2016 at 02:47:09AM +, Jon Maloy wrote:
> 
> > > OK, I see what's going on there - unlike iterate_and_advance(), which
> > > explicitly skips any work in case of empty iterator, iterate_all_kind()
> > > does not.  Could you check if the following fixes your problem?
> >
> > Confirmed. The crash disappeared and everything works fine.
> 
> OK...  This is the somewhat cleaned version of the same that will go to Linus,
> assuming it survives the local beating.  Hopefully, cleanups hadn't broken it
> in your case...

I tested it, and it still works for me.

///jon

> 
> [iov_iter] fix iterate_all_kinds() on empty iterators
> 
> Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
> and followups, except that in this case we really want to do nothing when
> asked for zero-length operation - unlike zero-length iterate_and_advance(),
> zero-length iterate_all_kinds() has no side effects, and callers are simpler
> that way.
> 
> That got exposed when copy_from_iter_full() had been used by tipc, which
> builds an msghdr with zero payload and (now) feeds it to a primitive
> based on iterate_all_kinds() instead of iterate_and_advance().
> 
> Reported-by: Jon Maloy 
> Signed-off-by: Al Viro 
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 228892dabba6..25f572303801 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -73,19 +73,21 @@
>  }
> 
>  #define iterate_all_kinds(i, n, v, I, B, K) {\
> - size_t skip = i->iov_offset;\
> - if (unlikely(i->type & ITER_BVEC)) {\
> - struct bio_vec v;   \
> - struct bvec_iter __bi;  \
> - iterate_bvec(i, n, v, __bi, skip, (B))  \
> - } else if (unlikely(i->type & ITER_KVEC)) { \
> - const struct kvec *kvec;\
> - struct kvec v;  \
> - iterate_kvec(i, n, v, kvec, skip, (K))  \
> - } else {\
> - const struct iovec *iov;\
> - struct iovec v; \
> - iterate_iovec(i, n, v, iov, skip, (I))  \
> + if (likely(n)) {\
> + size_t skip = i->iov_offset;\
> + if (unlikely(i->type & ITER_BVEC)) {\
> + struct bio_vec v;   \
> + struct bvec_iter __bi;  \
> + iterate_bvec(i, n, v, __bi, skip, (B))  \
> + } else if (unlikely(i->type & ITER_KVEC)) { \
> + const struct kvec *kvec;\
> + struct kvec v;  \
> + iterate_kvec(i, n, v, kvec, skip, (K))  \
> + } else {\
> + const struct iovec *iov;\
> + struct iovec v; \
> + iterate_iovec(i, n, v, iov, skip, (I))  \
> + }   \
>   }   \
>  }
> 
> @@ -576,7 +578,7 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct
> iov_iter *i)
>   WARN_ON(1);
>   return false;
>   }
> - if (unlikely(i->count < bytes)) \
> + if (unlikely(i->count < bytes))
>   return false;
> 
>   iterate_all_kinds(i, bytes, v, ({
> @@ -620,7 +622,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t
> bytes, struct iov_iter *i)
>   WARN_ON(1);
>   return false;
>   }
> - if (unlikely(i->count < bytes)) \
> + if (unlikely(i->count < bytes))
>   return false;
>   iterate_all_kinds(i, bytes, v, ({
>   if (__copy_from_user_nocache((to += v.iov_len) - v.iov_len,
> @@ -837,11 +839,8 @@ unsigned long iov_iter_alignment(const struct iov_iter
> *i)
>   unsigned long res = 0;
>   size_t size = i->count;
> 
> - if (!size)
> - return 0;
> -
>   if (unlikely(i->type & ITER_PIPE)) {
> - if (i->iov_offset && allocated(&i->pipe->bufs[i->idx]))
> + if (size && i->iov_offset && allocated(&i->pipe->bufs[i->idx]))
> 

Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
Hi Ted,

On Thu, 2016-12-22 at 00:41 -0500, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> > 
> > Funny -- while you guys were sending this back & forth, I was writing
> > my reply to Andy which essentially arrives at the same conclusion.
> > Given that we're all arriving to the same thing, and that Ted shot in
> > this direction long before we all did, I'm leaning toward abandoning
> > SipHash for the de-MD5-ification of get_random_int/long, and working
> > on polishing Ted's idea into something shiny for this patchset.
> 
> here are my numbers comparing siphash (using the first three patches
> of the v7 siphash patches) with my batched chacha20 implementation.
> The results are taken by running get_random_* 1 times, and then
> dividing the numbers by 1 to get the average number of cycles for
> the call.  I compiled 32-bit and 64-bit kernels, and ran the results
> using kvm:
> 
>                  siphash                       batched chacha20
>   get_random_int  get_random_long     get_random_int  get_random_long
> 
> 32-bit      270          278                114             146
> 64-bit       75           75                106             186
> 
> > I did have two objections to this. The first was that my SipHash
> > construction is faster.
> 
> Well, it's faster on everything except 32-bit x86.  :-P
> 
> > The second, and the more
> > important one, was that batching entropy up like this means that 32
> > calls will be really fast, and then the 33rd will be slow, since it
> > has to do a whole ChaCha round, because get_random_bytes must be
> > called to refill the batch.
> 
> ... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
> 32-bit x86.  Which on a 2.3 GHz processor, is just under a
> microsecond.  As far as being inconsistent on process startup, I very
> much doubt a microsecond is really going to be visible to the user.  :-)
> 
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference
> 
> I think the main argument for using the batched getrandom approach is
> that it is, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus the complexity of having to add siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

following up on what appears to be a random subject: ;)

IIRC, ext4 code by default still uses half_md4 for hashing of filenames
in the htree. siphash seems to fit this use case pretty good.

xfs could also need an update, as they don't seed the directory hash
tables at all (but not sure if they are vulnerable). I should improve
[1] a bit.

[1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blo
b;f=src/dirhash_collide.c;h=55cec872d5061ac2ca0f56d1f11e6bf349d5bb97;hb
=HEAD

Bye,
Hannes




[PATCH v2] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto
When testing stmmac with my QoS reference design I found a problem in the
CSR clock configuration that was preventing phy discovery, since every read
operation returned 0x. This patch fixes the issue.

Signed-off-by: Joao Pinto 
---
changes v1->v2 (David Miller)
- DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
to make sense

 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b21d03f..94223c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem 
*ioaddr, int mcbins,
mac->mii.reg_shift = 6;
mac->mii.reg_mask = 0x07C0;
mac->mii.clk_csr_shift = 2;
-   mac->mii.clk_csr_mask = 0xF;
+   mac->mii.clk_csr_mask = GENMASK(4, 2);
 
/* Get and dump the chip ID */
*synopsys_id = stmmac_get_synopsys_id(hwid);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index a1d582f..8a40e69 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem 
*ioaddr, int *synopsys_id)
mac->mii.reg_shift = 6;
mac->mii.reg_mask = 0x07C0;
mac->mii.clk_csr_shift = 2;
-   mac->mii.clk_csr_mask = 0xF;
+   mac->mii.clk_csr_mask = GENMASK(4, 2);
 
/* Synopsys Id is not available on old chips */
*synopsys_id = 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 23322fd..fda01f7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, 
int phyreg)
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
-   value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
-   << priv->hw->mii.clk_csr_shift;
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
if (priv->plat->has_gmac4)
value |= MII_GMAC4_READ;
 
@@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
 
-   value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
-   << priv->hw->mii.clk_csr_shift);
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
if (priv->plat->has_gmac4)
value |= MII_GMAC4_WRITE;
 
-- 
2.9.3



[PATCH net] net/mlx4_en: Fix user prio field in XDP forward

2016-12-22 Thread Tariq Toukan
The user prio field is wrong (and overflows) in the XDP forward
flow.
This is a result of a bad value for num_tx_rings_p_up, which should
account for all XDP TX rings, as they operate for the same user prio.

Signed-off-by: Tariq Toukan 
Reported-by: Martin KaFai Lau 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index bcd955339058..edbe200ac2fa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev)
 
/* Configure tx cq's and rings */
for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
-   u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
+   u8 num_tx_rings_p_up = t == TX ?
+   priv->num_tx_rings_p_up : priv->tx_ring_num[t];
 
for (i = 0; i < priv->tx_ring_num[t]; i++) {
/* Configure cq */
-- 
1.8.3.1



Re: [PATCH] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto

Hi David,

At 10:15 AM on 12/22/2016, Joao Pinto wrote:
> At 6:21 PM on 12/21/2016, David Miller wrote:
>> From: Joao Pinto 
>> Date: Tue, 20 Dec 2016 11:21:47 +
>>
>>> When testing stmmac with my QoS reference design I found a problem in the
>>> CSR clock configuration that was preventing phy discovery, since every read
>>> operation returned 0x. This patch fixes the issue.
>>>
>>> Signed-off-by: Joao Pinto 
>>
>> This isn't enough.
>>
>> It looks like various parts of this driver set the mask field
>> differently.
>>
>> dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits.
>>
>> But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value
>> which is shifted up already.
>>
>> So your patch will break chips driven by dwmac4_core.c.
> 
> I am using a GMAC4 reference design to test the patches. The clock
> configuration as is does not work, resulting in phy discovery failure. By
> applying this patch I am able to set the clock value properly.
> 
> I am going to check in the Databook of GMAC4 and older versions in order to
> justify better.

So from the 4.20 Databook:

Bits
11:8 CR parameter
0000: CSR clock = 60-100 MHz; MDC clock = CSR clock/42
0001: CSR clock = 100-150 MHz; MDC clock = CSR clock/62
0010: CSR clock = 20-35 MHz; MDC clock = CSR clock/16
0011: CSR clock = 35-60 MHz; MDC clock = CSR clock/26
0100: CSR clock = 150-250 MHz; MDC clock = CSR clock/102
0101: CSR clock = 250-300 MHz; MDC clock = CSR clock/124
0110, 0111: Reserved

For example, if you want to configure the CSR clock for 250-300 MHz (my case),
you set the CR parameter to 0x5. The current mechanism simply mangles the
value. With this fix, the 0x5 is shifted into 11:8 as the databook specifies,
and a GENMASK(11, 8) is applied on top, clearing every bit outside 11:8 as a
safeguard.

For older cores like 4.10 and 4.00 the logic is the same. The information was
confirmed by R&D.

Thanks.

> 
>>
>> In order for your change to be correct you must consolidate all of
>> these various pieces to use the same convention.
>>
> 
> Thanks.
> 



Re: [PATCH] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto
At 12:23 PM on 12/22/2016, Joao Pinto wrote:
> 
> Hi David,
> 
> At 10:15 AM on 12/22/2016, Joao Pinto wrote:
>> At 6:21 PM on 12/21/2016, David Miller wrote:
>>> From: Joao Pinto 
>>> Date: Tue, 20 Dec 2016 11:21:47 +
>>>
 When testing stmmac with my QoS reference design I found a problem in the
 CSR clock configuration that was preventing phy discovery, since every read
 operation returned 0x. This patch fixes the issue.

 Signed-off-by: Joao Pinto 
>>>
>>> This isn't enough.
>>>
>>> It looks like various parts of this driver set the mask field
>>> differently.
>>>
>>> dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits.
>>>
>>> But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value
>>> which is shifted up already.
>>>
>>> So your patch will break chips driven by dwmac4_core.c.
>>
>> I am using a GMAC4 reference design to test the patches. The clock
>> configuration as is does not work, resulting in phy discovery failure. By
>> applying this patch I am able to set the clock value properly.
>>
>> I am going to check in the Databook of GMAC4 and older versions in order to
>> justify better.
> 
> So from the 4.20 Databook:
> 
> Bits
> 11:8 CR parameter
> 0000: CSR clock = 60-100 MHz; MDC clock = CSR clock/42
> 0001: CSR clock = 100-150 MHz; MDC clock = CSR clock/62
> 0010: CSR clock = 20-35 MHz; MDC clock = CSR clock/16
> 0011: CSR clock = 35-60 MHz; MDC clock = CSR clock/26
> 0100: CSR clock = 150-250 MHz; MDC clock = CSR clock/102
> 0101: CSR clock = 250-300 MHz; MDC clock = CSR clock/124
> 0110, 0111: Reserved
> 
> For example, if you want to configure the CSR clock for 250-300 MHz (my
> case), you set the CR parameter to 0x5. The current mechanism simply
> mangles the value. With this fix, the 0x5 is shifted into 11:8 as the
> databook specifies, and a GENMASK(11, 8) is applied on top, clearing every
> bit outside 11:8 as a safeguard.
> 
> For older cores like 4.10 and 4.00 the logic is the same. The information was
> confirmed by R&D.
> 
> Thanks.

By checking DWMAC100 and DWMAC1000 I understand your concern now. I am going to
change their masks as well in order to get this right. V2 coming soon.

Thanks.

> 
>>
>>>
>>> In order for your change to be correct you must consolidate all of
>>> these various pieces to use the same convention.
>>>
>>
>> Thanks.
>>
> 



[PATCH net 0/2] net/sched fixes for cls_flower and act_tunnel_key

2016-12-22 Thread Or Gerlitz
Hi Dave,

This small series contains a fix to the matching flags support 
in flower and to the tunnel key action MD prep for IPv6.

On an unrelated note, wishing everyone around here a happy new year; 
celebrate and take a rest so we can do lots of good patch work next.

Or.

Or Gerlitz (2):
  net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
  net/sched: cls_flower: Mandate mask when matching on flags

 net/sched/act_tunnel_key.c |  4 ++--
 net/sched/cls_flower.c | 23 ++++++++++++-----------
 2 files changed, 14 insertions(+), 13 deletions(-)

-- 
2.3.7



[PATCH net 2/2] net/sched: cls_flower: Mandate mask when matching on flags

2016-12-22 Thread Or Gerlitz
When matching on flags, we should require the user to provide the
mask and avoid assuming an all-ones mask. Not doing so causes a match
on flags provided without a mask to hit only when every flag is unset,
which may not be what the user wanted.

Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags')
Signed-off-by: Or Gerlitz 
Reported-by: Paul Blakey 
Acked-by: Jiri Pirko 
---
 net/sched/cls_flower.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 35ac28d..333f8e2 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -442,32 +442,32 @@ static void fl_set_key_flag(u32 flower_key, u32 flower_mask,
}
 }
 
-static void fl_set_key_flags(struct nlattr **tb,
-u32 *flags_key, u32 *flags_mask)
+static int fl_set_key_flags(struct nlattr **tb,
+   u32 *flags_key, u32 *flags_mask)
 {
u32 key, mask;
 
-   if (!tb[TCA_FLOWER_KEY_FLAGS])
-   return;
+   /* mask is mandatory for flags */
+   if (!tb[TCA_FLOWER_KEY_FLAGS_MASK])
+   return -EINVAL;
 
key = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS]));
-
-   if (!tb[TCA_FLOWER_KEY_FLAGS_MASK])
-   mask = ~0;
-   else
-   mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK]));
+   mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK]));
 
*flags_key  = 0;
*flags_mask = 0;
 
fl_set_key_flag(key, mask, flags_key, flags_mask,
TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
+
+   return 0;
 }
 
 static int fl_set_key(struct net *net, struct nlattr **tb,
  struct fl_flow_key *key, struct fl_flow_key *mask)
 {
__be16 ethertype;
+   int ret = 0;
 #ifdef CONFIG_NET_CLS_IND
if (tb[TCA_FLOWER_INDEV]) {
int err = tcf_change_indev(net, tb[TCA_FLOWER_INDEV]);
@@ -614,9 +614,10 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   &mask->enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
   sizeof(key->enc_tp.dst));
 
-   fl_set_key_flags(tb, &key->control.flags, &mask->control.flags);
+   if (tb[TCA_FLOWER_KEY_FLAGS])
+   ret = fl_set_key_flags(tb, &key->control.flags,
+  &mask->control.flags);
 
-   return 0;
+   return ret;
 }
 
 static bool fl_mask_eq(struct fl_flow_mask *mask1,
-- 
2.3.7



[PATCH net 1/2] net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6

2016-12-22 Thread Or Gerlitz
The UDP dst port was passed to the helper function that sets the
IPv6 tunnel metadata in the wrong parameter order; fix that.

Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz 
Reviewed-by: Hadar Hen Zion 
---
 net/sched/act_tunnel_key.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 7af7125..e3a58e0 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -134,8 +134,8 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
saddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
daddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
 
-   metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, 0,
- dst_port, TUNNEL_KEY,
+   metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, dst_port,
+ 0, TUNNEL_KEY,
  key_id, 0);
}
 
-- 
2.3.7



[PATCH net 1/1] tipc: don't send FIN message from connectionless socket

2016-12-22 Thread Jon Maloy
In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages is most often zero and the message will be dropped, but
it is still wrong and confusing behavior.

We fix this in this commit.

Reviewed-by: Parthasarathy Bhuvaragan 
Signed-off-by: Jon Maloy 
---
 net/tipc/socket.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 333c5da..800caaa 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -441,15 +441,19 @@ static void __tipc_shutdown(struct socket *sock, int error)
while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
if (TIPC_SKB_CB(skb)->bytes_read) {
kfree_skb(skb);
-   } else {
-   if (!tipc_sk_type_connectionless(sk) &&
-   sk->sk_state != TIPC_DISCONNECTING) {
-   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-   tipc_node_remove_conn(net, dnode, tsk->portid);
-   }
-   tipc_sk_respond(sk, skb, error);
+   continue;
+   }
+   if (!tipc_sk_type_connectionless(sk) &&
+   sk->sk_state != TIPC_DISCONNECTING) {
+   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+   tipc_node_remove_conn(net, dnode, tsk->portid);
}
+   tipc_sk_respond(sk, skb, error);
}
+
+   if (tipc_sk_type_connectionless(sk))
+   return;
+
if (sk->sk_state != TIPC_DISCONNECTING) {
skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
  TIPC_CONN_MSG, SHORT_H_SIZE, 0, dnode,
@@ -457,10 +461,8 @@ static void __tipc_shutdown(struct socket *sock, int error)
  tsk->portid, error);
if (skb)
tipc_node_xmit_skb(net, skb, dnode, tsk->portid);
-   if (!tipc_sk_type_connectionless(sk)) {
-   tipc_node_remove_conn(net, dnode, tsk->portid);
-   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-   }
+   tipc_node_remove_conn(net, dnode, tsk->portid);
+   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
}
 }
 
-- 
2.7.4



[PATCH] Fixed status entry in m_can documentation

2016-12-22 Thread Vyacheslav V. Yurkov
Use a valid value ('okay') instead of 'enabled' in the status field

Signed-off-by: Vyacheslav V. Yurkov 
---
 Documentation/devicetree/bindings/net/can/m_can.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
index 9e33177..5facaf5 100644
--- a/Documentation/devicetree/bindings/net/can/m_can.txt
+++ b/Documentation/devicetree/bindings/net/can/m_can.txt
@@ -63,5 +63,5 @@ Board dts:
 &m_can1 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_m_can1>;
-   status = "enabled";
+   status = "okay";
 };
-- 
2.9.0



Re: [PATCH] stmmac: CSR clock configuration fix

2016-12-22 Thread Joao Pinto
Às 6:21 PM de 12/21/2016, David Miller escreveu:
> From: Joao Pinto 
> Date: Tue, 20 Dec 2016 11:21:47 +
> 
>> When testing stmmac with my QoS reference design I found a problem in the
>> CSR clock configuration that was preventing phy discovery, since
>> every read operation returned 0x. This patch fixes the issue.
>>
>> Signed-off-by: Joao Pinto 
> 
> This isn't enough.
> 
> It looks like various parts of this driver set the mask field
> differently.
> 
> dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits.
> 
> But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value
> which is shifted up already.
> 
> So your patch will break chips driven by dwmac4_core.c.

I am using a GMAC4 reference design to test the patches. The clock configuration,
as is, does not work, resulting in phy discovery failure. By applying this
patch I am able to set the clock value properly.

I am going to check the GMAC4 Databook and older versions in order to
justify this better.

> 
> In order for your change to be correct you must consolidate all of
> these various pieces to use the same convention.
> 

Thanks.

