date:20180118

[patch net-next] net: sched: silence uninitialized parent variable warning in tc_dump_tfilter

2018-01-18 Thread Jiri Pirko

From: Jiri Pirko 

When tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK, parent is still passed
down but the value is never used. Compiler does not recognize it and
issues a warning. Silence it down initializing parent to 0.

Reported-by: David Miller 
Reported-by: Stephen Rothwell 
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e500d11da9cd..9683f8550f6f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1303,7 +1303,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
struct tcmsg *tcm = nlmsg_data(cb->nlh);
long index_start;
long index;
-   u32 parent;
+   u32 parent = 0;
int err;
 
if (nlmsg_len(cb->nlh) < sizeof(*tcm))
-- 
2.14.3

Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Pravin Shelar

On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
> When regular packets are forwarded, we validate their size against the
> MTU of the destination device. However, when GSO packets are
> forwarded, we do not validate their size against the MTU. We
> implicitly assume that when they are segmented, the resultant packets
> will be correctly sized.
>
> This is not always the case.
>
> We observed a case where a packet received on an ibmveth device had a
> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
> device, where it caused a firmware assert. This is described in detail
> at [0] and was the genesis of this series. Rather than fixing it in
> the driver, this series fixes the forwarding path.
>
Are there any other possible forwarding path in networking stack? TC
is one subsystem that could forward such a packet to the bnx2x device,
how is that handled ?

> To fix this:
>
>  - Move a helper in patch 1.
>
>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>case, rather than assuming all will be well. This fixes bridges.
>This is patch 2.
>
>  - Open vSwitch uses its own slightly specialised algorithm for
>checking lengths. Wire up checking for that in patch 3.
>
> [0]: https://patchwork.ozlabs.org/patch/859410/
>
> Cc: manish.cho...@cavium.com
> Cc: d...@openvswitch.org
>
> Daniel Axtens (3):
>   net: move skb_gso_mac_seglen to skbuff.h
>   net: is_skb_forwardable: validate length of GSO packet segments
>   openvswitch: drop GSO packets that are too large
>
>  include/linux/skbuff.h  | 16 
>  net/core/dev.c  |  7 ---
>  net/core/skbuff.c   | 34 ++
>  net/openvswitch/vport.c | 37 ++---
>  net/sched/sch_tbf.c | 10 --
>  5 files changed, 84 insertions(+), 20 deletions(-)
>
> --
> 2.14.1
>

Re: iscsi target regression due to "tcp: remove prequeue support" patch

2018-01-18 Thread Nicholas A. Bellinger

Hi Florian,

(Adding netdev + DaveM CC')

Thanks for the response.  Comments below.

On Mon, 2018-01-15 at 11:41 +0100, Florian Westphal wrote:
> Mike Christie  wrote:
>  
> > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT:
> > 0x, CmdSN: 0x, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > 
> > we have got a login command and we seem to then go into
> > iscsit_do_rx_data -> sock_recvmsg
> > 
> > We seem to get stuck in there though, because we stay blocked until:
> > 
> > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready:
> > conn: 88b35cbb3000
> > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn:
> > 88b35cbb3000 
> > 
> > where initiator side timeout fires 15 seconds later and it disconnects
> > the tcp connection, and we eventually break out of the recvmsg call:
> > 
> > Dec 13 17:55:16 rhel73n1 kernel: Entering iscsi_target_sk_state_change
> > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_check_close:
> > TCP_CLOSE_WAIT|TCP_CLOSE,returning FALSE
> > 
> > 
> > 
> > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> > Dec 13 17:55:16 rhel73n1 kernel: iscsi_target_do_login_rx after
> > rx_login_io, 88b35cbb3000, kworker/2:2:1829
> > 
> > Is the iscsi target doing something incorrect in its use of
> > sk_data_ready and sock_recvmsg or is the tcp patch at fault?
> 
> I have not received any bug reports except this one.
> 
> I also have a hard time following iscsi code flow.
> 
> > Dec 13 17:55:01 rhel73n1 kernel: Starting login_timer for kworker/2:2/1829
> > Dec 13 17:55:01 rhel73n1 kernel: rx_loop: 48, total_rx: 48, data: 48
> > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT: 
> > 0x, CmdSN: 0x, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready: conn: 
> > 88b35cbb3000
> 
> Looks like things are fine up to this point.
> 
> > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn: 
> > 88b35cbb3000 
> 
> This makes things return early from sk_data_ready callback.

Correct.

This is existing behavior for individual iscsi_conn login delayed_work
contexts (conn->login_work) which have not yet returned from a previous
sock_recvmsg(..., MSG_WAITALL) blocking call.

This causes the next iscsi_target_sk_data_ready() callback to hit
LOGIN_FLAGS_READ_ACTIVE=1, and return immediately without kicking
conn->login_work to process iscsi_target_do_login_rx() ->
sock_recvmsg(..., MSG_WAITALL).

> > Dec 13 17:55:16 rhel73n1 kernel: Entering iscsi_target_sk_state_change
> > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_check_close: 
> > TCP_CLOSE_WAIT|TCP_CLOSE,returning FALSE
> > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_close_change: state: 1
> > Dec 13 17:55:16 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1 
> > sk_state_change conn: 88b35cbb3000
> > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> 
> So it looks like all data is there, and probably has been there all the
> past 15 seconds, but nothing noticed.
> 
> Why is LOGIN_FLAGS_READ_ACTIVE set?  Who sets this?  Who is supposed to clear 
> that?
> Why does it exist in first place?

The bit is set in iscsi_target_sk_data_ready() when conn->login_work is
not already blocked by sock_recvmsg(..., MSG_WAITALL).  Once it's set,
conn->login_work is kicked to run iscsi_target_do_login_rx() ->
sock_recvmsg(..., MSG_WAITALL) which blocks waiting for the next 48 byte
login request PDU + payload.

Once the active conn->login_work context in iscsi_target_do_login_rx()
returns from sock_recvmsg(..., MSG_WAITALL) with full login request PDU
+ payload bytes, the bit is cleared.

AFAICT, there was a wake_up removed by commit e7942d063 that results in
multi iscsi login PDU authentication exchanges blocking on a incoming
login request payload.

So I don't know if this is a pre-existing bug hidden in iscsi-target by
tcp prequeue login, or not...?

It would indicate users providing their own ->sk_data_ready() callback
must be responsible for waking up a kthread context blocked on
sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
received before the first sock_recvmsg(..., MSG_WAITALL) completes.

I don't know if this is a pre-existing bug in iscsi-target hidden by tcp
prequeue logic, or not...?

Any ideas..?

Re: iscsi target regression due to "tcp: remove prequeue support" patch

2018-01-18 Thread Nicholas A. Bellinger

On Mon, 2018-01-15 at 14:33 -0600, Mike Christie wrote:
> On 01/15/2018 12:41 AM, Nicholas A. Bellinger wrote:
> > Hey MNC & Co,
> > 
> > Ping on the earlier iscsi-target authentication login failures atop
> > 4.14 + commit e7942d063 removing tcp prequeue support.
> > 
> > For reference, what is your pre 4.14 environment using for
> > sysctl_tcp_low_latency..?
> 
> tcp_low_latency=1.
> 
> I have tested the current kernel with 1 and 0, and it does not make a
> difference.
> 

Likewise, I've always used tcp_low_latency=1 on pre 4.14.y kernels and
never triggered this bug with multi login PDU exchanges.

What's strange from groking commit e7942d0 though, it appear the removal
of tcp prequeue logic should only effect tcp_low_latency=0 users,
right..?

But here, that doesn't seem to be the case..

Re: [Intel-wired-lan] [RFC v2 net-next 01/10] net: Add a new socket option for a future transmit time.

2018-01-18 Thread Miroslav Lichvar

On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
> From: Richard Cochran 
> 
> This patch introduces SO_TXTIME.  User space enables this option in
> order to pass a desired future transmit time in a CMSG when calling
> sendmsg(2).
> 
> A new field is added to struct sockcm_cookie, and the tstamp from
> skbuffs will be used later on.

In the discussion about the v1 patchset, there was a question if the
cmsg should include a clockid_t. Without that, how can an application
prevent the packet from being sent using an incorrect clock, e.g.
the system clock when it expects it to be a PHC, or a different PHC
when the socket is not bound to a specific interface?

At least in some applications it would be preferred to not sent a
packet at all instead of sending it at a wrong time.

Please keep in mind that the PHCs and the system clock don't have to
be synchronized to each other. If I understand the rest of the series
correctly, there is an assumption that the PHCs are keeping time in
TAI and CLOCK_TAI can be used as a fallback.

-- 
Miroslav Lichvar

Re: DPAA Ethernet traffice troubles with Linux kernel

2018-01-18 Thread Joakim Tjernlund

On Thu, 1970-01-01 at 00:00 +, Joakim Tjernlund wrote:
> On Thu, 1970-01-01 at 00:00 +, Madalin-cristian Bucur wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> > 
> > 
> > > -Original Message-
> > > From: Joakim Tjernlund [mailto:joakim.tjernl...@infinera.com]
> > > Sent: Tuesday, January 16, 2018 7:58 PM
> > > To: and...@lunn.ch
> > > Subject: Re: DPAA Ethernet traffice troubles with Linux kernel
> > > 
> > > On Thu, 1970-01-01 at 00:00 +, Andrew Lunn wrote:
> > > > 
> > > > Hi Joakim
> > > > 
> > > > You appear to be using an old kernel. Take a look at:
> > > 
> > > Not really, I am using 4.14.x and I don't think that is old. Seems like
> > > this
> > > patch hasn't been sent to 4.14.x.
> > > 
> > > I wonder if I might be missing something else, we just moved to 4.14 and
> > > notic that all
> > > our fixed PHYs are non functioning:
> > > fsl_mac ffe4e2000.ethernet: FMan MEMAC
> > > fsl_mac ffe4e2000.ethernet: FMan MAC address: 00:06:9c:0b:06:20
> > > fsl_mac dpaa-ethernet.0: __devm_request_mem_region(mac) failed
> > > fsl_mac: probe of dpaa-ethernet.0 failed with error -16
> > > fsl_mac ffe4e4000.ethernet: FMan MEMAC
> > > fsl_mac ffe4e4000.ethernet: FMan MAC address: 00:06:9c:0b:06:21
> > > fsl_mac dpaa-ethernet.1: __devm_request_mem_region(mac) failed
> > > fsl_mac: probe of dpaa-ethernet.1 failed with error -16
> > > fsl_mac ffe4e6000.ethernet: FMan MEMAC
> > > fsl_mac ffe4e6000.ethernet: FMan MAC address: 00:06:9c:0b:06:22
> > > fsl_mac dpaa-ethernet.2: __devm_request_mem_region(mac) failed
> > > fsl_mac: probe of dpaa-ethernet.2 failed with error -16
> > > fsl_mac ffe4e8000.ethernet: FMan MEMAC
> > > fsl_mac ffe4e8000.ethernet: FMan MAC address: 00:06:9c:0b:06:23
> > > fsl_mac dpaa-ethernet.3: __devm_request_mem_region(mac) failed
> > > fsl_mac: probe of dpaa-ethernet.3 failed with error -16
> > > 
> > > Feels like FMAN still think there are real PHYs there ?
> > 
> > Hi Joakim,
> > 
> > These errors are issued when trying to probe the second time the same
> > MAC node. The issue was introduced by this commit:
> > 
> > commit 4d8ee1935bcd666360311dfdadeee235d682d69a
> > Author: Florian Fainelli 
> > Date: Tue Aug 22 15:24:47 2017 -0700
> > fsl/man: Inherit parent device and of_node
> > 
> > and was later addressed by this patch set:
> > 
> > http://patchwork.ozlabs.org/project/netdev/list/?series=8462&state=*
> > 
> > Even with these errors printed, all is working fine, it's just the
> > second probing that fails. Adding the latter patches or reverting
> > the one above makes the errors prints dissapear.
> > 
> > Madalin
> 
> Ahh, now it starts to look better, reverting "fsl/man: Inherit parent device 
> and of_node" on 4.14 gives:
> libphy: Fixed MDIO Bus: probed
> tun: Universal TUN/TAP device driver, 1.6
> libphy: Freescale XGMAC MDIO Bus: probed
> iommu: Adding device ffe488000.port to group 10
> libphy: Freescale XGMAC MDIO Bus: probed
> mdio_bus ffe4e1000: Error while reading PHY0 reg at 3.3
> iommu: Adding device ffe489000.port to group 22
> libphy: Freescale XGMAC MDIO Bus: probed
> mdio_bus ffe4e3000: Error while reading PHY0 reg at 3.3
> iommu: Adding device ffe48a000.port to group 23
> libphy: Freescale XGMAC MDIO Bus: probed
> mdio_bus ffe4e5000: Error while reading PHY0 reg at 3.3
> iommu: Adding device ffe48b000.port to group 24
> libphy: Freescale XGMAC MDIO Bus: probed
> mdio_bus ffe4e7000: Error while reading PHY0 reg at 3.3
> iommu: Adding device ffe48c000.port to group 25
> libphy: Freescale XGMAC MDIO Bus: probed
> mdio_bus ffe4e9000: Error while reading PHY0 reg at 3.3
> fsl_mac ffe4e2000.ethernet: FMan MEMAC
> fsl_mac ffe4e2000.ethernet: FMan MAC address: 00:06:9c:0b:06:20
> fsl_mac ffe4e4000.ethernet: FMan MEMAC
> fsl_mac ffe4e4000.ethernet: FMan MAC address: 00:06:9c:0b:06:21
> fsl_mac ffe4e6000.ethernet: FMan MEMAC
> fsl_mac ffe4e6000.ethernet: FMan MAC address: 00:06:9c:0b:06:22
> fsl_mac ffe4e8000.ethernet: FMan MEMAC
> fsl_mac ffe4e8000.ethernet: FMan MAC address: 00:06:9c:0b:06:23
> fsl_mac ffe4e.ethernet: FMan MEMAC
> fsl_mac ffe4e.ethernet: FMan MAC address: 00:06:9c:0b:06:1f
> fsl_dpa dpaa-ethernet.0 eth0: Probed interface eth0
> fsl_dpa dpaa-ethernet.1 eth1: Probed interface eth1
> fsl_dpa dpaa-ethernet.2 eth2: Probed interface eth2
> fsl_dpa dpaa-ethernet.3 eth3: Probed interface eth3
> fsl_dpa dpaa-ethernet.4 eth4: Probed interface eth4
> 
> Still some minor errors: mdio_bus ffe4e7000: Error while reading PHY0 reg at 
> 3.3
> but this is going the right way(I have not had a chance to try if they work 
> due
> to external modules not ported/ready yet)
> 
> The other patch series is still to be tested though but I already now wanted 
> to stress the importance of getting all upstream fixes into stable, ASAP.
> You now what they are, I have no idea.

FYI, applied 
http://patchwork.ozlabs.org/project/ne

Re: [PATCH next-queue 2/2] ixgbe: add unlikely notes to tx fastpath expressions

2018-01-18 Thread Yanjun Zhu




On 2018/1/9 6:47, Shannon Nelson wrote:

Add unlikely() to a few error checking expressions in the Tx
offload handling.

Suggested-by: Yanjun Zhu 

Hi,

I am fine with this patch. I have a question. The ipsec feature is 
supported in ixgbevf?


Thanks a lot.
Zhu Yanjun

Signed-off-by: Shannon Nelson 
---
  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 57c10e6..3d069a2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -749,28 +749,28 @@ int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring,
struct xfrm_state *xs;
struct tx_sa *tsa;
  
-	if (!first->skb->sp->len) {

+   if (unlikely(!first->skb->sp->len)) {
netdev_err(tx_ring->netdev, "%s: no xfrm state len = %d\n",
   __func__, first->skb->sp->len);
return 0;
}
  
  	xs = xfrm_input_state(first->skb);

-   if (!xs) {
+   if (unlikely(!xs)) {
netdev_err(tx_ring->netdev, "%s: no xfrm_input_state() xs = 
%p\n",
   __func__, xs);
return 0;
}
  
  	itd->sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_TX_INDEX;

-   if (itd->sa_idx > IXGBE_IPSEC_MAX_SA_COUNT) {
+   if (unlikely(itd->sa_idx > IXGBE_IPSEC_MAX_SA_COUNT)) {
netdev_err(tx_ring->netdev, "%s: bad sa_idx=%d handle=%lu\n",
   __func__, itd->sa_idx, xs->xso.offload_handle);
return 0;
}
  
  	tsa = &ipsec->tx_tbl[itd->sa_idx];

-   if (!tsa->used) {
+   if (unlikely(!tsa->used)) {
netdev_err(tx_ring->netdev, "%s: unused sa_idx=%d\n",
   __func__, itd->sa_idx);
return 0;

Re: [PATCH net] gso: validate gso_type if SKB_GSO_DODGY

2018-01-18 Thread Jason Wang




On 2018年01月18日 14:04, Willem de Bruijn wrote:

On Thu, Jan 18, 2018 at 12:20 AM, Willem de Bruijn
 wrote:

On Wed, Jan 17, 2018 at 10:48 PM, Jason Wang  wrote:


On 2018年01月18日 07:11, Willem de Bruijn wrote:

From: Willem de Bruijn

Validate gso_type of untrusted SKB_GSO_DODGY packets during
segmentation.

Untrusted user packets are limited to a small set of gso types in
virtio_net_hdr_to_skb. But segmentation occurs on packet contents.
Syzkaller was able to enter gso callbacks that are not hardened
against untrusted user input.

Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr")


This commit is suspicious, I guess it should be 5c7cdf339af5 ("gso: Remove
arbitrary checks for unsupported GSO")

The specific SCTP path was introduced with commit 90017accff61 ("sctp:
add GSO support"). But the main issue that packets can be delivered to
gso handlers different from their gso_type goes back further.

The commit you reference is actually older than the sctp gso patch, so
it makes sense that it did not have a check in the sctp_gso_segment.

I still think that we should check in inet_gso_segment when we have
the proto, instead of in each {tcp, sctp, udp, esp, ...} handler having
a check of the form.

   !(type & (SKB_GSO_TCPV4 |
 SKB_GSO_TCPV6

Unless we can create packets that legitimate combine
SKB_GSO_DODGY with tunnel headers.


As you mentioned below, looks like we can e.g bridge between tunnels 
(vxlan, gre or others) and tap, or even bpf can produce this (e.g 
bpf_skb_adjust_room).




virtio_net_hdr_to_skb does not accept tunneled gso types.


Yes, Vlad is trying to extend virtio to support more kinds of gso types, 
so it will be supported for sure.




But a tun device can be bridged with a gre tunnel in the
host, creating a path that will call gre_gso_segment.

If that is possible, then this patch is indeed too strict and
we do need checks in the individual handlers.


I think so.

Thanks

Re: [PATCH net] net: validate untrusted gso packets

2018-01-18 Thread Jason Wang




On 2018年01月18日 13:09, Willem de Bruijn wrote:

   Also, a packet can just as easily spoof an esp packet.
See also the discussion in the Link above.

Then we probably need check there.

Adding a check in every gso handler that does not expect packets
with SKB_GSO_DODGY source seems backwards to me.

A packet with gso_type SKB_GSO_TCPV4 should never be
delivered to an sctp handler, say. We must check before passing
to a handler whether the packet matches the callback.


That's the case for trusted source. For dodgy source, we can't assume that.




We can address this specific issue in segmentation by placing a check
analogous to skb_validate_dodgy_gso in the network layer. Something
like this (but with ECN option support):

@@ -1258,6 +1258,22 @@ struct sk_buff *inet_gso_segment(struct sk_buff
*skb,

  skb_reset_transport_header(skb);

+   gso_type = skb_shinfo(skb)->gso_type;
+   if (gso_type & SKB_GSO_DODGY) {
+   switch (gso_type & (SKB_GSO_TCPV4 | SKB_GSO_UDP)) {
+   case SKB_GSO_TCPV4:
+   if (proto != IPPROTO_TCP)
+   goto out;
+   break;
+   case SKB_GSO_UDP:
+   if (proto != IPPROTO_UDP)
+   goto out;
+   break;
+   default:
+   goto out;
+   }
+   }

This implements subset of function for codes which was removed by the commit
I mentioned below.

No, as I explain above, it performs a different check.





[...]


For performance reason. I think we should delay the check or segmentation
as
much as possible until it was really needed.

Going through segmentation is probably as expensive as flow dissector,
if not more so because of the indirect branches.

I think we don't even need to care about this consider the evil packet
should be rare.

How does frequency matter when a single packet can crash a host?


I mean consider we had fix the crash, we don't care how expensive do we 
spot this.





And what you propose here is just a very small subset of the
necessary checking, more comes at gso header checking. So even if we care
performance, it only help for some specific case.

It also fixed the bug that Eric sent a separate patch for, as that did
not dissect as a valid TCP packet, either.


I may miss something but how did this patch protects an evil thoff?


   And now we have two
pieces of code that need to be hardened against malicious packets.

To me fixing the exist path is a good balance between security and
performance.


But I did overlook the guest to guest communication, with virtual devices
that can set NETIF_F_GSO_ROBUST and so do not enter the stack. That
means that one guest can send misrepresented packets to another. Is that
preferable to absorbing the cost of validation?

The problem is that guests and hosts don't trust each other. Even if one
packet has already been validated by one side, it must be validated again by
another side when needed. Doing validation twice is meaningless in this
case.

Ack, agreed that guests need to have defensive coding of themselves.
Then it's fine to pass packets to guests where the gso_type does not
match the contents.


The current patch does not actually deprecate the path through the
segmentation layer. So it will indeed add overhead. We would have to
remove the DODGY logic in net-next.

I don't get the point of removing this.


   One additional possible optimization
is to switch to flow_keys_buf_dissector, which does not dissect as deeply.
I did that in an earlier patch; it should be sufficient for this purpose.

I suspect the cost is still noticeable, and consider we may support kind of
tunnel offload in the future.

We should first assume the correctness of metadata provided by untrusted
source and validate it before we really want to use it. Validate them during
entry means we assume they are wrong, then there's no need for most of the
fields of virtio-net header,

If you always need to use the data, then you always validate. In which case
you want to validate early, as there will be less vulnerable code before
validation.

But I see what I think is your point: in guest to guest communication we
do not need to validate at all, so early validation adds unnecessary cost
for this important use case. That's a fair argument for validating later and
only when needed (i.e., a path without NETIF_F_GSO_ROBUST).


Yes, and looking at Herbert's commit log for 576a30eb6453 ("[NET]: Added 
GSO header verification"). It was intended do the verification just 
before gso.


Thanks

[PATCH v2] tools/bpftool: silence a static checker warning

2018-01-18 Thread Dan Carpenter

There is a static checker warning that "proglen" has an upper bound but
no lower bound.  The allocation will just fail harmlessly so it's not a
big deal.

Signed-off-by: Dan Carpenter 
---
v2: change the type to unsigned instead of checking for negatives

diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 30044bc4f389..58c2bab4ef6e 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -172,7 +172,8 @@ static uint8_t *get_last_jit_image(char *haystack, size_t 
hlen,
 {
char *ptr, *pptr, *tmp;
off_t off = 0;
-   int ret, flen, proglen, pass, ulen = 0;
+   unsigned int proglen;
+   int ret, flen, pass, ulen = 0;
regmatch_t pmatch[1];
unsigned long base;
regex_t regex;
@@ -199,7 +200,7 @@ static uint8_t *get_last_jit_image(char *haystack, size_t 
hlen,
}
 
ptr = haystack + off - (pmatch[0].rm_eo - pmatch[0].rm_so);
-   ret = sscanf(ptr, "flen=%d proglen=%d pass=%d image=%lx",
+   ret = sscanf(ptr, "flen=%d proglen=%u pass=%d image=%lx",
 &flen, &proglen, &pass, &base);
if (ret != 4) {
regfree(®ex);
@@ -239,7 +240,7 @@ static uint8_t *get_last_jit_image(char *haystack, size_t 
hlen,
}
 
assert(ulen == proglen);
-   printf("%d bytes emitted from JIT compiler (pass:%d, flen:%d)\n",
+   printf("%u bytes emitted from JIT compiler (pass:%d, flen:%d)\n",
   proglen, pass, flen);
printf("%lx + :\n", base);

Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Jason Wang




On 2018年01月18日 16:28, Pravin Shelar wrote:

On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:

When regular packets are forwarded, we validate their size against the
MTU of the destination device. However, when GSO packets are
forwarded, we do not validate their size against the MTU. We
implicitly assume that when they are segmented, the resultant packets
will be correctly sized.

This is not always the case.

We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0] and was the genesis of this series. Rather than fixing it in
the driver, this series fixes the forwarding path.


Are there any other possible forwarding path in networking stack? TC
is one subsystem that could forward such a packet to the bnx2x device,
how is that handled ?


Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
before passing it to hardware. And bnx2x needs to set its gso_max_size 
correctly.


Btw, looks like this could be triggered from a guest which is a DOS. We 
need request a CVE for this.


Thanks




To fix this:

  - Move a helper in patch 1.

  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
case, rather than assuming all will be well. This fixes bridges.
This is patch 2.

  - Open vSwitch uses its own slightly specialised algorithm for
checking lengths. Wire up checking for that in patch 3.

[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: manish.cho...@cavium.com
Cc: d...@openvswitch.org

Daniel Axtens (3):
   net: move skb_gso_mac_seglen to skbuff.h
   net: is_skb_forwardable: validate length of GSO packet segments
   openvswitch: drop GSO packets that are too large

  include/linux/skbuff.h  | 16 
  net/core/dev.c  |  7 ---
  net/core/skbuff.c   | 34 ++
  net/openvswitch/vport.c | 37 ++---
  net/sched/sch_tbf.c | 10 --
  5 files changed, 84 insertions(+), 20 deletions(-)

--
2.14.1

Re: [PATCH v6 00/36] Andes(nds32) Linux Kernel Port

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> This is the 6th version patchset to add the Linux kernel port for Andes(nds32)
> processors. Almost all of the feedbacks from v5 patchseries has been 
> addressed.
> Thanks to everyone who provided feedback on the previous version.
>
>
> This patchset adds core architecture support to Linux for Andestech's
> N13, N15, D15, N10, D10 processor cores.
>
> Based on the 16/32-bit AndeStar RISC-like architecture, we designed the
> configurable AndesCore series of embedded processor families. AndesCores
> range from highly performance-efficient small-footprint cores for
> microcontrollers and deeply-embedded applications to 1GHz+ cores running
> Linux, covering general-purpose N-series cores for a wide range of computing
> need, DSP-capable D-series cores for digital signal control,
> instruction-extensible E-series cores for application-specific acceleration,
> and secure S-series cores for best protection of the most valuable.
>
> The patches are based on v4.14-rc8, and can also be found in the
> following git tree:
>   https://github.com/andestech/linux.git nds32-4.14-rc8-v6
>
> The build script and toolchain repositories are able to be found here:
>   https://github.com/andestech/build_script.git
>
> Freely available instruction set and architecture overview documents can
> be found on the following page:
>   http://www.andestech.com/product.php?cls=9
>
>
> Vincent Ren-Wei Chen and I will maintain this port. Thanks to everyone who
> helped us and contributed to it. :) Any feedback is welcome.

Hi Greentime,

I think it's time to move this to the next step towards inclusion, as you appear
to have addressed all major issues, and only smaller details remain.

This is what I'd suggest for moving forward:

- split the patches into two branches in git: one 'next' branch for
everything that
  is ready to get merged in one pull request that you send to Linus, including
  drivers that you have received an Ack for from the respective subsystem
  maintainers, and one 'testing' branch for anything that is either
not quite ready
  or that you expect to get merged through someone else's tree (e.g.
  most device drivers). The 'testing' branch can merge in the 'next' branch
  or get rebased on it frequently.

- Ask reviewers to send 'Acked-by' or 'Reviewed-by' tags for the patches they
  are happy with, or to complain if they still see any major issues that
  should prevent the series from going in. I'll go through the submission once
  more myself and Ack any patches that I have actually reviewed.

- Decide what base to use for the 'next' branch, rebase it on that release one
  more time and then plan to never rebase it again. This will be the branch
  to send to linux-next and to Linus Torvalds. Given the current timing of the
  merge window, I would suggest to base it on top of v4.16-rc1 once that
  gets released, and then send it for inclusion in 4.17. Any changes you do
  in the meantime would be added on top of the original set.

- Ask Stephen Rothwell to include your 'next' branch in linux-next.
  (if you base on 4.16-rc1, wait until that is out).

- Submit any remaining driver patches that you do not have an 'Ack' for to
  the respective subsystem maintainers for inclusion in their trees.

- Upload a gpg key (4096 bits) for your greent...@andestech.com address
  to the keyservers, and arrange to have that signed by other kernel
  developers. You will need to sign git tags for pull requests with your
  key, and the key itself should be signed for this to work best.

- Once you have at least three signatures on your gpg key, apply for
  an account on kernel.org and move your git tree there,
  see https://www.kernel.org/category/faq.html. Alternatively,
  host the git tree on a web-facing git server from andestech.com
  (github works initially, but has a couple of shortcomings).

 Arnd

Re: [PATCH] net: fec: add necessary defines to work on ARM64

2018-01-18 Thread Lucas Stach

Hi Andy,

Am Donnerstag, den 18.01.2018, 01:49 + schrieb Andy Duan:
> From: Lucas Stach  Sent: Thursday, January
> 18, 2018 2:31 AM
> > The i.MX8 is a ARMv8 based SoC, that uses the same FEC IP as the
> > earlier,
> > ARMv7 based, i.MX SoCs. Allow the driver to work on ARM64.
> > 
> > Signed-off-by: Lucas Stach 
> > ---
[...]

> Kconfig also need to add ARM64 support.

No, not as far as I'm aware. I introduced the ARCH_MXC symbol for
ARM64, which unlocks the FEC config option when selected.

Regards,
Lucas

Re: [PATCH v6 03/36] sparc: io: To use the define of ioremap_[nocache|wc|wb] in asm-generic/io.h

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> It will be built failed if commit id: d25ea659 is selected. This patch
> can fix this build error.
>
> Signed-off-by: Greentime Hu 

The change is fine, but the reference to commit 'd25ea659' is not for
two reasons:

- when you rebase the tree, you will get a different ID
- the recommended format for referring to another commit is to list it with
  12 digits and the one-line summary like this:

   commit d25ea659bc37 ("asm-generic/io.h: move
  ioremap_nocache/ioremap_uc/ioremap_wc/ioremap_…")

- Ideally you use a 'Fixes:' tag in the patch description.

You can add these lines in your .gitconfig to help you here

[alias]
fixes = show --format='Fixes: %h (\"%s\")' -s
[core]
abbrev = 12

That will give you a 'git fixes' command to output the Fixes: line
in the correct format for future bug fixes. For this particular change,
I would actually just merge the two patches into one patch that
then doesn't break anything in the first place.

  Arnd

RE: [PATCH] net: fec: add necessary defines to work on ARM64

2018-01-18 Thread Andy Duan

From: Lucas Stach  Sent: Thursday, January 18, 2018 
5:54 PM
>Hi Andy,
>
>Am Donnerstag, den 18.01.2018, 01:49 + schrieb Andy Duan:
>> From: Lucas Stach  Sent: Thursday, January 18,
>> 2018 2:31 AM
>> > The i.MX8 is a ARMv8 based SoC, that uses the same FEC IP as the
>> > earlier,
>> > ARMv7 based, i.MX SoCs. Allow the driver to work on ARM64.
>> >
>> > Signed-off-by: Lucas Stach 
>> > ---
>[...]
>
>> Kconfig also need to add ARM64 support.
>
>No, not as far as I'm aware. I introduced the ARCH_MXC symbol for ARM64,
>which unlocks the FEC config option when selected.
>
>Regards,
>Lucas

Okay, that make sense. 
Acked-by: Fugang Duan

Re: [PATCH v6 04/36] earlycon: add reg-offset to physical address before mapping

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> It will get the wrong virtual address because port->mapbase is not added
> the correct reg-offset yet. We have to update it before earlycon_map()
> is called
>
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann 
Cc: Peter Hurley 
Cc: sta...@vger.kernel.org
Fixes: 088da2a17619 ("of: earlycon: Initialize port fields from DT properties")

As this is a bugfix for generic code, this might be good to
have Greg pick up directly for his serial drivers tree. An Ack from
Peter Hurley would also help, as he introduced the reg-offset handling
originally.

  Arnd

Re: [PATCH v6 05/36] nds32: Assembly macros and definitions

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes assembly macros, bit field definitions used in .S
> files across arch/nds32/.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 06/36] nds32: Kernel booting and initialization

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:

I had not looked at this patch in enough detail earlier, sorry about
that. It should be
easy enough to fix though.

> +#ifdef CONFIG_VGA_CONSOLE
> +struct screen_info screen_info;
> +#endif

I would assume that you can't ever have a VGA console. Just drop all
the references
here and instead send a patch to the fbdev maintainer to add the dependency
at CONFIG_VGA_CONSOLE to prevent selecting it with nds32.

> +extern struct mm_struct init_mm;

init_mm is declared in linux/mm_types.h, so you should need another declaration.
In general, you should never put 'extern' declarations in to .c files anyway.

> +
> +extern void __init early_init_devtree(void *params);
> +extern void __init early_trap_init(void);

similarly, these are declared in include/linux/of_fdt.h

> +void __init setup_arch(char **cmdline_p)
> +{
> +   early_init_devtree(__atags_pointer ?
> +  phys_to_virt(__atags_pointer) : __dtb_start);

The reference to '__atags_pointer' appears to be a leftover from pre-DT
days. Can you just remove that?

> +void calibrate_delay(void)
> +{
> +   const int *val;
> +   struct device_node *cpu = NULL;
> +   cpu = of_find_compatible_node(NULL, NULL, "andestech,nds32v3");
> +   val = of_get_property(cpu, "clock-frequency", NULL);
> +   if (!val || !*val)
> +   panic("no cpu 'clock-frequency' parameter in device tree");
> +   loops_per_jiffy = be32_to_cpup(val) / HZ;
> +   pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n",
> +   loops_per_jiffy / (50 / HZ),
> +   (loops_per_jiffy / (5000 / HZ)) % 100, loops_per_jiffy);
> +}

This seems very odd to me: The 'clock-frequency' property in the
cpu node should refer to the actual frequency it is running at, but that
tends to be different from the bogomips as needed by the ndelay()
function. Can you explain what is going on here?

   Arnd

Re: [PATCHv3 net-next 2/2] openvswitch: add erspan version I and II support

2018-01-18 Thread Jiri Benc

Looks much better.

On Wed, 17 Jan 2018 09:32:51 -0800, William Tu wrote:
> + OVS_ERSPAN_OPT_IDX, /* be32 index */

Why don't you convert this to u32 while passing to/from user space?

> + [OVS_ERSPAN_OPT_IDX]= { .len = sizeof(u32) },

sizeof(__be32) but see above.

Thanks,

 Jiri

Re: [PATCH v6 07/36] nds32: Exception handling

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes the exception/interrupt entries, pt_reg structure and
> related accessors.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Here it would be good to have a more detailed explanation about the alignment
trap handling. I remember discussing it with you before, but don't remember
the exact outcome. In particular you should explain here why you need to
handle alignment traps in the first place, and what the expected defaults
are (e.g. always disabled unless a user requests it, or always enabled) and
what kind of code runs into the traps (e.g. buggy kernel code, correct
kernel code, buggy user space code etc).

   Arnd

[PATCH 1/2] can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once

2018-01-18 Thread Marc Kleine-Budde

If an invalid CAN frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: syzbot+4386709c0c1284dca...@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov 
Acked-by: Oliver Hartkopp 
Cc: linux-stable 
Signed-off-by: Marc Kleine-Budde 
---
 net/can/af_can.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/can/af_can.c b/net/can/af_can.c
index 003b2d6d655f..ae835382e678 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -721,20 +721,16 @@ static int can_rcv(struct sk_buff *skb, struct net_device 
*dev,
 {
struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
 
-   if (WARN_ONCE(dev->type != ARPHRD_CAN ||
- skb->len != CAN_MTU ||
- cfd->len > CAN_MAX_DLEN,
- "PF_CAN: dropped non conform CAN skbuf: "
- "dev type %d, len %d, datalen %d\n",
- dev->type, skb->len, cfd->len))
-   goto drop;
+   if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU ||
+cfd->len > CAN_MAX_DLEN)) {
+   pr_warn_once("PF_CAN: dropped non conform CAN skbuf: dev type 
%d, len %d, datalen %d\n",
+dev->type, skb->len, cfd->len);
+   kfree_skb(skb);
+   return NET_RX_DROP;
+   }
 
can_receive(skb, dev);
return NET_RX_SUCCESS;
-
-drop:
-   kfree_skb(skb);
-   return NET_RX_DROP;
 }
 
 static int canfd_rcv(struct sk_buff *skb, struct net_device *dev,
-- 
2.15.1

pull-request: can 2018-01-18

2018-01-18 Thread Marc Kleine-Budde

Hello David,

this is a pull reqeust of two patches for net/master:

The syzkaller project triggered two WARN_ONCE() in the af_can code from
userspace and we decided to replace it by a pr_warn_once().

regards,
Marc

---

The following changes since commit ad9294dbc227cbc8e173b3b963e7dd9af5314f77:

  bpf: fix cls_bpf on filter replace (2018-01-17 17:14:06 -0500)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git 
tags/linux-can-fixes-for-4.15-20180118

for you to fetch changes up to d4689846881d160a4d12a514e991a740bcb5d65a:

  can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once (2018-01-18 
09:32:54 +0100)


linux-can-fixes-for-4.15-20180118


Marc Kleine-Budde (2):
  can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
  can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

 net/can/af_can.c | 36 ++--
 1 file changed, 14 insertions(+), 22 deletions(-)

[PATCH 2/2] can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

2018-01-18 Thread Marc Kleine-Budde

If an invalid CANFD frame is received, from a driver or from a tun
interface, a Kernel warning is generated.

This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.

Reported-by: syzbot+e3b775f40babeff6e...@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov 
Acked-by: Oliver Hartkopp 
Cc: linux-stable 
Signed-off-by: Marc Kleine-Budde 
---
 net/can/af_can.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/can/af_can.c b/net/can/af_can.c
index ae835382e678..4d7f988a3130 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -738,20 +738,16 @@ static int canfd_rcv(struct sk_buff *skb, struct 
net_device *dev,
 {
struct canfd_frame *cfd = (struct canfd_frame *)skb->data;
 
-   if (WARN_ONCE(dev->type != ARPHRD_CAN ||
- skb->len != CANFD_MTU ||
- cfd->len > CANFD_MAX_DLEN,
- "PF_CAN: dropped non conform CAN FD skbuf: "
- "dev type %d, len %d, datalen %d\n",
- dev->type, skb->len, cfd->len))
-   goto drop;
+   if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU ||
+cfd->len > CANFD_MAX_DLEN)) {
+   pr_warn_once("PF_CAN: dropped non conform CAN FD skbuf: dev 
type %d, len %d, datalen %d\n",
+dev->type, skb->len, cfd->len);
+   kfree_skb(skb);
+   return NET_RX_DROP;
+   }
 
can_receive(skb, dev);
return NET_RX_SUCCESS;
-
-drop:
-   kfree_skb(skb);
-   return NET_RX_DROP;
 }
 
 /*
-- 
2.15.1

Re: [PATCH v2 03/31] net: Introduce net_sem for protection of pernet_list

2018-01-18 Thread Kirill Tkhai

On 17.01.2018 23:04, Andrei Vagin wrote:
> On Mon, Nov 20, 2017 at 09:32:34PM +0300, Kirill Tkhai wrote:
>> Curently mutex is used to protect pernet operations list. It makes
>> cleanup_net() to execute ->exit methods of the same operations set,
>> which was used on the time of ->init, even after net namespace is
>> unlinked from net_namespace_list.
>>
>> But the problem is it's need to synchronize_rcu() after net is removed
>> from net_namespace_list():
>>
>> Destroy net_ns:
>> cleanup_net()
>>   mutex_lock(&net_mutex)
>>   list_del_rcu(&net->list)
>>   synchronize_rcu()  <--- Sleep there for 
>> ages
>>   list_for_each_entry_reverse(ops, &pernet_list, list)
>> ops_exit_list(ops, &net_exit_list)
>>   list_for_each_entry_reverse(ops, &pernet_list, list)
>> ops_free_list(ops, &net_exit_list)
>>   mutex_unlock(&net_mutex)
>>
>> This primitive is not fast, especially on the systems with many processors
>> and/or when preemptible RCU is enabled in config. So, all the time, while
>> cleanup_net() is waiting for RCU grace period, creation of new net namespaces
>> is not possible, the tasks, who makes it, are sleeping on the same mutex:
>>
>> Create net_ns:
>> copy_net_ns()
>>   mutex_lock_killable(&net_mutex)<--- Sleep there for 
>> ages
>>
>> I observed 20-30 seconds hangs of "unshare -n" on ordinary 8-cpu laptop
>> with preemptible RCU enabled.
>>
>> The solution is to convert net_mutex to the rw_semaphore and add small locks
>> to really small number of pernet_operations, what really need them. Then,
>> pernet_operations::init/::exit methods, modifying the net-related data,
>> will require down_read() locking only, while down_write() will be used
>> for changing pernet_list.
>>
>> This gives signify performance increase, after all patch set is applied,
>> like you may see here:
>>
>> %for i in {1..1}; do unshare -n bash -c exit; done
>>
>> *before*
>> real 1m40,377s
>> user 0m9,672s
>> sys 0m19,928s
>>
>> *after*
>> real 0m17,007s
>> user 0m5,311s
>> sys 0m11,779
>>
>> (5.8 times faster)
>>
>> This patch starts replacing net_mutex to net_sem. It adds rw_semaphore,
>> describes the variables it protects, and makes to use where appropriate.
>> net_mutex is still present, and next patches will kick it out step-by-step.
>>
>> Signed-off-by: Kirill Tkhai 
>> ---
>>  include/linux/rtnetlink.h |1 +
>>  net/core/net_namespace.c  |   39 ++-
>>  net/core/rtnetlink.c  |4 ++--
>>  3 files changed, 29 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
>> index 2032ce2eb20b..f640fc87fe1d 100644
>> --- a/include/linux/rtnetlink.h
>> +++ b/include/linux/rtnetlink.h
>> @@ -35,6 +35,7 @@ extern int rtnl_is_locked(void);
>>  
>>  extern wait_queue_head_t netdev_unregistering_wq;
>>  extern struct mutex net_mutex;
>> +extern struct rw_semaphore net_sem;
>>  
>>  #ifdef CONFIG_PROVE_LOCKING
>>  extern bool lockdep_rtnl_is_held(void);
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 2e512965bf42..859dce31e37e 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -41,6 +41,11 @@ struct net init_net = {
> 
>> static LIST_HEAD(pernet_list);
>> static struct list_head *first_device = &pernet_list;
>> DEFINE_MUTEX(net_mutex);
> 
> With all patches, we still have the net_mutex, I think we need to add a
> comment, which explains why we need it. Are "sync" pernet operations
> depricated after this series? Or is it ok to have them?
 
net_mutex will leave till the time all pernet_operations are converted.
But people, who don't use unconverted operations, will have performance
profit already. Comment is not a problem :)

Thanks, Kirill
 
>>  EXPORT_SYMBOL(init_net);
>>  
>>  static bool init_net_initialized;
>> +/*
>> + * net_sem: protects: pernet_list, net_generic_ids,
>> + * init_net_initialized and first_device pointer.
>> + */
>> +DECLARE_RWSEM(net_sem);
>>  
>>  #define MIN_PERNET_OPS_ID   \
>>  ((sizeof(struct net_generic) + sizeof(void *) - 1) / sizeof(void *))
>> @@ -279,7 +284,7 @@ struct net *get_net_ns_by_id(struct net *net, int id)
>>   */
>>  static __net_init int setup_net(struct net *net, struct user_namespace 
>> *user_ns)
>>  {
>> -/* Must be called with net_mutex held */
>> +/* Must be called with net_sem held */
>>  const struct pernet_operations *ops, *saved_ops;
>>  int error = 0;
>>  LIST_HEAD(net_exit_list);
>> @@ -411,12 +416,16 @@ struct net *copy_net_ns(unsigned long flags,
>>  net->ucounts = ucounts;
>>  get_user_ns(user_ns);
>>  
>> -rv = mutex_lock_killable(&net_mutex);
>> +rv = down_read_killable(&net_sem);
>>  if (rv < 0)
>>  goto put_userns;
>> -
>> +rv = mutex_lock_killable(&net_mutex);
>> +if (rv < 0)
>> +goto up_read;
>>  rv = setup_net(net, user_ns);
>>  mutex_unlock(&net_mutex);
>> +up_read:
>> +up_read

Re: [PATCH v6 08/36] nds32: MMU definitions

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes virtual memory layout, PHYS_OFFSET is defined as 0x0. It
> also includes the 4KB/8KB page size configurations and pte operations.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v2 05/31] net: Allow pernet_operations to be executed in parallel

2018-01-18 Thread Kirill Tkhai

On 17.01.2018 21:34, Andrei Vagin wrote:
> On Mon, Nov 20, 2017 at 09:32:55PM +0300, Kirill Tkhai wrote:
>> This adds new pernet_operations::async flag to indicate operations,
>> which ->init(), ->exit() and ->exit_batch() methods are allowed
>> to be executed in parallel with the methods of any other pernet_operations.
>>
>> When there are only asynchronous pernet_operations in the system,
>> net_mutex won't be taken for a net construction and destruction.
>>
>> Also, remove BUG_ON(mutex_is_locked()) from net_assign_generic()
>> without replacing with the equivalent net_sem check, as there is
>> one more lockdep assert below.
>>
>> Suggested-by: Eric W. Biederman 
>> Signed-off-by: Kirill Tkhai 
>> ---
>>  include/net/net_namespace.h |6 ++
>>  net/core/net_namespace.c|   29 +++--
>>  2 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
>> index 10f99dafd5ac..db978c4755f7 100644
>> --- a/include/net/net_namespace.h
>> +++ b/include/net/net_namespace.h
>> @@ -303,6 +303,12 @@ struct pernet_operations {
>>  void (*exit_batch)(struct list_head *net_exit_list);
>>  unsigned int *id;
>>  size_t size;
>> +/*
>> + * Indicates above methods are allowe to be executed in parallel
>> + * with methods of any other pernet_operations, i.e. they are not
>> + * need synchronization via net_mutex.
>> + */
>> +bool async;
>>  };
>>  
>>  /*
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index c4f7452906bb..550c766f73aa 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -41,8 +41,9 @@ struct net init_net = {
>>  EXPORT_SYMBOL(init_net);
>>  
>>  static bool init_net_initialized;
>> +static unsigned nr_sync_pernet_ops;
>>  /*
>> - * net_sem: protects: pernet_list, net_generic_ids,
>> + * net_sem: protects: pernet_list, net_generic_ids, nr_sync_pernet_ops,
>>   * init_net_initialized and first_device pointer.
>>   */
>>  DECLARE_RWSEM(net_sem);
>> @@ -70,11 +71,10 @@ static int net_assign_generic(struct net *net, unsigned 
>> int id, void *data)
>>  {
>>  struct net_generic *ng, *old_ng;
>>  
>> -BUG_ON(!mutex_is_locked(&net_mutex));
>>  BUG_ON(id < MIN_PERNET_OPS_ID);
>>  
>>  old_ng = rcu_dereference_protected(net->gen,
>> -   lockdep_is_held(&net_mutex));
>> +   lockdep_is_held(&net_sem));
>>  if (old_ng->s.len > id) {
>>  old_ng->ptr[id] = data;
>>  return 0;
>> @@ -419,11 +419,14 @@ struct net *copy_net_ns(unsigned long flags,
>>  rv = down_read_killable(&net_sem);
>>  if (rv < 0)
>>  goto put_userns;
>> -rv = mutex_lock_killable(&net_mutex);
>> -if (rv < 0)
>> -goto up_read;
>> +if (nr_sync_pernet_ops) {
>> +rv = mutex_lock_killable(&net_mutex);
>> +if (rv < 0)
>> +goto up_read;
>> +}
>>  rv = setup_net(net, user_ns);
>> -mutex_unlock(&net_mutex);
>> +if (nr_sync_pernet_ops)
>> +mutex_unlock(&net_mutex);
>>  up_read:
>>  up_read(&net_sem);
>>  if (rv < 0) {
>> @@ -453,7 +456,8 @@ static void cleanup_net(struct work_struct *work)
>>  spin_unlock_irq(&cleanup_list_lock);
>>  
>>  down_read(&net_sem);
>> -mutex_lock(&net_mutex);
>> +if (nr_sync_pernet_ops)
>> +mutex_lock(&net_mutex);
>>  
>>  /* Don't let anyone else find us. */
>>  rtnl_lock();
>> @@ -489,7 +493,8 @@ static void cleanup_net(struct work_struct *work)
>>  list_for_each_entry_reverse(ops, &pernet_list, list)
>>  ops_exit_list(ops, &net_exit_list);
>>  
>> -mutex_unlock(&net_mutex);
>> +if (nr_sync_pernet_ops)
>> +mutex_unlock(&net_mutex);
>>  
>>  /* Free the net generic variables */
>>  list_for_each_entry_reverse(ops, &pernet_list, list)
>> @@ -961,6 +966,9 @@ static int register_pernet_operations(struct list_head 
>> *list,
>>  rcu_barrier();
>>  if (ops->id)
>>  ida_remove(&net_generic_ids, *ops->id);
>> +} else if (!ops->async) {
>> +pr_info_once("Pernet operations %ps are sync.\n", ops);
> 
> As far as I understand, we have this sync mode for backward
> compatibility with non-upstream modules, don't we? If the answer is yes,
> it may be better to add WARN_ONCE here?

There are 200+ more pernet operations requiring the review and making them 
async.
This pr_info_once() is to help people find unconverted pernet_operations they 
use
and start the work on converting them.

Thanks,
Kirill
 
>> +nr_sync_pernet_ops++;
>>  }
>>  
>>  return error;
>> @@ -968,7 +976,8 @@ static int register_pernet_operations(struct list_head 
>> *list,
>>  
>>  static void unregister_pernet_operations(struct pernet_operations *ops)
>>  {
>> -
>> +if (!ops->async)
>> +BU

Re: [PATCH v6 09/36] nds32: MMU initialization

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes memory initializations and highmem supporting.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 10/36] nds32: MMU fault handling and page table management

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes page fault handler, mmap and fixup implementations.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 11/36] nds32: Cache and TLB routines

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch contains cache and TLB maintenance functions.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 12/36] nds32: Process management

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:

> +void machine_restart(char *__unused)
> +{
> +   /*
> +* Clean and disable cache, and turn off interrupts
> +*/
> +   cpu_proc_fin();
> +
> +   /*
> +* Tell the mm system that we are going to reboot -
> +* we may need it to insert some 1:1 mappings so that
> +* soft boot works.
> +*/
> +   setup_mm_for_reboot(reboot_mode);
> +
> +   /*
> +* Now call the architecture specific reboot code.
> +*/
> +   arch_reset(reboot_mode);
> +
> +   /*
> +* Whoops - the architecture was unable to reboot.
> +* Tell the user!
> +*/
> +   mdelay(1000);
> +   pr_info("Reboot failed -- System halted\n");
> +   while (1) ;
> +}

You should insert a call to do_kernel_restart() in this function to allow e.g. a
watchdog driver to provide a machine-specific reboot method. Otherwise
the patch looks good to me,

Acked-by: Arnd Bergmann

Re: [PATCH v6 13/36] nds32: IRQ handling

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes irq related functions and irqchip_init().
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 14/36] nds32: Atomic operations

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch includes the atomic and futex operations. Many atomic operations 
> use
> the load-lock word(llw) and store-condition word(scw) operations.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 15/36] nds32: Device specific operations

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch introduces ioremap implementations.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 16/36] nds32: DMA mapping API

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for the DMA mapping API. It uses dma_map_ops for
> flexibility.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

I'm still unhappy about the way the cache flushes are done here as discussed
before. It's not a show-stopped, but no Ack from me.

Arnd

Re: [PATCH net-next v5 0/4] dwmac-meson8b: clock fixes for Meson8b

2018-01-18 Thread Yixun Lan

HI Jerome

On 01/17/18 01:19, Jerome Brunet wrote:
> On Tue, 2018-01-16 at 12:17 +0100, Martin Blumenstingl wrote:
 Hi Martin

   I'm having problem with this series applied.
 I've tested on the A113D (AXG) platform, if this patch is applied, the
 driver will choose MPLL2 as clk source, and seems it doesn't work out
 with this clk (fail to get IP)

 grep mpll2 /sys/kernel/debug/clk/clk_summary
 mpll2  11   24449
   0 0

>>>
>>> This is because the fixed pll is 199200 on the axg instead of 2Ghz.
>>
>> is there a chance that we can have a public AXG datasheet, similar to
>> what Hardkernel and Khadas published for the S805, S905, S905X and
>> S912?
>> this would give us community developers a chance to know that we are
>> going to break something on AXG *before* a patch is published (like in
>> this case :( )
>>
>>> As a consequence fdiv4 is 49800.
> 
> Quick update
> I have just been able measure fdiv4 and it is exactly 500Mhz
> This means the the fixed_pll is actually 2GHz, as we would expect.
> 
yes, you right here. the output of fixed pll is actually 2GHz (1999.998MHz)


> This also means that calculation of the fixed_pll is slightly off on the axg
> platform.
> 
the calculation algorithm should be updated because previous one didn't
take the 'frac' part into account.


> Once the clock calculation is done correctly on axg, there should be no 
> problem.
> 
there is still a problem, current calculation will still result at
fixed_pll frequency - 1999,000,000, I would expect it 2000,000,000 (to
round it to closest? or just 1999,998,000)


>>>
>>> CCF can reach something closer to 250Mhz with the mpll2.
>>>
>>> The easy way to fix this would be to make the inputs of ethernet mux 
>>> optional
>>> and not provide the MPLL2 on the axg.
>>>
>>> However, this raises a few question on the axg platform :
>>> 1) Why is the root pll 1992Mhz and not 2GHz ? seems a weird choice just for 
>>> 4MHz
>>> This 1992Mhz does not seems to help generate audio freqs anymore than 2GHz
>>> would. I don't really get this choice. Could you help us understand Yixun ?
>>>
>>> 2) Why is ethernet not working with mpll2 on axg ? espcially since we have
>>> proven the rate be correct on meson8 ... is there any change on this 
>>> platform we
>>> don't know about ?
>>
>> I tried to force my Khadas VIM2 (GXM) into using MPLL2 by changing
>> meson-gxl.dtsi:
>> clocks = <&clkc CLKID_ETH>,
>> -<&clkc CLKID_FCLK_DIV2>,
>> +<&xtal>,
>> <&clkc CLKID_MPLL2>;
>>
>> the resulting clock tree looks fine:
>> fixed_pll 440  20
>> 0 0
>>   mpll2_div  110   25000
>>0 0
>>  mpll2   110   25000
>>0 0
>> c941.ethernet#m250_sel   110
>> 25000  0 0
>>c941.ethernet#m250_div   110
>> 25000  0 0
>>   c941.ethernet#fixed_div2   11
>> 0   12500  0 0
>>  c941.ethernet#rgmii_tx_en   11
>> 0   12500  0 0
>>
>> however, Ethernet is broken (I can't ping anything)
>>
>> @Yixun: according to all public datasheets bit 4 is reserved
>> however, Mike and Kevin were told that bit 4 controls a mux between
>> FCLK_DIV2 and MPLL2
>> could you please clarify the meaning of this bit 4 on:
>> - Meson8b
>> - Meson8m2 (a confirmation that it uses the same Ethernet registers
>> with the same bits/meanings of these as Meson8b would be enough)
>> - GXBB / GXL / GXM (assuming that these are using an identical IP for 
>> PRG_ETH0)
>> - AXG
> 
> Indeed, if bit 4 stands for something else, or if some combination are known 
> to
> fail, it would be nice to know.
> 
the bit 4 is acutally a mux which used to choose two clock sources:
  a) fclk_div2  b) MPLL2
(I couldn't check all the SoC generation, but a least I know..)

>>From my point of view, the problem reported by yixun is with the clock
> controller, not this series, which I think should be merged.
> 
feedback from ASIC team, they think the MPLL2 clock source should also
work.. I don't know why it's broken here)-(

> Still, we should clockin of the mac optional so don't have to provide the 
> inputs
> known to fail, if any.
> 
>>
>>
>> Regards
>> Martin
> 
> .
>

Re: [PATCH v6 17/36] nds32: ELF definitions

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds definitions for the ELF format, relocation types, vdso
> locations and EXEC_PAGESIZE.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 18/36] nds32: System calls handling

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for system calls.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

You seem to have finally addressed all my previous concerns, looks good.

Reviewed-by: Arnd Bergmann

Re: [PATCH v6 19/36] nds32: VDSO support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds VDSO support. The VDSO code is currently used for
> sys_rt_sigreturn() and optimised gettimeofday() (using the SoC timer counter).
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 20/36] nds32: Signal handling support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for signal handling.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

I never feel qualified enough to properly review signal handling code, so
no Ack from me for this code even though I don't see anything wrong with it.
Hopefully someone else can give an Ack after looking more closely.

   Arnd

Re: [PATCH v6 21/36] nds32: Library functions

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch add support for various library functions.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

[PATCH iproute2] tc/lexer: let quotes actually start strings

2018-01-18 Thread Wolfgang Bumiller

The lexer will go with the longest match, so previously
the starting double quotes of a string would be swallowed by
the [^ \t\r\n()]+ pattern leaving the user no way to
actually use strings with escape sequences.
Fix this by not allowing this case to start with double
quotes.

Signed-off-by: Wolfgang Bumiller 
---
 tc/emp_ematch.l | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/emp_ematch.l b/tc/emp_ematch.l
index dc106759..d7a99304 100644
--- a/tc/emp_ematch.l
+++ b/tc/emp_ematch.l
@@ -137,7 +137,7 @@
 ")"{
return yylval.i = *yytext;
}
-[^ \t\r\n()]+  {
+[^" \t\r\n()][^ \t\r\n()]* {
yylval.b = bstr_alloc(yytext);
if (yylval.b == NULL)
return ERROR;
-- 
2.11.0

[PATCH net 2/2] net_sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr

2018-01-18 Thread Wolfgang Bumiller

TCF_LAYER_LINK and TCF_LAYER_NETWORK returned the same pointer as
skb->data points to the network header.
Use skb_mac_header instead.

Signed-off-by: Wolfgang Bumiller 
---
Alternatively this could return skb->head directly, but
'sk_buff->mac_header' is documented as 'Link layer header' and this
seemed more clear. Since on the first read I thought "it looks fine"
while in fact skb->head comes before skb->data, so this seems less
confusing.

 include/net/pkt_cls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 8e08b6da72f3..753ac9361154 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -522,7 +522,7 @@ static inline unsigned char * tcf_get_base_ptr(struct 
sk_buff *skb, int layer)
 {
switch (layer) {
case TCF_LAYER_LINK:
-   return skb->data;
+   return skb_mac_header(skb);
case TCF_LAYER_NETWORK:
return skb_network_header(skb);
case TCF_LAYER_TRANSPORT:
-- 
2.11.0

[PATCH net 1/2] net: sched: em_nbyte: don't add the data offset twice

2018-01-18 Thread Wolfgang Bumiller

'ptr' is shifted by the offset and then validated,
the memcmp should not add it a second time.

Signed-off-by: Wolfgang Bumiller 
---
 net/sched/em_nbyte.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/em_nbyte.c b/net/sched/em_nbyte.c
index df3110d69585..07c10bac06a0 100644
--- a/net/sched/em_nbyte.c
+++ b/net/sched/em_nbyte.c
@@ -47,15 +47,15 @@ static int em_nbyte_match(struct sk_buff *skb, struct 
tcf_ematch *em,
unsigned char *ptr = tcf_get_base_ptr(skb, nbyte->hdr.layer);
 
ptr += nbyte->hdr.off;
 
if (!tcf_valid_offset(skb, ptr, nbyte->hdr.len))
return 0;
 
-   return !memcmp(ptr + nbyte->hdr.off, nbyte->pattern, nbyte->hdr.len);
+   return !memcmp(ptr, nbyte->pattern, nbyte->hdr.len);
 }
 
 static struct tcf_ematch_ops em_nbyte_ops = {
.kind = TCF_EM_NBYTE,
.change   = em_nbyte_change,
.match= em_nbyte_match,
.owner= THIS_MODULE,
-- 
2.11.0

[PATCH net+iproute2 0/2] nbyte, cmp and text filter fixups

2018-01-18 Thread Wolfgang Bumiller

The iproute2 part allows the the actual use of the already existing
quoted string parsing.

The kernel side fixes an oob read in em_nbyte and allows 'layer 0' in
cmp and nbyte (and em_text whose existence surprised me given that I did
not see it exposed via iproute2) to actually match layer 0 rather than
being the same as specifying layer 1.

I seem to have stumbled upon a layer of dust (says git-blame).
Trying to match mac addresses I felt that the examples found online
using the 'u32' filter were rather inconvenient, particularly given
that there's the 'nbyte' filter around that could just memcmp the
entire a byte sequence at once.

Wolfgang Bumiller (1; 2):
  tc/lexer: let quotes actually start strings

 tc/emp_ematch.l | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

  net: sched: em_nbyte: don't add the data offset twice
  net_sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr

 include/net/pkt_cls.h | 2 +-
 net/sched/em_nbyte.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
2.11.0

Re: [PATCH v6 22/36] nds32: Debugging support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds ptrace support.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

I must have missed this patch earlier, unfortunately I don't think
this is ready:

> +long arch_ptrace(struct task_struct *child, long request, unsigned long addr,
> +unsigned long data)
> +{
> +   int ret;
> +
> +   switch (request) {
> +   case PTRACE_PEEKUSR:
> +   ret =
> +   ptrace_read_user(child, addr, (unsigned long __user 
> *)data);
> +   break;
> +
> +   case PTRACE_POKEUSR:
> +   ret = ptrace_write_user(child, addr, data);
> +   break;
> +
> +   case PTRACE_GETREGS:
> +   ret = ptrace_getregs(child, (void __user *)data);
> +   break;
> +
> +   case PTRACE_SETREGS:
> +   ret = ptrace_setregs(child, (void __user *)data);
> +   break;
> +
> +   case PTRACE_GETFPREGS:
> +   ret = ptrace_getfpregs(child, (void __user *)data);
> +   break;
> +
> +   case PTRACE_SETFPREGS:
> +   ret = ptrace_setfpregs(child, (void __user *)data);
> +   break;
> +
> +   default:
> +   ret = ptrace_request(child, request, addr, data);
> +   break;
> +   }
> +
> +   return ret;
> +}

It appears that you are implementing the old-style ptrace handling
with architecture specific commands. Please have a look at how
this is done in risc-v or arm64. If this takes more too much time
to address, I'd suggest using an empty stub function for sys_ptrace
and adding it back at a later point, but not send the current version
upstream.

 Arnd

Re: [PATCH v6 23/36] nds32: L2 cache support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds L2 cache support.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 24/36] nds32: Loadable modules

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for loadable modules.

One detail:

You still seem to have both the ELF_REL and ELF_RELA based functions
implemented here, you should drop the unused ELF_REL version:

> diff --git a/arch/nds32/kernel/module.c b/arch/nds32/kernel/module.c
> new file mode 100644
> index 000..714a6d6
> --- /dev/null
> +++ b/arch/nds32/kernel/module.c
> @@ -0,0 +1,286 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2005-2017 Andes Technology Corporation
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 

include  to catch this.

> +int
> +apply_relocate(Elf32_Shdr * sechdrs, const char *strtab,
> +  unsigned int symindex, unsigned int relsec,
> +  struct module *module)
> +{
> +   return 0;
> +}

and drop this.

With that change,

Acked-by:  Arnd Bergmann

Re: [PATCH v6 25/36] nds32: Generic timers support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for timer.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 
> Reviewed-by: Linus Walleij 

Acked-by: Arnd Bergmann

Re: [PATCH v6 26/36] nds32: Device tree support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds support for device tree.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 


Acked-by: Arnd Bergmann

Re: [PATCH v6 28/36] nds32: defconfig

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds nds32 defconfig.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH v6 30/36] MAINTAINERS: Add nds32

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> Signed-off-by: Greentime Hu 

Please add a changelog text to every single commit, otherwise:

Acked-by: Arnd Bergmann

Re: [PATCH v6 32/36] dt-bindings: nds32 L2 cache controller Bindings

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds nds32 L2 cache controller binding documents.
>
> Signed-off-by: Greentime Hu 
> Reviewed-by: Rob Herring 

Acked-by: Arnd Bergmann

Re: [PATCH v6 34/36] dt-bindings: interrupt-controller: Andestech Internal Vector Interrupt Controller

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds an irqchip driver document for the Andestech Internal Vector
> Interrupt Controller.
>
> Signed-off-by: Rick Chen 
> Signed-off-by: Greentime Hu 
> Reviewed-by: Rob Herring 

Acked-by: Arnd Bergmann

Re: [PATCH v6 27/36] nds32: Miscellaneous header files

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch introduces some miscellaneous header files.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Dmitry Vyukov

On Wed, Jan 17, 2018 at 11:11 AM, Henrique de Moraes Holschuh
 wrote:
> On Wed, 17 Jan 2018, Dmitry Vyukov wrote:
>> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek  wrote:
>> > On Fri 2018-01-12 17:58:01, syzbot wrote:
>> >> syzkaller hit the following crash on
>> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>> >
>> > What an useful way to describe kernel version.
>> >
>> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
>> > if it is on mainline,
>>
>> Yes, I guess. I am all for useful improvements.
>> What exactly is reasonable subject line? And how it can be extracted
>> for an arbitrary kernel tree?
>
> It can't, I guess.  But maybe you could extract it from syzbot
> information about the context of that patch?
>
> Maybe tagging it with the git tree you fetched when getting it over git,
> and mail from[1]+subject+message-id when getting it over email?
>
> [1] processing of related headers to handle mailing lists and
> retransmits is required, e.g. ressent-*, etc.  But this is relatively
> easy to do as well.
>
> A map to generate subject prefixes from key git trees or MLs could
> enhance that even further, to get at least mainline:, *-next:, etc.

Hi Henrique,

Re report format.

Ted also provided some useful feedback here:
https://groups.google.com/d/msg/syzkaller/5hjgr2v_oww/fn5QW6dvDQAJ

I've made a bunch of changes yesterday and today. This includes
rearranging lines in the email, rearranging attachment order, removing
some clutter, providing short repo alias (upstream, linux-next, net,
etc), providing commit date and title. syzbot will not also prefer to
report crashes on upstream tree, rather than on other trees.
Re subject line, I don't think prefixing subject with tree will work.
What you see as a single crash actually represents from tens to tens
of thousands crashes on some set of trees. And that set grows over
time. That can be one set of trees when the bug is first reported, and
then another subset of trees when a reproducer is found. It's
obviously a bad idea to send a email per crash (every few seconds),
and even per crash/tree. To alleviate this, syzbot will now say e.g.
"So far this crash happened 185 times on linux-next, mmots, net-next,
upstream". So that you can see that it's not only, say, linux-next
problem.

syzbot just mailed another report with all of these changes which you
can see here:
https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Thanks

Re: [PATCH net-next] macvlan: Pass SIOC[SG]HWTSTAMP ioctls and get_ts_info to lower device

2018-01-18 Thread Jiri Benc

On Thu, 18 Jan 2018 02:12:34 +0100, grzegorz.ha...@gmail.com wrote:
> This patch allows to enable hardware timestamping on macvlan intefaces and 
> find out capabilities of the lower device.
> 
> Signed-off-by: Grzegorz Halat 

NACK. This does not work.

For start, how do you deal with fwd_priv? When a packet is sent to
other software ports, it wouldn't be time stamped. And I expect more
cases like this to be there in macvlan, I only spent 10 seconds
checking it.

Please study how time stamping in the kernel works. A good start is
Documentation/networking/timestamping.txt. Then examine all possible
packet paths with macvlan, both egress and ingress.

> +static int macvlan_ethtool_get_ts_info(struct net_device *dev,
> + struct ethtool_ts_info *ts_info)
> +{
> + const struct macvlan_dev *vlan = netdev_priv(dev);
> + const struct ethtool_ops *eth_ops = vlan->lowerdev->ethtool_ops;
> +
> + if (eth_ops->get_ts_info)
> + return eth_ops->get_ts_info(vlan->lowerdev, ts_info);
> +
> + ts_info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
> +SOF_TIMESTAMPING_SOFTWARE;
> + ts_info->phc_index = -1;

What calls skb_tx_timestamp if the driver does not support it?

 Jiri

Re: [PATCH net+iproute2 0/2] nbyte, cmp and text filter fixups

2018-01-18 Thread Jiri Pirko

Thu, Jan 18, 2018 at 11:32:33AM CET, w.bumil...@proxmox.com wrote:
>The iproute2 part allows the the actual use of the already existing
>quoted string parsing.

Makes no sense to send iproute2 patch which is not related with the
kernel ones in the same set. Please send it separatelly.

>
>The kernel side fixes an oob read in em_nbyte and allows 'layer 0' in
>cmp and nbyte (and em_text whose existence surprised me given that I did
>not see it exposed via iproute2) to actually match layer 0 rather than
>being the same as specifying layer 1.
>
>I seem to have stumbled upon a layer of dust (says git-blame).
>Trying to match mac addresses I felt that the examples found online
>using the 'u32' filter were rather inconvenient, particularly given
>that there's the 'nbyte' filter around that could just memcmp the
>entire a byte sequence at once.
>
>Wolfgang Bumiller (1; 2):
>  tc/lexer: let quotes actually start strings
>
> tc/emp_ematch.l | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>  net: sched: em_nbyte: don't add the data offset twice
>  net_sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr

"net: sched:" or "net_sched:"? - please, try to be consistent

Re: [PATCH v6 29/36] nds32: Build infrastructure

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds Makefile, Kconfig and vmlinux.lds.S files required for 
> building
> an nds32 kernel.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Greentime Hu 

I find some new details every time I look here ;-)

> @@ -0,0 +1,107 @@
> +#
> +# For a description of the syntax of this configuration file,
> +# see Documentation/kbuild/kconfig-language.txt.
> +#
> +
> +config NDS32
> +def_bool y
> +   select ARCH_HAS_RAW_COPY_USER

I don't think this symbol was ever merged. Do you remember why you added it?

> +   select ARCH_WANT_FRAME_POINTERS if FTRACE
> +   select ARCH_WANT_IPC_PARSE_VERSION

You most certainly don't want IPC_PARSE_VERSION, please drop this
and adapt your glibc.

> +   select CLKSRC_MMIO
> +   select CLONE_BACKWARDS
> +   select COMMON_CLK
> +   select FRAME_POINTER

Do you need both ARCH_WANT_FRAME_POINTERS and FRAME_POINTER here?

> +   select GENERIC_ATOMIC64
> +   select GENERIC_CPU_DEVICES
> +   select GENERIC_CLOCKEVENTS
> +   select GENERIC_IRQ_CHIP
> +   select GENERIC_IRQ_PROBE

I think it's better to drop GENERIC_IRQ_PROBE here, no modern driver
should rely on that.

> +choice
> +   prompt "CPU type"
> +   default CPU_V3
> +config CPU_N15
> +   bool "AndesCore N15"
> +config CPU_N13
> +   bool "AndesCore N13"
> +   select CPU_CACHE_ALIASING if ANDES_PAGE_SIZE_4KB
> +config CPU_N10
> +   bool "AndesCore N10"
> +   select CPU_CACHE_ALIASING
> +config CPU_D15
> +   bool "AndesCore D15"
> +config CPU_D10
> +   bool "AndesCore D10"
> +   select CPU_CACHE_ALIASING
> +config CPU_V3
> +   bool "AndesCore v3 compatible"
> +   select ANDES_PAGE_SIZE_8KB
> +endchoice

I forget what we discussed here earlier, but at the very least, there should be
some help text here to explain what the implications are. I assume that you
generally want to be able to build one kernel to run on all of the above, right?

Will selecting 'CPU_V3' result in a kernel binary that can run on all of them?
If so, please explain it here as that is not obvious.

For the other CPU types, can you list the what backwards-compatiblity
you get? E.g. will a kernel built for N13 run on any of N15, D15 or N10?

I think the 'select ANDES_PAGE_SIZE_8KB' cannot work as expected,
since ANDES_PAGE_SIZE_8KB is inside of a 'choice' statement. Since
there are only two options (4K and 8K), you can address that by making
it a simple bool option and fall back to 4K when ANDES_PAGE_SIZE_8KB
is disabled.

> +config CACHE_L2
> +   bool "Support L2 cache"
> +default y
> +   help
> + Say Y here to enable L2 cache if your SoC are integrated with L2CC.
> + If unsure, say N.
> +
> +menu "Memory configuration"
> +
> +choice
> +   prompt "Memory split"
> +   depends on MMU
> +   default VMSPLIT_3G

Why not default to VMSPLIT_3G_OPT?

Arnd

Re: [PATCH v6 31/36] dt-bindings: nds32 CPU Bindings

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds nds32 CPU binding documents.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Rick Chen 
> Signed-off-by: Zong Li 
> Signed-off-by: Greentime Hu 
> Reviewed-by: Rob Herring 
> ---
>  Documentation/devicetree/bindings/nds32/cpus.txt |   37 
> ++
>  1 file changed, 37 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/nds32/cpus.txt
>
> diff --git a/Documentation/devicetree/bindings/nds32/cpus.txt 
> b/Documentation/devicetree/bindings/nds32/cpus.txt
> new file mode 100644
> index 000..9a52937
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/nds32/cpus.txt
> @@ -0,0 +1,37 @@
> +* Andestech Processor Binding
> +
> +This binding specifies what properties must be available in the device tree
> +representation of a Andestech Processor Core, which is the root node in the
> +tree.
> +
> +Required properties:
> +
> +   - compatible:
> +   Usage: required
> +   Value type: 
> +   Definition: should be one of:
> +   "andestech,n13"
> +   "andestech,n15"
> +   "andestech,d15"
> +   "andestech,n10"
> +   "andestech,d10"
> +   "andestech,nds32v3"

Based on https://lkml.org/lkml/2017/11/27/1290, this should say that
the device tree should always list 'andestech,nds32v3' as the most
generic 'compatible' value and list exactly one of the others in
addition.

   Arnd

Re: [PATCH v6 33/36] dt-bindings: nds32 SoC Bindings

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch adds nds32 SoC(AE3XX and AG101P) binding documents.
>
> Signed-off-by: Greentime Hu 
> Reviewed-by: Rob Herring 

Acked-by: Arnd Bergmann

Re: [PATCH v6 36/36] net: faraday add nds32 support.

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:53 AM, Greentime Hu  wrote:
> From: Greentime Hu 
>
> This patch is used to support nds32 architecture to use these faraday
> mac IP.
>
> Signed-off-by: Greentime Hu 

Acked-by: Arnd Bergmann

Re: [PATCH RFC net-next 1/6] sock: MSG_PEEK support for sk_error_queue

2018-01-18 Thread Sowmini Varadhan

On (01/17/18 18:50), Willem de Bruijn wrote:
> 
> This can cause reordering with parallel readers. Can we avoid the need
> for peeking? It also caused a slew of subtle bugs previously.

Yes, I did notice the potential for re-ordering when writing the patch..
but these are not actuallly messages from the wire, so is re-ordering 
fatal?

In general, I"m not particularly attached to this solution- in my
testing, I'm seeing that it's possible to reduce the latency  and still
take a hit on the throughput if the application does not reap the
completion notifciation (and send out new data) efficiently

Some (radically differnt) alternatives that were suggested to me 

- send up all the cookies as ancillary data with recvmsg (i.e., send
  it as a cmsgdata along with actual data from the wire). In most
  cases, the application has data to read, anyway. If it doesnt (pure
  sender), we could wake up recvmsg with 0 bytes of data, but with
  the cookie info in the ancillary data. This feels not-so-elegant
  to me, but I suppose it would have the benefit of optimizing on
  the syscall overhead.. (and you could use MSG_CTRUNC to handle
  the case of insuufficient bufffer for cookies, sending the rest
  on the next call)..
- allow application to use a setsockopt on the rds socket, with
  some shmem region, into which the kernel could write the cookies,
  Let application reap cookies without syscall overhead from that
  shmem region..

> How about just define a max number of cookies and require the caller
> to always read with sufficient room to hold them?

This may be "good enough" as well, maybe allow a max of (say) 16 cookies,
and set up  the skb's in the error queue to send up batches of 16 cookies
at a time?

--Sowmini

Re: [PATCH v6 3/3] dt-bindings: timer: Add andestech atcpit100 timer binding doc

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:57 AM, Greentime Hu  wrote:
> From: Rick Chen 
>
> Add a document to describe Andestech atcpit100 timer and
> binding information.
>
> Signed-off-by: Rick Chen 
> Signed-off-by: Greentime Hu 
> Acked-by: Rob Herring 

Acked-by: Arnd Bergmann

Re: [PATCH v6 2/3] clocksource/drivers/atcpit100: VDSO support

2018-01-18 Thread Arnd Bergmann

On Mon, Jan 15, 2018 at 6:57 AM, Greentime Hu  wrote:
> From: Rick Chen 
>
> VDSO needs real-time cycle count to ensure the time accuracy.
> Unlike others, nds32 architecture does not define clock source,
> hence VDSO needs atcpit100 offering real-time cycle count
> to derive the correct time.
>
> Signed-off-by: Vincent Chen 
> Signed-off-by: Rick Chen 
> Signed-off-by: Greentime Hu 

I'm a bit puzzled by this patch, can you explain how the vdso actually
manages to access the clock hardware? It looks like you make the
physical address and the register offset available to user space, but
how does it end up accessing it?

  Arnd

Re: [PATCH RFC net-next 5/6] rds: support for zcopy completion notification

2018-01-18 Thread Sowmini Varadhan

On (01/17/18 19:23), Willem de Bruijn wrote:
> > +static void rds_rm_zerocopy_callback(struct rds_sock *rs)
 :
> > +   skb = alloc_skb(ncookies * sizeof(u32), GFP_ATOMIC);
> 
> TCP zerocopy avoids this issue by allocating the notification skb when the
> zerocopy packet is created, which would be rds_message_copy_from_user.

right, I could allocate the skb when we set up the zcopy data strucutres.

> This does not add an allocation, if using the same trick of stashing
> the intermediate notification object (op_mmp_znotifier) in skb->cb. Though,
> alloc_skb is probably more expensive than that kzalloc. If nothing else,
> because of more zeroing.

I would have liked to reuse skb->cb, but had to fall back to the alloc_skb
because of the attempt to fall back to flexibly numbered (and sized) cookie
notifications.

If we choose the "max number of cookies" option discussed in the previous 
thread, I think I should be able to do this with the existing 48 byte sized 
cb and pack in 8 32 bit cookies after the sock_extended_err.. maybe
that's sufficient.

> > +   serr = SKB_EXT_ERR(skb);
> > +   serr->ee.ee_errno = 0;
> > +   serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
> > +   serr->ee.ee_data = ncookies;
> 
> This changes the semantics of these fields. Please add a new SO_EE_CODE flag,
> even if the semantics can be derived from the packet family (for now).

sounds good, I'll take care of this (and other review comments)
for the next rev.

> Even better would be if we can avoid the cookies completely. I understand
> the issue with concurrent send threads racing on obtaining a zckey value. If
> the sender could learn which zckey was chosen for a call, would that suffice?

I'm not sure I understand the proposal- you want sendmsg to return
the cookie ("zckey") for you? How?

even if we mangled sendmsg() to return something other than
the existing POSIX semantics, how will the application be asynchronously
notified that send has completed (on a per-message/per-datagram) basis?

> I suspect that in even with concurrent senders, notifications arrive
> largely in
> order, in which case we could just maintain the existing semantics and even
> reuse that implementation.

not so. rds-stress [1] with -d8 -t8 quickly disproves this on my 10G ixgbe
connection. 

When you have multiple threads writing to a socket, you cannot know
what was the "order of send", unless you bottleneck all the threads
to go through a single-point-of-send.  rds-stress is an example of this
sort of usage (fairly typical in our applications)

[1] 
http://public-yum.oracle.com/repo/OracleLinux/OL6/ofed_UEK/x86_64//getPackageSource/rds-tools-2.0.7-1.12.el6.src.rpm

Consider 2 RDS sockets fd1 and fd2, each one sending to the
same pair of IP addresses: if fd1 gets bound to tcp sock1, fd2 to tcp sock2,
it's very possible that the send completion is not in the same order
as the order of send. The application cannot know which socket gets
bound to which TCP connection (or even whether/how-many tcp connections
are involved: mprds strives to make this transparent to the application)

Same problem exists for pure-datagram sockets like PF_PACKET, UDP etc.
application may send buf1 (datagram1) and buf2 (datagram2), and buf2
may make it to the destination before buf1.  The "notifications 
arrive largely in order" may be mostly true about a single-stream TCP
connection but not for the datagram models (or even threaded tcp apps
like iperf?)..

> > +   tail = skb_peek_tail(q);
> > +   if (!tail ||
> > +   SKB_EXT_ERR(tail)->ee.ee_origin != SO_EE_ORIGIN_ZEROCOPY)  {
> > +   __skb_queue_tail(q, skb);
> > +   skb = NULL;
> > +   }
> 
> This drops the packet if the branch is not taken. In the TCP case
> this condition
> means that we can try to coalesce packets, but that is not the case here.

good point, I'll check into this and fix (same applies for the comments
to patches 4/6 and 6/6)

> > } rdma;
> > struct rm_data_op {
> > unsigned intop_active:1;
> > -   unsigned intop_notify:1;
> > +   unsigned intop_notify:1,
> > +   op_zcopy:1,
 :
> > +   struct rds_znotifier*op_mmp_znotifier;
> 
> not necessary if op_mmp_znotifier is NULL unless set in
> rds_message_copy_from_user

To make sure I dont misunderstand, you are suggesting that we dont
need op_zcopy, but can just check for the null-ness of
op_mmp_znotifier (yes, true, I agree)? or something else?

--Sowmini

[patch net-next] mlxsw: spectrum: Upper-bound supported FW version

2018-01-18 Thread Jiri Pirko

From: Yuval Mintz 

During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.

Signed-off-by: Yuval Mintz 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 30 ++
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index bbe48917dcad..7e2b552c2237 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -76,12 +76,7 @@
 #define MLXSW_FWREV_MAJOR 13
 #define MLXSW_FWREV_MINOR 1530
 #define MLXSW_FWREV_SUBMINOR 152
-
-static const struct mlxsw_fw_rev mlxsw_sp_supported_fw_rev = {
-   .major = MLXSW_FWREV_MAJOR,
-   .minor = MLXSW_FWREV_MINOR,
-   .subminor = MLXSW_FWREV_SUBMINOR
-};
+#define MLXSW_FWREV_MINOR_TO_BRANCH(minor) ((minor) / 100)
 
 #define MLXSW_SP_FW_FILENAME \
"mellanox/mlxsw_spectrum-" __stringify(MLXSW_FWREV_MAJOR) \
@@ -339,28 +334,25 @@ static int mlxsw_sp_firmware_flash(struct mlxsw_sp 
*mlxsw_sp,
return mlxfw_firmware_flash(&mlxsw_sp_mlxfw_dev.mlxfw_dev, firmware);
 }
 
-static bool mlxsw_sp_fw_rev_ge(const struct mlxsw_fw_rev *a,
-  const struct mlxsw_fw_rev *b)
-{
-   if (a->major != b->major)
-   return a->major > b->major;
-   if (a->minor != b->minor)
-   return a->minor > b->minor;
-   return a->subminor >= b->subminor;
-}
-
 static int mlxsw_sp_fw_rev_validate(struct mlxsw_sp *mlxsw_sp)
 {
const struct mlxsw_fw_rev *rev = &mlxsw_sp->bus_info->fw_rev;
const struct firmware *firmware;
int err;
 
-   if (mlxsw_sp_fw_rev_ge(rev, &mlxsw_sp_supported_fw_rev))
+   /* Validate driver & FW are compatible */
+   if (rev->major != MLXSW_FWREV_MAJOR) {
+   WARN(1, "Mismatch in major FW version [%d:%d] is never 
expected; Please contact support\n",
+rev->major, MLXSW_FWREV_MAJOR);
+   return -EINVAL;
+   }
+   if (MLXSW_FWREV_MINOR_TO_BRANCH(rev->minor) ==
+   MLXSW_FWREV_MINOR_TO_BRANCH(MLXSW_FWREV_MINOR))
return 0;
 
-   dev_info(mlxsw_sp->bus_info->dev, "The firmware version %d.%d.%d is out 
of date\n",
+   dev_info(mlxsw_sp->bus_info->dev, "The firmware version %d.%d.%d is 
incompatible with the driver\n",
 rev->major, rev->minor, rev->subminor);
-   dev_info(mlxsw_sp->bus_info->dev, "Upgrading firmware using file %s\n",
+   dev_info(mlxsw_sp->bus_info->dev, "Flashing firmware using file %s\n",
 MLXSW_SP_FW_FILENAME);
 
err = request_firmware_direct(&firmware, MLXSW_SP_FW_FILENAME,
-- 
2.14.3

Re: [PATCH 1/3 net] ibmvnic: Modify buffer size on failover

2018-01-18 Thread kbuild test robot

Hi John,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:
https://github.com/0day-ci/linux/commits/John-Allen/ibmvnic-Modify-buffer-size-on-failover/20180118-190528
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   drivers/net//ethernet/ibm/ibmvnic.c: In function 'reset_rx_pools':
>> drivers/net//ethernet/ibm/ibmvnic.c:414:2: error: 'size_array' undeclared 
>> (first use in this function); did you mean 'sem_array'?
 size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
 ^~
 sem_array
   drivers/net//ethernet/ibm/ibmvnic.c:414:2: note: each undeclared identifier 
is reported only once for each function it appears in

vim +414 drivers/net//ethernet/ibm/ibmvnic.c

   407  
   408  static int reset_rx_pools(struct ibmvnic_adapter *adapter)
   409  {
   410  struct ibmvnic_rx_pool *rx_pool;
   411  int rx_scrqs;
   412  int i, j, rc;
   413  
 > 414  size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
   415  
be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
   416  
   417  rx_scrqs = 
be32_to_cpu(adapter->login_rsp_buf->num_rxadd_subcrqs);
   418  for (i = 0; i < rx_scrqs; i++) {
   419  rx_pool = &adapter->rx_pool[i];
   420  
   421  netdev_dbg(adapter->netdev, "Re-setting rx_pool[%d]\n", 
i);
   422  
   423  if (rx_pool->buff_size != be64_to_cpu(size_array[i])) {
   424  rx_pool->buff_size = be64_to_cpu(size_array[i]);
   425  rc = alloc_long_term_buff(adapter,
   426
&rx_pool->long_term_buff,
   427rx_pool->size *
   428rx_pool->buff_size);
   429  } else {
   430  rc = reset_long_term_buff(adapter,
   431
&rx_pool->long_term_buff);
   432  }
   433  if (rc)
   434  return rc;
   435  
   436  for (j = 0; j < rx_pool->size; j++)
   437  rx_pool->free_map[j] = j;
   438  
   439  memset(rx_pool->rx_buff, 0,
   440 rx_pool->size * sizeof(struct ibmvnic_rx_buff));
   441  
   442  atomic_set(&rx_pool->available, 0);
   443  rx_pool->next_alloc = 0;
   444  rx_pool->next_free = 0;
   445  rx_pool->active = 1;
   446  }
   447  
   448  return 0;
   449  }
   450  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[net-next: PATCH v4 6/7] net: mvpp2: use device_/fwnode_ APIs instead of of_*

2018-01-18 Thread Marcin Wojtas

OF functions can be used only for the driver using DT.
As a preparation for introducing ACPI support in mvpp2
driver, use struct fwnode_handle in order to obtain
properties from the hardware description.

This patch replaces of_* function with device_*/fwnode_*
where possible in the mvpp2.

Signed-off-by: Marcin Wojtas 
---
 drivers/net/ethernet/marvell/mvpp2.c | 45 +++-
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 7f42d90..f16448e 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -932,6 +932,9 @@ struct mvpp2_port {
 
struct mvpp2 *priv;
 
+   /* Firmware node associated to the port */
+   struct fwnode_handle *fwnode;
+
/* Per-port registers' base address */
void __iomem *base;
void __iomem *stats_base;
@@ -7711,17 +7714,16 @@ static bool mvpp2_port_has_tx_irqs(struct mvpp2 *priv,
 }
 
 static void mvpp2_port_copy_mac_addr(struct net_device *dev, struct mvpp2 
*priv,
-struct device_node *port_node,
+struct fwnode_handle *fwnode,
 char **mac_from)
 {
struct mvpp2_port *port = netdev_priv(dev);
char hw_mac_addr[ETH_ALEN] = {0};
-   const char *dt_mac_addr;
+   char fw_mac_addr[ETH_ALEN];
 
-   dt_mac_addr = of_get_mac_address(port_node);
-   if (dt_mac_addr && is_valid_ether_addr(dt_mac_addr)) {
-   *mac_from = "device tree";
-   ether_addr_copy(dev->dev_addr, dt_mac_addr);
+   if (fwnode_get_mac_address(fwnode, fw_mac_addr, ETH_ALEN)) {
+   *mac_from = "firmware node";
+   ether_addr_copy(dev->dev_addr, fw_mac_addr);
return;
}
 
@@ -7740,13 +7742,14 @@ static void mvpp2_port_copy_mac_addr(struct net_device 
*dev, struct mvpp2 *priv,
 
 /* Ports initialization */
 static int mvpp2_port_probe(struct platform_device *pdev,
-   struct device_node *port_node,
+   struct fwnode_handle *port_fwnode,
struct mvpp2 *priv)
 {
struct device_node *phy_node;
struct phy *comphy;
struct mvpp2_port *port;
struct mvpp2_port_pcpu *port_pcpu;
+   struct device_node *port_node = to_of_node(port_fwnode);
struct net_device *dev;
struct resource *res;
char *mac_from = "";
@@ -7773,7 +7776,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
return -ENOMEM;
 
phy_node = of_parse_phandle(port_node, "phy", 0);
-   phy_mode = of_get_phy_mode(port_node);
+   phy_mode = fwnode_get_phy_mode(port_fwnode);
if (phy_mode < 0) {
dev_err(&pdev->dev, "incorrect phy mode\n");
err = phy_mode;
@@ -7789,7 +7792,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
comphy = NULL;
}
 
-   if (of_property_read_u32(port_node, "port-id", &id)) {
+   if (fwnode_property_read_u32(port_fwnode, "port-id", &id)) {
err = -EINVAL;
dev_err(&pdev->dev, "missing port-id value\n");
goto err_free_netdev;
@@ -7820,7 +7823,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
/* the link irq is optional */
port->link_irq = 0;
 
-   if (of_property_read_bool(port_node, "marvell,loopback"))
+   if (fwnode_property_read_bool(port_fwnode, "marvell,loopback"))
port->flags |= MVPP2_F_LOOPBACK;
 
port->id = id;
@@ -7845,8 +7848,8 @@ static int mvpp2_port_probe(struct platform_device *pdev,
   MVPP21_MIB_COUNTERS_OFFSET +
   port->gop_id * MVPP21_MIB_COUNTERS_PORT_SZ;
} else {
-   if (of_property_read_u32(port_node, "gop-port-id",
-&port->gop_id)) {
+   if (fwnode_property_read_u32(port_fwnode, "gop-port-id",
+&port->gop_id)) {
err = -EINVAL;
dev_err(&pdev->dev, "missing gop-port-id value\n");
goto err_deinit_qvecs;
@@ -7876,7 +7879,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
mutex_init(&port->gather_stats_lock);
INIT_DELAYED_WORK(&port->stats_work, mvpp2_gather_hw_statistics);
 
-   mvpp2_port_copy_mac_addr(dev, priv, port_node, &mac_from);
+   mvpp2_port_copy_mac_addr(dev, priv, port_fwnode, &mac_from);
 
port->tx_ring_size = MVPP2_MAX_TXD_DFLT;
port->rx_ring_size = MVPP2_MAX_RXD_DFLT;
@@ -8194,8 +8197,8 @@ static int mvpp2_init(struct platform_device *pdev, 
struct mvpp2 *priv)
 
 static int mvpp2_probe(struct platform_device *pdev)
 {
-   struct device_node *dn = pdev->dev.o

[net-next: PATCH v4 7/7] net: mvpp2: enable ACPI support in the driver

2018-01-18 Thread Marcin Wojtas

This patch introduces an alternative way of obtaining resources - via
ACPI tables provided by firmware. Enabling coexistence with the DT
support, in addition to the OF_*->device_*/fwnode_* API replacement,
required following steps to be taken:

* Add mvpp2_acpi_match table
* Omit clock configuration and obtain tclk from the property - in ACPI
  world, the firmware is responsible for clock maintenance.
* Disable comphy and syscon handling as they are not available for ACPI.
* Modify way of obtaining interrupts - use newly introduced
  fwnode_irq_get() routine
* Until proper MDIO bus and PHY handling with ACPI is established in the
  kernel, use only link interrupts feature in the driver. For the RGMII
  port it results in depending on GMAC settings done during firmware
  stage.
* When booting with ACPI MVPP2_QDIST_MULTI_MODE is picked by
  default, as there is no need to keep any kind of the backward
  compatibility.

Moreover, a memory region used by mvmdio driver is usually placed in
the middle of the address space of the PP2 network controller.
The MDIO base address is obtained without requesting memory region
(by devm_ioremap() call) in mvmdio.c, later overlapping resources are
requested by the network driver, which is responsible for avoiding
a concurrent access.

In case the MDIO memory region is declared in the ACPI, it can
already appear as 'in-use' in the OS. Because it is overlapped by second
region of the network controller, make sure it is released, before
requesting it again. The care is taken by mvpp2 driver to avoid
concurrent access to this memory region.

Signed-off-by: Marcin Wojtas 
---
 drivers/net/ethernet/marvell/mvpp2.c | 133 ++--
 1 file changed, 94 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index f16448e..a1d7b88 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -10,6 +10,7 @@
  * warranty of any kind, whether express or implied.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -7502,7 +7503,10 @@ static int mvpp2_multi_queue_vectors_init(struct 
mvpp2_port *port,
strncpy(irqname, "rx-shared", sizeof(irqname));
}
 
-   v->irq = of_irq_get_byname(port_node, irqname);
+   if (port_node)
+   v->irq = of_irq_get_byname(port_node, irqname);
+   else
+   v->irq = fwnode_irq_get(port->fwnode, i);
if (v->irq <= 0) {
ret = -EINVAL;
goto err;
@@ -7746,7 +7750,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
struct mvpp2 *priv)
 {
struct device_node *phy_node;
-   struct phy *comphy;
+   struct phy *comphy = NULL;
struct mvpp2_port *port;
struct mvpp2_port_pcpu *port_pcpu;
struct device_node *port_node = to_of_node(port_fwnode);
@@ -7760,7 +7764,12 @@ static int mvpp2_port_probe(struct platform_device *pdev,
int phy_mode;
int err, i, cpu;
 
-   has_tx_irqs = mvpp2_port_has_tx_irqs(priv, port_node);
+   if (port_node) {
+   has_tx_irqs = mvpp2_port_has_tx_irqs(priv, port_node);
+   } else {
+   has_tx_irqs = true;
+   queue_mode = MVPP2_QDIST_MULTI_MODE;
+   }
 
if (!has_tx_irqs)
queue_mode = MVPP2_QDIST_SINGLE_MODE;
@@ -7775,7 +7784,11 @@ static int mvpp2_port_probe(struct platform_device *pdev,
if (!dev)
return -ENOMEM;
 
-   phy_node = of_parse_phandle(port_node, "phy", 0);
+   if (port_node)
+   phy_node = of_parse_phandle(port_node, "phy", 0);
+   else
+   phy_node = NULL;
+
phy_mode = fwnode_get_phy_mode(port_fwnode);
if (phy_mode < 0) {
dev_err(&pdev->dev, "incorrect phy mode\n");
@@ -7783,13 +7796,15 @@ static int mvpp2_port_probe(struct platform_device 
*pdev,
goto err_free_netdev;
}
 
-   comphy = devm_of_phy_get(&pdev->dev, port_node, NULL);
-   if (IS_ERR(comphy)) {
-   if (PTR_ERR(comphy) == -EPROBE_DEFER) {
-   err = -EPROBE_DEFER;
-   goto err_free_netdev;
+   if (port_node) {
+   comphy = devm_of_phy_get(&pdev->dev, port_node, NULL);
+   if (IS_ERR(comphy)) {
+   if (PTR_ERR(comphy) == -EPROBE_DEFER) {
+   err = -EPROBE_DEFER;
+   goto err_free_netdev;
+   }
+   comphy = NULL;
}
-   comphy = NULL;
}
 
if (fwnode_property_read_u32(port_fwnode, "port-id", &id)) {
@@ -7805,6 +7820,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 
port = netdev_priv(dev);
port->dev = dev;
+   port->fwnode = port_fwnod

[net-next: PATCH v4 0/7] Armada 7k/8k PP2 ACPI support

2018-01-18 Thread Marcin Wojtas

Hi,

I quickly resend the series, thanks to Antoine Tenart's remark,
who spotted !CONFIG_ACPI compilation issue after introducing
the new fwnode_irq_get() routine. Please see the details in the changelog
below and the 3/7 commit log.

mvpp2 driver can work with the ACPI representation, as exposed
on a public branch:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
It was compiled together with the most recent Tianocore EDK2 revision.
Please refer to the firmware build instruction on MacchiatoBin board:
http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II

ACPI representation of PP2 controllers (withouth PHY support) can
be viewed in the github:
* MacchiatoBin:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl#L201

* Armada 7040 DB:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada70x0/Dsdt.asl#L131

I will appreciate any comments or remarks.

Best regards,
Marcin

Changelog:
v3 -> v4:
* 3/7
- add new macro (ACPI_HANDLE_FWNODE) and fix
  compilation with !CONFIG_ACPI
- extend commit log and mention usability of fwnode_irq_get
  for the child nodes as well

v2 -> v3:
* 1/7, 2/7
- Add Rafael's Acked-by's
* 3/7, 4/7
- New patches
* 6/7, 7/7
- Update driver with new helper routines usage
- Improve commit log.

v1 -> v2:
* Remove MDIO patches
* Use PP2 ports only with link interrupts
* Release second region resources in mvpp2 driver (code moved from
  mvmdio), as explained in details in 5/5 commit message.

Marcin Wojtas (7):
  device property: Introduce fwnode_get_mac_address()
  device property: Introduce fwnode_get_phy_mode()
  device property: Introduce fwnode_irq_get()
  device property: Allow iterating over available child fwnodes
  net: mvpp2: simplify maintaining enabled ports' list
  net: mvpp2: use device_*/fwnode_* APIs instead of of_*
  net: mvpp2: enable ACPI support in the driver

 drivers/base/property.c  | 104 --
 drivers/net/ethernet/marvell/mvpp2.c | 206 
 include/linux/acpi.h |   3 +
 include/linux/property.h |  11 ++
 4 files changed, 232 insertions(+), 92 deletions(-)

-- 
2.7.4

[net-next: PATCH v4 3/7] device property: Introduce fwnode_irq_get()

2018-01-18 Thread Marcin Wojtas

Until now there were two very similar functions allowing
to get Linux IRQ number from ACPI handle (acpi_irq_get())
and OF node (of_irq_get()). The first one appeared to be used
only as a subroutine of platform_irq_get(), which (in the generic
code) limited IRQ obtaining from _CRS method only to nodes
associated to kernel's struct platform_device.

This patch introduces a new helper routine - fwnode_irq_get(),
which allows to get the IRQ number directly from the fwnode
to be used as common for OF/ACPI worlds. It is usable not
only for the parents fwnodes, but also for the child nodes
comprising their own _CRS methods with interrupts description.

In order to be able o satisfy compilation with !CONFIG_ACPI
and also simplify the new code, introduce a helper macro
(ACPI_HANDLE_FWNODE), with which it is possible to reach
an ACPI handle directly from its fwnode.

Signed-off-by: Marcin Wojtas 
---
 drivers/base/property.c  | 26 
 include/linux/acpi.h |  3 +++
 include/linux/property.h |  2 ++
 3 files changed, 31 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 7c4a53d..1d6c9d9 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1230,6 +1231,31 @@ void *device_get_mac_address(struct device *dev, char 
*addr, int alen)
 EXPORT_SYMBOL(device_get_mac_address);
 
 /**
+ * fwnode_irq_get - Get IRQ directly from a fwnode
+ * @fwnode:Pointer to the firmware node
+ * @index: Zero-based index of the IRQ
+ *
+ * Returns Linux IRQ number on success. Other values are determined
+ * accordingly to acpi_/of_ irq_get() operation.
+ */
+int fwnode_irq_get(struct fwnode_handle *fwnode, unsigned int index)
+{
+   struct device_node *of_node = to_of_node(fwnode);
+   struct resource res;
+   int ret;
+
+   if (IS_ENABLED(CONFIG_OF) && of_node)
+   return of_irq_get(of_node, index);
+
+   ret = acpi_irq_get(ACPI_HANDLE_FWNODE(fwnode), index, &res);
+   if (ret)
+   return ret;
+
+   return res.start;
+}
+EXPORT_SYMBOL(fwnode_irq_get);
+
+/**
  * device_graph_get_next_endpoint - Get next endpoint firmware node
  * @fwnode: Pointer to the parent firmware node
  * @prev: Previous endpoint node or %NULL to get the first
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index dc1ebfe..f05b9b6 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -56,6 +56,8 @@ static inline acpi_handle acpi_device_handle(struct 
acpi_device *adev)
 #define ACPI_COMPANION_SET(dev, adev)  set_primary_fwnode(dev, (adev) ? \
acpi_fwnode_handle(adev) : NULL)
 #define ACPI_HANDLE(dev)   acpi_device_handle(ACPI_COMPANION(dev))
+#define ACPI_HANDLE_FWNODE(fwnode) \
+   acpi_device_handle(to_acpi_device_node(fwnode))
 
 static inline struct fwnode_handle *acpi_alloc_fwnode_static(void)
 {
@@ -626,6 +628,7 @@ int acpi_arch_timer_mem_init(struct arch_timer_mem 
*timer_mem, int *timer_count)
 #define ACPI_COMPANION(dev)(NULL)
 #define ACPI_COMPANION_SET(dev, adev)  do { } while (0)
 #define ACPI_HANDLE(dev)   (NULL)
+#define ACPI_HANDLE_FWNODE(fwnode) (NULL)
 #define ACPI_DEVICE_CLASS(_cls, _msk)  .cls = (0), .cls_msk = (0),
 
 struct fwnode_handle;
diff --git a/include/linux/property.h b/include/linux/property.h
index 9b13332..e05889f 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -103,6 +103,8 @@ struct fwnode_handle *device_get_named_child_node(struct 
device *dev,
 struct fwnode_handle *fwnode_handle_get(struct fwnode_handle *fwnode);
 void fwnode_handle_put(struct fwnode_handle *fwnode);
 
+int fwnode_irq_get(struct fwnode_handle *fwnode, unsigned int index);
+
 unsigned int device_get_child_node_count(struct device *dev);
 
 static inline bool device_property_read_bool(struct device *dev,
-- 
2.7.4

[net-next: PATCH v4 5/7] net: mvpp2: simplify maintaining enabled ports' list

2018-01-18 Thread Marcin Wojtas

'port_count' field of the mvpp2 structure holds an overall amount
of available ports, based on DT nodes status. In order to be prepared
to support other HW description, obtain the value by incrementing it
upon each successful port initialization. This allowed for simplifying
port indexing in the controller's private array, whose size is now not
dynamically allocated, but fixed to MVPP2_MAX_PORTS.

This patch simplifies creating and filling list of enabled ports and
is a part of the preparation for adding ACPI support in the mvpp2 driver.

Signed-off-by: Marcin Wojtas 
---
 drivers/net/ethernet/marvell/mvpp2.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index a197607..7f42d90 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -865,7 +865,7 @@ struct mvpp2 {
 
/* List of pointers to port structures */
int port_count;
-   struct mvpp2_port **port_list;
+   struct mvpp2_port *port_list[MVPP2_MAX_PORTS];
 
/* Aggregated TXQs */
struct mvpp2_tx_queue *aggr_txqs;
@@ -7741,7 +7741,7 @@ static void mvpp2_port_copy_mac_addr(struct net_device 
*dev, struct mvpp2 *priv,
 /* Ports initialization */
 static int mvpp2_port_probe(struct platform_device *pdev,
struct device_node *port_node,
-   struct mvpp2 *priv, int index)
+   struct mvpp2 *priv)
 {
struct device_node *phy_node;
struct phy *comphy;
@@ -7934,7 +7934,8 @@ static int mvpp2_port_probe(struct platform_device *pdev,
}
netdev_info(dev, "Using %s mac address %pM\n", mac_from, dev->dev_addr);
 
-   priv->port_list[index] = port;
+   priv->port_list[priv->port_count++] = port;
+
return 0;
 
 err_free_port_pcpu:
@@ -8313,28 +8314,17 @@ static int mvpp2_probe(struct platform_device *pdev)
goto err_mg_clk;
}
 
-   priv->port_count = of_get_available_child_count(dn);
-   if (priv->port_count == 0) {
-   dev_err(&pdev->dev, "no ports enabled\n");
-   err = -ENODEV;
-   goto err_mg_clk;
-   }
-
-   priv->port_list = devm_kcalloc(&pdev->dev, priv->port_count,
-  sizeof(*priv->port_list),
-  GFP_KERNEL);
-   if (!priv->port_list) {
-   err = -ENOMEM;
-   goto err_mg_clk;
-   }
-
/* Initialize ports */
-   i = 0;
for_each_available_child_of_node(dn, port_node) {
-   err = mvpp2_port_probe(pdev, port_node, priv, i);
+   err = mvpp2_port_probe(pdev, port_node, priv);
if (err < 0)
goto err_port_probe;
-   i++;
+   }
+
+   if (priv->port_count == 0) {
+   dev_err(&pdev->dev, "no ports enabled\n");
+   err = -ENODEV;
+   goto err_mg_clk;
}
 
/* Statistics must be gathered regularly because some of them (like
-- 
2.7.4

[net-next: PATCH v4 1/7] device property: Introduce fwnode_get_mac_address()

2018-01-18 Thread Marcin Wojtas

Until now there were two almost identical functions for
obtaining MAC address - of_get_mac_address() and, more generic,
device_get_mac_address(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the MAC address for
children nodes in the ACPI world by introducing a new function -
fwnode_get_mac_address(). This commit also changes
device_get_mac_address() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas 
Acked-by: Rafael J. Wysocki 
---
 drivers/base/property.c  | 28 ++--
 include/linux/property.h |  2 ++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 851b1b6..f261d1a 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -1153,11 +1153,11 @@ int device_get_phy_mode(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(device_get_phy_mode);
 
-static void *device_get_mac_addr(struct device *dev,
+static void *fwnode_get_mac_addr(struct fwnode_handle *fwnode,
 const char *name, char *addr,
 int alen)
 {
-   int ret = device_property_read_u8_array(dev, name, addr, alen);
+   int ret = fwnode_property_read_u8_array(fwnode, name, addr, alen);
 
if (ret == 0 && alen == ETH_ALEN && is_valid_ether_addr(addr))
return addr;
@@ -1165,8 +1165,8 @@ static void *device_get_mac_addr(struct device *dev,
 }
 
 /**
- * device_get_mac_address - Get the MAC for a given device
- * @dev:   Pointer to the device
+ * fwnode_get_mac_address - Get the MAC from the firmware node
+ * @fwnode:Pointer to the firmware node
  * @addr:  Address of buffer to store the MAC in
  * @alen:  Length of the buffer pointed to by addr, should be ETH_ALEN
  *
@@ -1187,19 +1187,31 @@ static void *device_get_mac_addr(struct device *dev,
  * In this case, the real MAC is in 'local-mac-address', and 'mac-address'
  * exists but is all zeros.
 */
-void *device_get_mac_address(struct device *dev, char *addr, int alen)
+void *fwnode_get_mac_address(struct fwnode_handle *fwnode, char *addr, int 
alen)
 {
char *res;
 
-   res = device_get_mac_addr(dev, "mac-address", addr, alen);
+   res = fwnode_get_mac_addr(fwnode, "mac-address", addr, alen);
if (res)
return res;
 
-   res = device_get_mac_addr(dev, "local-mac-address", addr, alen);
+   res = fwnode_get_mac_addr(fwnode, "local-mac-address", addr, alen);
if (res)
return res;
 
-   return device_get_mac_addr(dev, "address", addr, alen);
+   return fwnode_get_mac_addr(fwnode, "address", addr, alen);
+}
+EXPORT_SYMBOL(fwnode_get_mac_address);
+
+/**
+ * device_get_mac_address - Get the MAC for a given device
+ * @dev:   Pointer to the device
+ * @addr:  Address of buffer to store the MAC in
+ * @alen:  Length of the buffer pointed to by addr, should be ETH_ALEN
+ */
+void *device_get_mac_address(struct device *dev, char *addr, int alen)
+{
+   return fwnode_get_mac_address(dev_fwnode(dev), addr, alen);
 }
 EXPORT_SYMBOL(device_get_mac_address);
 
diff --git a/include/linux/property.h b/include/linux/property.h
index f6189a3..35620e0 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -279,6 +279,8 @@ int device_get_phy_mode(struct device *dev);
 
 void *device_get_mac_address(struct device *dev, char *addr, int alen);
 
+void *fwnode_get_mac_address(struct fwnode_handle *fwnode,
+char *addr, int alen);
 struct fwnode_handle *fwnode_graph_get_next_endpoint(
const struct fwnode_handle *fwnode, struct fwnode_handle *prev);
 struct fwnode_handle *
-- 
2.7.4

[net-next: PATCH v4 2/7] device property: Introduce fwnode_get_phy_mode()

2018-01-18 Thread Marcin Wojtas

Until now there were two almost identical functions for
obtaining network PHY mode - of_get_phy_mode() and,
more generic, device_get_phy_mode(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the PHY mode for
children nodes in the ACPI world by introducing a new function -
fwnode_get_phy_mode(). This commit also changes
device_get_phy_mode() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas 
Acked-by: Rafael J. Wysocki 
---
 drivers/base/property.c  | 24 
 include/linux/property.h |  1 +
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index f261d1a..7c4a53d 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -1126,21 +1126,21 @@ enum dev_dma_attr device_get_dma_attr(struct device 
*dev)
 EXPORT_SYMBOL_GPL(device_get_dma_attr);
 
 /**
- * device_get_phy_mode - Get phy mode for given device
- * @dev:   Pointer to the given device
+ * fwnode_get_phy_mode - Get phy mode for given firmware node
+ * @fwnode:Pointer to the given node
  *
  * The function gets phy interface string from property 'phy-mode' or
  * 'phy-connection-type', and return its index in phy_modes table, or errno in
  * error case.
  */
-int device_get_phy_mode(struct device *dev)
+int fwnode_get_phy_mode(struct fwnode_handle *fwnode)
 {
const char *pm;
int err, i;
 
-   err = device_property_read_string(dev, "phy-mode", &pm);
+   err = fwnode_property_read_string(fwnode, "phy-mode", &pm);
if (err < 0)
-   err = device_property_read_string(dev,
+   err = fwnode_property_read_string(fwnode,
  "phy-connection-type", &pm);
if (err < 0)
return err;
@@ -1151,6 +1151,20 @@ int device_get_phy_mode(struct device *dev)
 
return -ENODEV;
 }
+EXPORT_SYMBOL_GPL(fwnode_get_phy_mode);
+
+/**
+ * device_get_phy_mode - Get phy mode for given device
+ * @dev:   Pointer to the given device
+ *
+ * The function gets phy interface string from property 'phy-mode' or
+ * 'phy-connection-type', and return its index in phy_modes table, or errno in
+ * error case.
+ */
+int device_get_phy_mode(struct device *dev)
+{
+   return fwnode_get_phy_mode(dev_fwnode(dev));
+}
 EXPORT_SYMBOL_GPL(device_get_phy_mode);
 
 static void *fwnode_get_mac_addr(struct fwnode_handle *fwnode,
diff --git a/include/linux/property.h b/include/linux/property.h
index 35620e0..9b13332 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -279,6 +279,7 @@ int device_get_phy_mode(struct device *dev);
 
 void *device_get_mac_address(struct device *dev, char *addr, int alen);
 
+int fwnode_get_phy_mode(struct fwnode_handle *fwnode);
 void *fwnode_get_mac_address(struct fwnode_handle *fwnode,
 char *addr, int alen);
 struct fwnode_handle *fwnode_graph_get_next_endpoint(
-- 
2.7.4

[net-next: PATCH v4 4/7] device property: Allow iterating over available child fwnodes

2018-01-18 Thread Marcin Wojtas

Implement a new helper function fwnode_get_next_available_child_node(),
which enables obtaining next enabled child fwnode, which
works on a similar basis to OF's of_get_next_available_child().

This commit also introduces a macro, thanks to which it is
possible to iterate over the available fwnodes, using the
new function described above.

Signed-off-by: Marcin Wojtas 
---
 drivers/base/property.c  | 26 
 include/linux/property.h |  6 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 1d6c9d9..613ba82 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -998,6 +998,32 @@ fwnode_get_next_child_node(const struct fwnode_handle 
*fwnode,
 EXPORT_SYMBOL_GPL(fwnode_get_next_child_node);
 
 /**
+ * fwnode_get_next_available_child_node - Return the next
+ * available child node handle for a node
+ * @fwnode: Firmware node to find the next child node for.
+ * @child: Handle to one of the node's child nodes or a %NULL handle.
+ */
+struct fwnode_handle *
+fwnode_get_next_available_child_node(const struct fwnode_handle *fwnode,
+struct fwnode_handle *child)
+{
+   struct fwnode_handle *next_child = child;
+
+   if (!fwnode)
+   return NULL;
+
+   do {
+   next_child = fwnode_get_next_child_node(fwnode, next_child);
+
+   if (!next_child || fwnode_device_is_available(next_child))
+   break;
+   } while (next_child);
+
+   return next_child;
+}
+EXPORT_SYMBOL_GPL(fwnode_get_next_available_child_node);
+
+/**
  * device_get_next_child_node - Return the next child node handle for a device
  * @dev: Device to find the next child node for.
  * @child: Handle to one of the device's child nodes or a null handle.
diff --git a/include/linux/property.h b/include/linux/property.h
index e05889f..5b0563a 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -83,11 +83,17 @@ struct fwnode_handle *fwnode_get_next_parent(
struct fwnode_handle *fwnode);
 struct fwnode_handle *fwnode_get_next_child_node(
const struct fwnode_handle *fwnode, struct fwnode_handle *child);
+struct fwnode_handle *fwnode_get_next_available_child_node(
+   const struct fwnode_handle *fwnode, struct fwnode_handle *child);
 
 #define fwnode_for_each_child_node(fwnode, child)  \
for (child = fwnode_get_next_child_node(fwnode, NULL); child;   \
 child = fwnode_get_next_child_node(fwnode, child))
 
+#define fwnode_for_each_available_child_node(fwnode, child)   \
+   for (child = fwnode_get_next_available_child_node(fwnode, NULL); child;\
+child = fwnode_get_next_available_child_node(fwnode, child))
+
 struct fwnode_handle *device_get_next_child_node(
struct device *dev, struct fwnode_handle *child);
 
-- 
2.7.4

Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-18 Thread Lorenzo Pieralisi

[+cc Mika]

On Tue, Jan 09, 2018 at 02:00:12PM +0100, Andrew Lunn wrote:
> On Tue, Jan 09, 2018 at 11:22:00AM +0100, Marcin Wojtas wrote:
> > 2018-01-09 11:19 GMT+01:00 Graeme Gregory :
> > > On Mon, Jan 08, 2018 at 06:17:06PM +0100, Marcin Wojtas wrote:
> > >> Hi Andrew,
> > >>
> > >>
> > >>
> > >> 2018-01-08 16:42 GMT+01:00 Andrew Lunn :
> > >> > w> I am not familiar with MDIO, but if its similar or a specific
> > >> >> implementation of a serial bus that does sound sane!
> > >> >
> > >>
> > >> Thanks for digging, I will check if and how we can use
> > >> GenericSerialBus with MDIO.
> > >>
> > > Maybe Lorenzo, Hanjun, Sudeep can comment here they might have come
> > > across similar on other ARM boards.
> > >
> > 
> > I'm looking forward to their feedback, however, what I've noticed,
> > each driver handles mdio/phys on its own, not using any generic
> > solution, which is what I need to actually avoid :)
> 
> Agreed. Lets define it once for all drivers using phylib/phylink.

To start with, I am not entirely familiar with MDIO, apologies in
advance. Building something on top of GenericSerialBus sounds correct
but if I am not mistaken you would need a new bus type in the ACPI
specs.

I CC'ed Mika since he is more familiar with handling these bits of ACPI
specs - I wonder whether this is a problem that cropped up on x86
systems too.

I do not think there is one and only answer but there must be a single
set of bindings and if the ACPI specs already cater for some of them
we have to reuse them.

Please take some time to ensure the solution you are pushing is widely
deployable.

Thanks,
Lorenzo

Re: [RFC PATCH net-next 00/12] selftests: forwarding: Add VRF-based tests

2018-01-18 Thread Andrew Lunn

> > > >>
> > > >>  br0
> > > >>   +
> > > >>vrf-h1 |   vrf-h2
> > > >>  ++---+++
> > > >>  ||||
> > > >> 192.0.2.1/24 ++++ 192.0.2.2/24
> > > >>swp1 swp2 swp3 swp4
> > > >>  ++++
> > > >>  ||||
> > > >>  ++++
> > > >>
> > 
> > > Agreed this is really cool! For DSA enabled switches, we usually have a
> > > host that does the test sequencing and then execute commands remotely on
> > > the DUT, but we might be able to get such a similar framework up and
> > > running on the DUT itself without too much hassle.
> > 
> > I think the problem we will have is a lack of ports. Most DSA switches
> > have 4 or 5 ports. Given the need for two ports per bridge port, we
> > will be limited to bridges with just two members. That really limits
> > what sort of tests you can do.
> 
> I was actually interested in feedback from you guys. Looking at
> dsa_slave_changeupper() I see you don't forbid the enslavement to a VRF
> and that you set STP state to forwarding when a port leaves a bridge
> (good). Does that mean you're able to use some of these tests on your
> switches?

By default, we have port separation. So you can add individual IP
addresses to interfaces, and ping from them. In fact, this is the
first test i always run. My test setup uses 4 ports of the switch, and
connects a USB-Ethernet dongle to each port. The 4 USB cables then
connect to my host running the tests. I setup four parallel subnets,
and ping across each.

I've never used vrf, so i've no idea if that will just work on top.

> The reason we can use these tests for mlxsw is that we support VRF and
> ACL offload. At least in the above example, swp4 is able to receive
> packets directed at 192.0.2.2 because we program the device with the
> host route 192.0.2.2/32.

We are all L2. So no route needed. No ACLs either.

So in theory, this could be used with DSA. But in practice, only
having two test ports really restricts you from doing anything
interesting. You need a minimum of 3, and all my tests use 4. Given
your double requirements, that means you actually need a switch with 8
external ports. I don't know of any DSA board which has that.

  Andrew

Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-18 Thread Andrew Lunn

> I CC'ed Mika since he is more familiar with handling these bits of ACPI
> specs - I wonder whether this is a problem that cropped up on x86
> systems too.

Hi Lorenzo

There is nothing about MDIO, PHYs, Ethernet switches, etc in version
6.2 of the spec. If x86 has this, it must be after 6.2 was released.
I would not be too surprised if x86 has none of this. If you look at
the typical drives used on x86, i210, e1000e, ixgb, r8169, etc. They
are all PCI devices, and hide all this.

> I do not think there is one and only answer but there must be a single
> set of bindings and if the ACPI specs already cater for some of them
> we have to reuse them.

Agreed. Due diligence so far suggests there is nothing already
defined. But im a newbie to ACPI, so could be looking in the wrong
place. I really hope there is somebody like Rob Herring, the DT
maintainer, who keeps an eye on all ACPI talk and would tell us if we
are heading off in the wrong direction.

Andrew

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Dmitry Vyukov

On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o  wrote:
> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>
>> If syzkaller can only test one tree than linux-next should be the one.
>
> Well, there's been some controversy about that.  The problem is that
> it's often not clear if this is long-standing bug, or a bug which is
> in a particular subsystem tree --- and if so, *which* subsystem tree,
> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
> which is often not accurate --- since the location of the crash
> doesn't necessarily point out where the problem originated, and hence
> who should look at the syzbot report.  And so this has caused
> some irritation.

Re set of tested trees.

We now have an interesting spectrum of opinions.

Some assorted thoughts on this:

1. First, "upstream is clean" won't happen any time soon. There are
several reasons for this:
 - Currently syzkaller only tests a subset of subsystems that it knows
how to test, even the ones that it tests it tests poorly. Over time
it's improved to test most subsystems and existing subsystems better.
Just few weeks ago I've added some descriptions for crypto subsystem
and it uncovered 20+ old bugs.
 - syzkaller is guided, genetic fuzzer over time it leans how to do
more complex things by small steps. It takes time.
 - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
memory), KTSAN (data races).
 - generic syzkaller smartness will be improved over time.
 - it will get more CPU resources.
Effect of all of these things is multiplicative: we test more code,
smarter, with more bug-detection tools, with more resources. So I
think we need to plan for a mix of old and new bugs for foreseeable
future.

2. get_maintainer.pl and mix of old and new bugs was mentioned as
harming attribution. I don't see what will change when/if we test only
upstream. Then the same mix of old/new bugs will be detected just on
upstream, with all of the same problems for old/new, maintainers,
which subsystem, etc. I think the amount of bugs in the kernel is
significant part of the problem, but the exact boundary where we
decide to start killing them won't affect number of bugs.

3. If we test only upstream, we increase chances of new security bugs
sinking into releases. We sure could raise perceived security value of
the bugs by keeping them private, letting them sink into release,
letting them sink into distros, and then reporting a high-profile
vulnerability. I think that's wrong. There is something broken with
value measuring in security community. Bug that is killed before
sinking into any release is the highest impact thing. As Alexei noted,
fixing bugs es early as possible also reduces fix costs, backporting
burden, etc. This also can eliminate need in bisection in some cases,
say if you accepted a large change to some files and a bunch of
crashes appears for these files on your tree soon, it's obvious what
happens.

4. It was mentioned that linux-next can have a broken slab allocator
and that will manifest as multiple random crashes. FWIW I don't
remember that I ever seen this. Yes, sometimes it does not build/boot,
but these builds are just rejected for testing.

I don't mind dropping linux-next specifically if that's the common
decision. However, (1) Alexei and Gruenter expressed opposite opinion,
(2) I don't see what it will change dramatically, (2) as far as I
understand Linus actually relies on linux-next giving some concrete
testing to the code there.
But I think that testing bpf-next is a positive thing provided that
there is explicit interest from maintainers. And note that that will
be testing targeted specifically at bpf subsystem, so that instance
will not generate bugs in SCSI, USB, etc (though it will cover a part
of net). Also note that the latest email format includes set of tree
where the crash happened, so if you see "upstream" or "upstream and
bpf-next", nothing really changes, you still know that it happens
upstream. Or if you see only "bpf-next", then you know that it's only
that tree.

[PATCH] can: m_can: mark runtime-PM handlers as __maybe_unused

2018-01-18 Thread Arnd Bergmann

Building without CONFIG_PM results in a harmless warning:

drivers/net/can/m_can/m_can.c:1763:12: error: 'm_can_runtime_resume' defined 
but not used [-Werror=unused-function]
drivers/net/can/m_can/m_can.c:1752:12: error: 'm_can_runtime_suspend' defined 
but not used [-Werror=unused-function]

Marking the functions as __maybe_unused lets the compiler
silently drop them instead.

Fixes: cdf8259d6573 ("can: m_can: Add PM Support")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/can/m_can/m_can.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 49ed8737871f..2594f7779c6f 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1749,7 +1749,7 @@ static int m_can_plat_remove(struct platform_device *pdev)
return 0;
 }
 
-static int m_can_runtime_suspend(struct device *dev)
+static int __maybe_unused m_can_runtime_suspend(struct device *dev)
 {
struct net_device *ndev = dev_get_drvdata(dev);
struct m_can_priv *priv = netdev_priv(ndev);
@@ -1760,7 +1760,7 @@ static int m_can_runtime_suspend(struct device *dev)
return 0;
 }
 
-static int m_can_runtime_resume(struct device *dev)
+static int __maybe_unused m_can_runtime_resume(struct device *dev)
 {
struct net_device *ndev = dev_get_drvdata(dev);
struct m_can_priv *priv = netdev_priv(ndev);
-- 
2.9.0

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Dmitry Vyukov

On Thu, Jan 18, 2018 at 2:01 PM, Dmitry Vyukov  wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o  wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that.  The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report.  And so this has caused
>> some irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
>  - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.

/\/\/\/\/\/\/\/\/\/\/\/\

While we are here, you can help syzkaller to test your subsystem
better (or at all). It frequently requires domain expertise which we
don't have for all kernel subsystems (sometimes we don't even know
they exist). It can be as simple as this (for /dev/ashmem):
https://github.com/google/syzkaller/blob/master/sys/linux/ashmem.txt


>  - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
>  - generic syzkaller smartness will be improved over time.
>  - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,
> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.

Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens

Pravin Shelar  writes:

> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>> When regular packets are forwarded, we validate their size against the
>> MTU of the destination device. However, when GSO packets are
>> forwarded, we do not validate their size against the MTU. We
>> implicitly assume that when they are segmented, the resultant packets
>> will be correctly sized.
>>
>> This is not always the case.
>>
>> We observed a case where a packet received on an ibmveth device had a
>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>> device, where it caused a firmware assert. This is described in detail
>> at [0] and was the genesis of this series. Rather than fixing it in
>> the driver, this series fixes the forwarding path.
>>
> Are there any other possible forwarding path in networking stack? TC
> is one subsystem that could forward such a packet to the bnx2x device,
> how is that handled ?

So far I have only looked at bridges, openvswitch and macvlan. In
general, if the code uses dev_forward_skb() it should automatically be
fine as that invokes is_skb_forwardable(), which we patch.

There is potentially a forwarding path in macvlan that doesn't pass
through dev_forward_skb() (which calls is_skb_forwardable). I did post a
patch to this earlier, but it got somewhat derailed by how you would get
such a packet in to a macvlan device and whether that should be fixed
instead. Once we have some consensus on Jason's questions and the
overall approach is solid, I'm happy to work on that again.

To answer your specific question, I'm not aware of how tc can cause a
packet to be forwarded - if you can point me to the right general area I
can have a look.

Regards,
Daniel

>
>> To fix this:
>>
>>  - Move a helper in patch 1.
>>
>>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>case, rather than assuming all will be well. This fixes bridges.
>>This is patch 2.
>>
>>  - Open vSwitch uses its own slightly specialised algorithm for
>>checking lengths. Wire up checking for that in patch 3.
>>
>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>
>> Cc: manish.cho...@cavium.com
>> Cc: d...@openvswitch.org
>>
>> Daniel Axtens (3):
>>   net: move skb_gso_mac_seglen to skbuff.h
>>   net: is_skb_forwardable: validate length of GSO packet segments
>>   openvswitch: drop GSO packets that are too large
>>
>>  include/linux/skbuff.h  | 16 
>>  net/core/dev.c  |  7 ---
>>  net/core/skbuff.c   | 34 ++
>>  net/openvswitch/vport.c | 37 ++---
>>  net/sched/sch_tbf.c | 10 --
>>  5 files changed, 84 insertions(+), 20 deletions(-)
>>
>> --
>> 2.14.1
>>

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Dmitry Vyukov

On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov  wrote:
> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann  
> wrote:
>> Don't know if there's such a possibility, but it would be nice if we could
>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>
> Hi Daniel,
>
> It's doable.
> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> contains more ongoing work? What's the exact git repo address/branch,
> so that I don't second guess?
> Also what syscalls it makes sense to enable there to target it at bpf
> specifically? As far as I understand effects of bpf are far beyond the
> bpf call and proper testing requires some sockets and other stuff. For
> sockets, will it be enough to enable ip/ipv6? Because if we enable all
> of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
> there as well. Does bpf affect incoming network packets?
> Also are there any sysctl's, command line arguments, etc that need to
> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
> know what's the most relevant combination. Ideally, we test all of
> them, but let start with one of them because it requires separate
> instances (since the setting is global and test programs can't just
> flip it randomly).
> Also do you want testing from root or not from root? We generally
> don't test under root, because syzkaller comes up with legal ways to
> shut everything down even if we try to contain it (e.g. kill init
> somehow or shut down network using netlink). But if we limit syscall
> surface, then root may work and allow testing staging bpf features.

So, Daniel, Alexei,

I understand that I asked lots of questions, but they are relatively
simple. I need that info to setup proper testing.

[PATCH] [wireless-next] mt76: fix building without CONFIG_LEDS_CLASS

2018-01-18 Thread Arnd Bergmann

When CONFIG_LEDS_CLASS is disabled, or it is a loadable module while
mt76 is built-in, we run into a link error:

drivers/net/wireless/mediatek/mt76/mac80211.o: In function 
`mt76_register_device':
mac80211.c:(.text+0xb78): relocation truncated to fit: R_AARCH64_CALL26 against 
undefined symbol `devm_of_led_classdev_register'

We don't really need a hard dependency here as the driver can presumably
work just fine without LEDs, so this follows the iwlwifi example and
adds a separate Kconfig option for the LED support, this will be available
whenever it will link, and otherwise the respective code gets left out from
the driver object.

Fixes: 17f1de56df05 ("mt76: add common code shared between multiple chipsets")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/mediatek/mt76/Kconfig   | 5 +
 drivers/net/wireless/mediatek/mt76/mac80211.c| 8 +---
 drivers/net/wireless/mediatek/mt76/mt76x2_init.c | 6 --
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/Kconfig 
b/drivers/net/wireless/mediatek/mt76/Kconfig
index fc05d79c80d0..9d12c1f5284e 100644
--- a/drivers/net/wireless/mediatek/mt76/Kconfig
+++ b/drivers/net/wireless/mediatek/mt76/Kconfig
@@ -8,3 +8,8 @@ config MT76x2E
depends on PCI
---help---
  This adds support for MT7612/MT7602/MT7662-based wireless PCIe 
devices.
+
+config MT76_LEDS
+   bool "LED support for MT76"
+   depends on MT76_CORE
+   depends on LEDS_CLASS=y || MT76_CORE=LEDS_CLASS
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c 
b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 3acf0e175d71..144e312a9407 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -295,9 +295,11 @@ int mt76_register_device(struct mt76_dev *dev, bool vht,
mt76_check_sband(dev, NL80211_BAND_2GHZ);
mt76_check_sband(dev, NL80211_BAND_5GHZ);
 
-   ret = mt76_led_init(dev);
-   if (ret)
-   return ret;
+   if (IS_ENABLED(CONFIG_MT76_LEDS)) {
+   ret = mt76_led_init(dev);
+   if (ret)
+   return ret;
+   }
 
return ieee80211_register_hw(hw);
 }
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_init.c 
b/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
index 4373a2ba5143..0c9d02af205d 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
@@ -849,8 +849,10 @@ int mt76x2_register_device(struct mt76x2_dev *dev)
mt76x2_dfs_init_detector(dev);
 
/* init led callbacks */
-   dev->mt76.led_cdev.brightness_set = mt76x2_led_set_brightness;
-   dev->mt76.led_cdev.blink_set = mt76x2_led_set_blink;
+   if (IS_ENABLED(CONFIG_MT76_LEDS)) {
+   dev->mt76.led_cdev.brightness_set = mt76x2_led_set_brightness;
+   dev->mt76.led_cdev.blink_set = mt76x2_led_set_blink;
+   }
 
ret = mt76_register_device(&dev->mt76, true, mt76x2_rates,
   ARRAY_SIZE(mt76x2_rates));
-- 
2.9.0

Re: [PATCH] can: m_can: mark runtime-PM handlers as __maybe_unused

2018-01-18 Thread Marc Kleine-Budde

On 01/18/2018 02:04 PM, Arnd Bergmann wrote:
> Building without CONFIG_PM results in a harmless warning:
> 
> drivers/net/can/m_can/m_can.c:1763:12: error: 'm_can_runtime_resume' defined 
> but not used [-Werror=unused-function]
> drivers/net/can/m_can/m_can.c:1752:12: error: 'm_can_runtime_suspend' defined 
> but not used [-Werror=unused-function]
> 
> Marking the functions as __maybe_unused lets the compiler
> silently drop them instead.
> 
> Fixes: cdf8259d6573 ("can: m_can: Add PM Support")
> Signed-off-by: Arnd Bergmann 

Tnx - applied.

However all archs using this driver have mandatory PM support and the
driver depends on it.

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens

Jason Wang  writes:

> On 2018年01月18日 16:28, Pravin Shelar wrote:
>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
>>> When regular packets are forwarded, we validate their size against the
>>> MTU of the destination device. However, when GSO packets are
>>> forwarded, we do not validate their size against the MTU. We
>>> implicitly assume that when they are segmented, the resultant packets
>>> will be correctly sized.
>>>
>>> This is not always the case.
>>>
>>> We observed a case where a packet received on an ibmveth device had a
>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>> device, where it caused a firmware assert. This is described in detail
>>> at [0] and was the genesis of this series. Rather than fixing it in
>>> the driver, this series fixes the forwarding path.
>>>
>> Are there any other possible forwarding path in networking stack? TC
>> is one subsystem that could forward such a packet to the bnx2x device,
>> how is that handled ?
>
> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
> before passing it to hardware. And bnx2x needs to set its gso_max_size 
> correctly.

I don't think gso_max_size is quite the same. If I understand
net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
length of the skb, not the length of each segmented packet. The problem
for the bnx2x card is the length of the segment, not the overall length.

>
> Btw, looks like this could be triggered from a guest which is a DOS. We 
> need request a CVE for this.
>

You are correct about how this can be triggered: in fact it came to my
attention because a network packet from one LPAR (PowerVM virtual
machine) brought down the card attached to a different LPAR. It didn't
occur to me that it was potentially a security issue. I am talking with
the security team at Canonical regarding this.

Regards,
Daniel

> Thanks
>
>>
>>> To fix this:
>>>
>>>   - Move a helper in patch 1.
>>>
>>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>> case, rather than assuming all will be well. This fixes bridges.
>>> This is patch 2.
>>>
>>>   - Open vSwitch uses its own slightly specialised algorithm for
>>> checking lengths. Wire up checking for that in patch 3.
>>>
>>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>>
>>> Cc: manish.cho...@cavium.com
>>> Cc: d...@openvswitch.org
>>>
>>> Daniel Axtens (3):
>>>net: move skb_gso_mac_seglen to skbuff.h
>>>net: is_skb_forwardable: validate length of GSO packet segments
>>>openvswitch: drop GSO packets that are too large
>>>
>>>   include/linux/skbuff.h  | 16 
>>>   net/core/dev.c  |  7 ---
>>>   net/core/skbuff.c   | 34 ++
>>>   net/openvswitch/vport.c | 37 ++---
>>>   net/sched/sch_tbf.c | 10 --
>>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>>
>>> --
>>> 2.14.1
>>>

[PATCH] [net-next] net: sched: avoid uninitialized variable use

2018-01-18 Thread Arnd Bergmann

gcc has identified a code path in which we pass uninitialized
data into tc_dump_tfilter():

net/sched/cls_api.c: In function 'tc_dump_tfilter':
net/sched/cls_api.c:1268:8: error: 'parent' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]

This initializes the variable to the value it had before the previous
change.

Fixes: 7960d1daf278 ("net: sched: use block index as a handle instead of qdisc 
when block is shared")
Signed-off-by: Arnd Bergmann 

I don't know if my patch is the best way to address the issue, but
if not, then at least it helps show what the warning is about
and lets someone else come up with a better solution.
---
 net/sched/cls_api.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e500d11da9cd..a95bfc8fc442 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1317,6 +1317,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
block = tcf_block_lookup(net, tcm->tcm_block_index);
if (!block)
goto out;
+   parent = tcm->tcm_parent;
} else {
const struct Qdisc_class_ops *cops;
struct net_device *dev;
-- 
2.9.0

Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

2018-01-18 Thread Will Deacon

Hi Dan, Linus,

On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
>  wrote:
> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams  
> > wrote:
> >>
> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> based approach is provided as an opt-in fallback, but the default
> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> conditional branches instructions, and otherwise aims to redirect
> >> speculation to use a NULL pointer rather than a user controlled value.
> >
> > Do you have any performance numbers and perhaps example code
> > generation? Is this noticeable? Are there any microbenchmarks showing
> > the difference between lfence use and the masking model?
> 
> I don't have performance numbers, but here's a sample code generation
> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> array_ptr() after the mask generation with 'sbb'.
> 
> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
>  8e7:   8b 02   mov(%rdx),%eax
>  8e9:   48 39 c7cmp%rax,%rdi
>  8ec:   48 19 c9sbb%rcx,%rcx
>  8ef:   48 8b 42 08 mov0x8(%rdx),%rax
>  8f3:   48 89 femov%rdi,%rsi
>  8f6:   48 21 ceand%rcx,%rsi
>  8f9:   48 8d 04 f0 lea(%rax,%rsi,8),%rax
>  8fd:   48 21 c8and%rcx,%rax
> 
> 
> > Having both seems good for testing, but wouldn't we want to pick one in the 
> > end?
> 
> I was thinking we'd keep it as a 'just in case' sort of thing, at
> least until the 'probably safe' assumption of the 'mask' approach has
> more time to settle out.

>From the arm64 side, the only concern I have (and this actually applies to
our CSDB sequence as well) is the calculation of the array size by the
caller. As Linus mentioned at the end of [1], if the determination of the
size argument is based on a conditional branch, then masking doesn't help
because you bound within the wrong range under speculation.

We ran into this when trying to use masking to protect our uaccess routines
where the conditional bound is either KERNEL_DS or USER_DS. It's possible
that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
we'd need to throw some heavy barriers in set_fs to make it robust.

The question then is whether or not we're likely to run into this elsewhere,
and I don't have a good feel for that.

Will

[1] 
http://lkml.kernel.org/r/CA+55aFz0tsreoa=5ud2nofcpng-dizlbht9wu9asyhplfjd...@mail.gmail.com

Re: [net-next: PATCH v4 0/7] Armada 7k/8k PP2 ACPI support

2018-01-18 Thread Antoine Tenart

Hi Marcin,

I tested the series on a MacchiatoBin to ensure the mvpp2 DT support was
still working. I was able to use all supported ports as before, and saw
no issue.

For all mvpp2 patches, you can add:

Tested-by: Antoine Tenart 

Thanks!
Antoine

On Thu, Jan 18, 2018 at 01:31:37PM +0100, Marcin Wojtas wrote:
> Hi,
> 
> I quickly resend the series, thanks to Antoine Tenart's remark,
> who spotted !CONFIG_ACPI compilation issue after introducing
> the new fwnode_irq_get() routine. Please see the details in the changelog
> below and the 3/7 commit log.
> 
> mvpp2 driver can work with the ACPI representation, as exposed
> on a public branch:
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
> It was compiled together with the most recent Tianocore EDK2 revision.
> Please refer to the firmware build instruction on MacchiatoBin board:
> http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II
> 
> ACPI representation of PP2 controllers (withouth PHY support) can
> be viewed in the github:
> * MacchiatoBin:
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl#L201
> 
> * Armada 7040 DB:
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada70x0/Dsdt.asl#L131
> 
> I will appreciate any comments or remarks.
> 
> Best regards,
> Marcin
> 
> Changelog:
> v3 -> v4:
> * 3/7
> - add new macro (ACPI_HANDLE_FWNODE) and fix
>   compilation with !CONFIG_ACPI
> - extend commit log and mention usability of fwnode_irq_get
>   for the child nodes as well
> 
> v2 -> v3:
> * 1/7, 2/7
> - Add Rafael's Acked-by's
> * 3/7, 4/7
> - New patches
> * 6/7, 7/7
> - Update driver with new helper routines usage
> - Improve commit log.
> 
> v1 -> v2:
> * Remove MDIO patches
> * Use PP2 ports only with link interrupts
> * Release second region resources in mvpp2 driver (code moved from
>   mvmdio), as explained in details in 5/5 commit message.
> 
> Marcin Wojtas (7):
>   device property: Introduce fwnode_get_mac_address()
>   device property: Introduce fwnode_get_phy_mode()
>   device property: Introduce fwnode_irq_get()
>   device property: Allow iterating over available child fwnodes
>   net: mvpp2: simplify maintaining enabled ports' list
>   net: mvpp2: use device_*/fwnode_* APIs instead of of_*
>   net: mvpp2: enable ACPI support in the driver
> 
>  drivers/base/property.c  | 104 --
>  drivers/net/ethernet/marvell/mvpp2.c | 206 
>  include/linux/acpi.h |   3 +
>  include/linux/property.h |  11 ++
>  4 files changed, 232 insertions(+), 92 deletions(-)
> 
> -- 
> 2.7.4
> 

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: [PATCH] [wireless-next] mt76: fix building without CONFIG_LEDS_CLASS

2018-01-18 Thread Kalle Valo

Arnd Bergmann  writes:

> When CONFIG_LEDS_CLASS is disabled, or it is a loadable module while
> mt76 is built-in, we run into a link error:
>
> drivers/net/wireless/mediatek/mt76/mac80211.o: In function 
> `mt76_register_device':
> mac80211.c:(.text+0xb78): relocation truncated to fit: R_AARCH64_CALL26 
> against undefined symbol `devm_of_led_classdev_register'
>
> We don't really need a hard dependency here as the driver can presumably
> work just fine without LEDs, so this follows the iwlwifi example and
> adds a separate Kconfig option for the LED support, this will be available
> whenever it will link, and otherwise the respective code gets left out from
> the driver object.
>
> Fixes: 17f1de56df05 ("mt76: add common code shared between multiple chipsets")
> Signed-off-by: Arnd Bergmann 

I'm planning to queue this for 4.16.

-- 
Kalle Valo

Re: [RFC v2 net-next 06/10] net/sched: Introduce the TBS Qdisc

2018-01-18 Thread Jamal Hadi Salim


On 18-01-17 06:06 PM, Jesus Sanchez-Palencia wrote:

From: Vinicius Costa Gomes 

TBS (Time Based Scheduler) uses the information added earlier in this
series (the socket option SO_TXTIME and the new role of
sk_buff->tstamp) to schedule traffic transmission based on absolute
time.

For some workloads, just bandwidth enforcement is not enough, and
precise control of the transmission of packets is necessary.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \


handle 100:0 ?


map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 tbs delta 6 clockid 11 offload 1


In this example, the Qdisc will try to enable offloading (offload 1)
the control of the transmission time to the network adapter, the
time stamp in socket are in reference to the clockid '11' (CLOCK_TAI)
and packets leave the Qdisc "delta" (6) nanoseconds before its
transmission time.





When offloading is disabled, the network adapter will ignore the
sk_buff time stamp, and so, the transmission time will be only "best
effort" from the Qdisc.



General comments:
1) iproute2: Avoid magic numbers like 1 or 11 please; "offload"
(without 1) and "TAI" will be more human friendly.

2) Experience shows that adding padding fields in the control structs
implies they will _never ever_ be used. That was not design intent
for netlink but over years shit like that has happened.
Maybe look at using a 32 bitmap? It is more "future proof".
You seem to only have 2-3 flags but it gives you opportunity
to add more changes later. If you are 100% sure youll never need
it - then maybe just move the tc_tbs_qopt::offload to the end of
of the struct.

3)It would be helpful for debugging to increment some stats other
than drop counters on enqueu/dequeue obsolete packet drop. Maybe use
overlimits for the dequeu drops (in addition)?

4) I may be misreading things - but did you need to reset the
watchdog on dequeue? It is already being kicked for every incoming packet.

cheers,
jamal

[PATCH xfrm v1] xfrm: fix error flow in case of add state fails

2018-01-18 Thread Aviad Yehezkel

If add state fails in case of device offload, netdev refcount
will be negative since gc task is attempting to dev_free this state.
This is fixed by putting NULL in state dev field.

Signed-off-by: Aviad Yehezkel 
Signed-off-by: Boris Pismeny 
---
 net/xfrm/xfrm_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index fb3f920..cd2b7d3 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -109,6 +109,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state 
*x,
 
err = dev->xfrmdev_ops->xdo_dev_state_add(x);
if (err) {
+   xso->dev = NULL;
dev_put(dev);
return err;
}
-- 
2.7.4

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Henrique de Moraes Holschuh

On Thu, 18 Jan 2018, Dmitry Vyukov wrote:
> I've made a bunch of changes yesterday and today. This includes

...

> and even per crash/tree. To alleviate this, syzbot will now say e.g.
> "So far this crash happened 185 times on linux-next, mmots, net-next,
> upstream". So that you can see that it's not only, say, linux-next
> problem.
> 
> syzbot just mailed another report with all of these changes which you
> can see here:
> https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Looks good to me.  Not that I had anything against what it did before,
but it is much more recipient-friendly now, IMHO.

Thanks for all the hard work on syzkaller!

-- 
  Henrique Holschuh

Re: [PATCH v2] FIRMWARE: bcm47xx_nvram: Replace mac address parsing

2018-01-18 Thread Andy Shevchenko

On Thu, 2017-12-21 at 17:42 +0100, Hauke Mehrtens wrote:
> 
> On 12/21/2017 03:40 PM, Andy Shevchenko wrote:
> > Replace sscanf() with mac_pton().
> > 
> > Signed-off-by: Andy Shevchenko 
> 
> Acked-by: Hauke Mehrtens 
> 
> The patch looks good, but I haven't tested them on my devices.

Any news on this, anyone?

> > ---
> > - use negative condition to be consistent with the rest code
> >  drivers/firmware/broadcom/Kconfig |  1 +
> >  drivers/firmware/broadcom/bcm47xx_sprom.c | 18 +++---
> >  2 files changed, 4 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/firmware/broadcom/Kconfig
> > b/drivers/firmware/broadcom/Kconfig
> > index 5e29f83e7b39..c041dcb7ea52 100644
> > --- a/drivers/firmware/broadcom/Kconfig
> > +++ b/drivers/firmware/broadcom/Kconfig
> > @@ -13,6 +13,7 @@ config BCM47XX_NVRAM
> >  config BCM47XX_SPROM
> > bool "Broadcom SPROM driver"
> > depends on BCM47XX_NVRAM
> > +   select GENERIC_NET_UTILS
> > help
> >   Broadcom devices store configuration data in SPROM.
> > Accessing it is
> >   specific to the bus host type, e.g. PCI(e) devices have
> > it mapped in
> > diff --git a/drivers/firmware/broadcom/bcm47xx_sprom.c
> > b/drivers/firmware/broadcom/bcm47xx_sprom.c
> > index 62aa3cf09b4d..4787f86c8ac1 100644
> > --- a/drivers/firmware/broadcom/bcm47xx_sprom.c
> > +++ b/drivers/firmware/broadcom/bcm47xx_sprom.c
> > @@ -137,20 +137,6 @@ static void nvram_read_leddc(const char
> > *prefix, const char *name,
> > *leddc_off_time = (val >> 16) & 0xff;
> >  }
> >  
> > -static void bcm47xx_nvram_parse_macaddr(char *buf, u8 macaddr[6])
> > -{
> > -   if (strchr(buf, ':'))
> > -   sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> > &macaddr[0],
> > -   &macaddr[1], &macaddr[2], &macaddr[3],
> > &macaddr[4],
> > -   &macaddr[5]);
> > -   else if (strchr(buf, '-'))
> > -   sscanf(buf, "%hhx-%hhx-%hhx-%hhx-%hhx-%hhx",
> > &macaddr[0],
> > -   &macaddr[1], &macaddr[2], &macaddr[3],
> > &macaddr[4],
> > -   &macaddr[5]);
> > -   else
> > -   pr_warn("Can not parse mac address: %s\n", buf);
> > -}
> > -
> >  static void nvram_read_macaddr(const char *prefix, const char
> > *name,
> >u8 val[6], bool fallback)
> >  {
> > @@ -161,7 +147,9 @@ static void nvram_read_macaddr(const char
> > *prefix, const char *name,
> > if (err < 0)
> > return;
> >  
> > -   bcm47xx_nvram_parse_macaddr(buf, val);
> > +   strreplace(buf, '-', ':');
> > +   if (!mac_pton(buf, val))
> > +   pr_warn("Can not parse mac address: %s\n", buf);
> >  }
> >  
> >  static void nvram_read_alpha2(const char *prefix, const char *name,
> > 

-- 
Andy Shevchenko 
Intel Finland Oy

Re: [RFC v2 net-next 06/10] net/sched: Introduce the TBS Qdisc

2018-01-18 Thread Jamal Hadi Salim


One more comment:
Probably try to run a test with a very small delta with
no offload (probably using something like prio as the root qdisc)
and dump the stats.
My gut feeling is your accounting of the backlog in particular is off.

cheers,
jamal

On 18-01-18 08:35 AM, Jamal Hadi Salim wrote:

On 18-01-17 06:06 PM, Jesus Sanchez-Palencia wrote:

From: Vinicius Costa Gomes 

TBS (Time Based Scheduler) uses the information added earlier in this
series (the socket option SO_TXTIME and the new role of
sk_buff->tstamp) to schedule traffic transmission based on absolute
time.

For some workloads, just bandwidth enforcement is not enough, and
precise control of the transmission of packets is necessary.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \


handle 100:0 ?


    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 tbs delta 6 clockid 11 
offload 1



In this example, the Qdisc will try to enable offloading (offload 1)
the control of the transmission time to the network adapter, the
time stamp in socket are in reference to the clockid '11' (CLOCK_TAI)
and packets leave the Qdisc "delta" (6) nanoseconds before its
transmission time.





When offloading is disabled, the network adapter will ignore the
sk_buff time stamp, and so, the transmission time will be only "best
effort" from the Qdisc.



General comments:
1) iproute2: Avoid magic numbers like 1 or 11 please; "offload"
(without 1) and "TAI" will be more human friendly.

2) Experience shows that adding padding fields in the control structs
implies they will _never ever_ be used. That was not design intent
for netlink but over years shit like that has happened.
Maybe look at using a 32 bitmap? It is more "future proof".
You seem to only have 2-3 flags but it gives you opportunity
to add more changes later. If you are 100% sure youll never need
it - then maybe just move the tc_tbs_qopt::offload to the end of
of the struct.

3)It would be helpful for debugging to increment some stats other
than drop counters on enqueu/dequeue obsolete packet drop. Maybe use
overlimits for the dequeu drops (in addition)?

4) I may be misreading things - but did you need to reset the
watchdog on dequeue? It is already being kicked for every incoming packet.

cheers,
jamal

Re: [PATCH] [net-next] net: sched: avoid uninitialized variable use

2018-01-18 Thread Jiri Pirko

Thu, Jan 18, 2018 at 02:17:28PM CET, a...@arndb.de wrote:
>gcc has identified a code path in which we pass uninitialized
>data into tc_dump_tfilter():
>
>net/sched/cls_api.c: In function 'tc_dump_tfilter':
>net/sched/cls_api.c:1268:8: error: 'parent' may be used uninitialized in this 
>function [-Werror=maybe-uninitialized]
>
>This initializes the variable to the value it had before the previous
>change.
>
>Fixes: 7960d1daf278 ("net: sched: use block index as a handle instead of qdisc 
>when block is shared")
>Signed-off-by: Arnd Bergmann 
>
>I don't know if my patch is the best way to address the issue, but
>if not, then at least it helps show what the warning is about
>and lets someone else come up with a better solution.

I already sent a fix for this:
http://patchwork.ozlabs.org/patch/862787/

Re: [PATCH 0/3] Check gso_size of packets when forwarding

2018-01-18 Thread Daniel Axtens

Daniel Axtens  writes:

> Jason Wang  writes:
>
>> On 2018年01月18日 16:28, Pravin Shelar wrote:
>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens  wrote:
 When regular packets are forwarded, we validate their size against the
 MTU of the destination device. However, when GSO packets are
 forwarded, we do not validate their size against the MTU. We
 implicitly assume that when they are segmented, the resultant packets
 will be correctly sized.

 This is not always the case.

 We observed a case where a packet received on an ibmveth device had a
 GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
 device, where it caused a firmware assert. This is described in detail
 at [0] and was the genesis of this series. Rather than fixing it in
 the driver, this series fixes the forwarding path.

>>> Are there any other possible forwarding path in networking stack? TC
>>> is one subsystem that could forward such a packet to the bnx2x device,
>>> how is that handled ?
>>
>> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
>> before passing it to hardware. And bnx2x needs to set its gso_max_size 
>> correctly.
>
> I don't think gso_max_size is quite the same. If I understand
> net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
> length of the skb, not the length of each segmented packet. The problem
> for the bnx2x card is the length of the segment, not the overall length.
>
>>
>> Btw, looks like this could be triggered from a guest which is a DOS. We 
>> need request a CVE for this.
>>
>
> You are correct about how this can be triggered: in fact it came to my
> attention because a network packet from one LPAR (PowerVM virtual
> machine) brought down the card attached to a different LPAR. It didn't
> occur to me that it was potentially a security issue. I am talking with
> the security team at Canonical regarding this.

I have requested a CVE from the Distributed Weakness Filing.

Regards,
Daniel

>
> Regards,
> Daniel
>
>> Thanks
>>
>>>
 To fix this:

   - Move a helper in patch 1.

   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
 case, rather than assuming all will be well. This fixes bridges.
 This is patch 2.

   - Open vSwitch uses its own slightly specialised algorithm for
 checking lengths. Wire up checking for that in patch 3.

 [0]: https://patchwork.ozlabs.org/patch/859410/

 Cc: manish.cho...@cavium.com
 Cc: d...@openvswitch.org

 Daniel Axtens (3):
net: move skb_gso_mac_seglen to skbuff.h
net: is_skb_forwardable: validate length of GSO packet segments
openvswitch: drop GSO packets that are too large

   include/linux/skbuff.h  | 16 
   net/core/dev.c  |  7 ---
   net/core/skbuff.c   | 34 ++
   net/openvswitch/vport.c | 37 ++---
   net/sched/sch_tbf.c | 10 --
   5 files changed, 84 insertions(+), 20 deletions(-)

 --
 2.14.1

[PATCH iproute2 v2 0/9] ip/tunnel: Improve tunnel parameters printing

2018-01-18 Thread Serhey Popovych

Continue improvements to tunnel modules. Final goal
is to make merge IP and IPv6 variants and make that
transition as transparent as possible.

Everything within this series is open for your comments,
suggestions and criticism.

See individual patch description message for details.

v2
  1) Fix checkpatch issues: no assignment in condtional
 anymore.

  2) Better code abstraction: introduce tnl_print_encap()
 helper and get rid of duplicated code for printing
 tunnel encapsulation options.

 Patch with subject
 "ip/tunnel: Abstract tunnel encapsulation options printing"
 replaces two previous in v1 series
 "ip/tunnel: Use print_string() and simplify encap option printing"
 and
 "ip/tunnel: Minor cleanups in print routines"

  3) Include patch with subject
 "tunnel: Return constant string without copying it"
 in the series: it is related to tunneling code too.

Thanks,
Serhii

Serhey Popovych (9):
  iplink: Use ll_index_to_name() instead of if_indextoname()
  ip/tunnel: Correct and unify ttl/hoplimit printing
  ip/tunnel: Simplify and unify tos printing
  ip/tunnel: Use print_0xhex() instead of print_string()
  ip/tunnel: Abstract tunnel encapsulation options printing
  gre/tunnel: Print erspan_index using print_uint()
  vti/tunnel: Unify ikey/okey printing
  vti6/tunnel: Unify and simplify link type help functions
  tunnel: Return constant string without copying it

 bridge/fdb.c  |5 +-
 bridge/link.c |   19 +++
 ip/ip6tunnel.c|5 +-
 ip/iplink_bond.c  |   38 ++
 ip/iplink_geneve.c|   38 ++
 ip/iplink_vxlan.c |   49 --
 ip/iproute_lwtunnel.c |   11 ++---
 ip/iptunnel.c |2 +-
 ip/link_gre.c |  132 +++--
 ip/link_gre6.c|  103 +-
 ip/link_ip6tnl.c  |  105 ++-
 ip/link_iptnl.c   |  128 +++
 ip/link_vti.c |   42 
 ip/link_vti6.c|   64 +++-
 ip/tunnel.c   |  116 ---
 ip/tunnel.h   |5 ++
 16 files changed, 340 insertions(+), 522 deletions(-)

-- 
1.7.10.4

Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

2018-01-18 Thread Greg Kroah-Hartman

On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o  wrote:
> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
> >>
> >> If syzkaller can only test one tree than linux-next should be the one.
> >
> > Well, there's been some controversy about that.  The problem is that
> > it's often not clear if this is long-standing bug, or a bug which is
> > in a particular subsystem tree --- and if so, *which* subsystem tree,
> > etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
> > which is often not accurate --- since the location of the crash
> > doesn't necessarily point out where the problem originated, and hence
> > who should look at the syzbot report.  And so this has caused
> > some irritation.
> 
> 
> Re set of tested trees.
> 
> We now have an interesting spectrum of opinions.
> 
> Some assorted thoughts on this:
> 
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
>  - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.
>  - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
>  - generic syzkaller smartness will be improved over time.
>  - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.

That's fine, but when you test Linus's tree, we "know" you are hitting
something that really is an issue, and it's not due to linux-next
oddities.

When I see a linux-next report, and it looks "odd", my default reaction
is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
in linux-next is just fine." :)

> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.

I don't worry about that, the traceback should tell you a lot, and even
when that is wrong (i.e. warnings thrown up by sysfs core calls that are
obviously not a sysfs issue, but rather a subsystem issue), it's easy to
see.

> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.

I agree, this is an issue, but I think you have a lot of "low hanging
fruit" in Linus's tree left to find.  Testing linux-next is great, but
the odds of something "new" being added there for your type of testing
right now is usually pretty low, right?

thanks,

greg k-h

[PATCH iproute2 v2 8/9] vti6/tunnel: Unify and simplify link type help functions

2018-01-18 Thread Serhey Popovych

Both of these two changes are missing for link_vti6.c:

  commit 8b47135474cd ("ip: link: Unify link type help functions a bit")
  commit 561e650eff67 ("ip link: Shortify printing the usage of link type")

Replay them on link_vti6.c to bring link type help functions
inline with other tunneling code.

Signed-off-by: Serhey Popovych 
---
 ip/link_vti6.c |   32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/ip/link_vti6.c b/ip/link_vti6.c
index 6b61e3d..2a86d59 100644
--- a/ip/link_vti6.c
+++ b/ip/link_vti6.c
@@ -24,20 +24,25 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
+static void print_usage(FILE *f)
+{
+   fprintf(f,
+   "Usage: ... vti6 [ remote ADDR ]\n"
+   "[ local ADDR ]\n"
+   "[ [i|o]key KEY ]\n"
+   "[ dev PHYS_DEV ]\n"
+   "[ fwmark MARK ]\n"
+   "\n"
+   "Where: ADDR := { IPV6_ADDRESS }\n"
+   "   KEY  := { DOTTED_QUAD | NUMBER }\n"
+   "   MARK := { 0x0..0x }\n"
+   );
+}
 
 static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
-   fprintf(stderr, "Usage: ip link { add | set | change | replace | del } 
NAME\n");
-   fprintf(stderr, "  type { vti6 } [ remote ADDR ] [ local ADDR 
]\n");
-   fprintf(stderr, "  [ [i|o]key KEY ]\n");
-   fprintf(stderr, "  [ dev PHYS_DEV ]\n");
-   fprintf(stderr, "  [ fwmark MARK ]\n");
-   fprintf(stderr, "\n");
-   fprintf(stderr, "Where: NAME := STRING\n");
-   fprintf(stderr, "   ADDR := { IPV6_ADDRESS }\n");
-   fprintf(stderr, "   KEY  := { DOTTED_QUAD | NUMBER }\n");
-   fprintf(stderr, "   MARK := { 0x0..0x }\n");
+   print_usage(stderr);
exit(-1);
 }
 
@@ -224,9 +229,16 @@ static void vti6_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
}
 }
 
+static void vti6_print_help(struct link_util *lu, int argc, char **argv,
+   FILE *f)
+{
+   print_usage(f);
+}
+
 struct link_util vti6_link_util = {
.id = "vti6",
.maxattr = IFLA_VTI_MAX,
.parse_opt = vti6_parse_opt,
.print_opt = vti6_print_opt,
+   .print_help = vti6_print_help,
 };
-- 
1.7.10.4

1 2 3 4 >

1 - 100 of 338 matches

Mail list logo