date:20170901

Re: [PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Marcel Holtmann

Hi Bjorn,

>>> Bluetooth BD address can be retrieved in the same way as
>>> for wcnss-wlan MAC address. This patch mainly stores the
>>> local-mac-address property and sets the BD address during
>>> hci device setup.
>>> 
>>> Signed-off-by: Loic Poulain 
>>> Signed-off-by: Bjorn Andersson 
>>> ---
>>> drivers/bluetooth/btqcomsmd.c | 28 
>>> 1 file changed, 28 insertions(+)
>>> 
>>> diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
>>> index d00c4fdae924..443bb2099329 100644
>>> --- a/drivers/bluetooth/btqcomsmd.c
>>> +++ b/drivers/bluetooth/btqcomsmd.c
>>> @@ -26,6 +26,7 @@
>>> struct btqcomsmd {
>>> struct hci_dev *hdev;
>>> 
>>> +   const bdaddr_t *addr;
>>> struct rpmsg_endpoint *acl_channel;
>>> struct rpmsg_endpoint *cmd_channel;
>>> };
>>> @@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
>>> return 0;
>>> }
>>> 
>>> +static int btqcomsmd_setup(struct hci_dev *hdev)
>>> +{
>>> +   struct btqcomsmd *btq = hci_get_drvdata(hdev);
>>> +   struct sk_buff *skb;
>>> +
>>> +   skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
>>> +   if (IS_ERR(skb))
>>> +   return PTR_ERR(skb);
>>> +   kfree_skb(skb);
>>> +
>>> +   if (btq->addr) {
>>> +   bdaddr_t bdaddr;
>>> +
>>> +   /* btq->addr stored with most significant byte first */
>>> +   baswap(&bdaddr, btq->addr);
>>> +   return qca_set_bdaddr_rome(hdev, &bdaddr);
>>> +   }
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> static int btqcomsmd_probe(struct platform_device *pdev)
>>> {
>>> struct btqcomsmd *btq;
>>> @@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device 
>>> *pdev)
>>> if (IS_ERR(btq->cmd_channel))
>>> return PTR_ERR(btq->cmd_channel);
>>> 
>>> +   btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
>>> +   &ret);
>>> +   if (ret != sizeof(bdaddr_t))
>>> +   btq->addr = NULL;
>>> +
>>> hdev = hci_alloc_dev();
>>> if (!hdev)
>>> return -ENOMEM;
>>> @@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>>> hdev->open = btqcomsmd_open;
>>> hdev->close = btqcomsmd_close;
>>> hdev->send = btqcomsmd_send;
>>> +   hdev->setup = btqcomsmd_setup;
>>> hdev->set_bdaddr = qca_set_bdaddr_rome;
>> 
>> I do not like this patch. Why not just set HCI_QUIRK_INVALID_BDADDR
>> and let a userspace tool deal with reading the BD_ADDR from some
>> storage.
>> 
> 
> That's what we currently have, but we regularly get complaints from
> developers using our board (DB410c).

at least not in the upstream driver. It does not use HCI_QUIRK_INVALID_BDADDR 
to tell the system that its BD_ADDR is not valid. Which is something you still 
need to do if local-mac-address would not be found.

What BD_ADDR is actually returned by default. Can someone send me a “btmon -w 
trace.log” for an init procedure of this chip?

> We're maintaining a Debian-based and an OpenEmbedded-based build and at
> least in the past btmgmt was not available in these - so we would have
> to maintain both a custom BlueZ package and then some scripts to inject
> the appropriate mac address.
> 
> Beyond these reference builds our users tend to build their own system
> images and I was hoping that they would not be forced to have a custom
> hook running each time hci0 is registered.

Frankly this has never been about btmgmt usage. That tool is really just for us 
to test the interface. What was needed is that we create a small daemon that 
can have backends for accessing the various OTPs. Or in dev mode just generate 
a random OUI from an unused OUI range. I would have put that into bluetoothd, 
but it seemed not a good idea since many companies were secret about their OTP 
access. So I assumed they build there own quick solution since mgmt API is 
fully documented and you only need to listen for Unconfigured Index event, send 
Set Public Address and leave. So something super simple.

For a LE only controller without a BD_ADDR, we recently added a pool of static 
addresses that it will generate and program. However that is specific since LE 
is capable of operating without a public address.

We could actually downgrade a dual-mode controller without a BD_ADDR into a 
single mode controller. That will automatically start using static addresses 
and be fully operational. That might be useful for people who get a dual-mode 
controller, but only care about LE. I have seen devices that only use the LE 
portion.

>> Frankly I do not get this WiFI MAC address or BD_ADDR stored in DT. I
>> assumed the DT is suppose to describe hardware and not some value that
>> is normally retrieved for OTP or alike.
>> 
> 
> While I share your skepticism here I find it way superior over the
> various cases where this information is hard coded in some firmware file
> that has to be patched for each device - in particular when considering
> the out-of-tre

Re: [PATCH v4 next 1/3] modules:capabilities: allow __request_module() to take a capability argument

2017-09-01 Thread Djalal Harouni

Hi Kees,

On Thu, Jun 1, 2017 at 9:10 PM, Kees Cook  wrote:
> On Thu, Jun 1, 2017 at 7:56 AM, Djalal Harouni  wrote:
...
>
>> BTW Kees, also in next version I won't remove the
>> capable(CAP_NET_ADMIN) check from [1]
>> even if there is the new request_module_cap(), I would like it to be
>> in a different patches, this way we go incremental
>> and maybe it is better to merge what we have now ?  and follow up
>> later, and of course if other maintainers agree too!
>
> Yes, incremental. I would suggest first creating the API changes to
> move a basic require_cap test into the LSM (which would drop the
> open-coded capable() checks in the net code), and then add the
> autoload logic in the following patches. That way the "infrastructure"
> changes happen separately and do not change any behaviors, but moves
> the caps test down where its wanted in the LSM, before then augmenting
> the logic.
>
>> I just need a bit of free time to check again everything and will send
>> a v5 with all requested changes.
>
> Great, thank you!
>

So sorry was busy these last months, I picked it again, will send v5 after the
merge window.

Kees I am looking on a way to integrate a test for it, we should use
something like
the example here [1] or maybe something else ? and which module to use ?

I still did not sort this out, if anyone has some suggestions, thank
you in advance!


[1] http://openwall.com/lists/kernel-hardening/2017/05/22/7

-- 
tixxdz

RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-09-01 Thread Dexuan Cui

> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 31, 2017 4:55 AM
> ...
> On Tue, Aug 29, 2017 at 03:37:07PM +, Jorgen S. Hansen wrote:
> > > On Aug 29, 2017, at 4:36 AM, Dexuan Cui  wrote:
> > If we allow multiple host side transports, virtio host side support and
> > vmci should be able to coexist regardless of the order of initialization.
> 
> That sounds good to me.
> 
> This means af_vsock.c needs to be aware of CID allocation.  Currently the
> vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
> checks that they are not used twice).  It should be possible to move
> that state into af_vsock.c so we have  pairs.
> 
> I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> ss(8) and netstat-like tools).
> 
> Multi-transport support is lower priority for me at the moment.  I'm
> happy to review patches though.  If there is no progress on this by the
> end of the year then I will have time to work on it.
I understand. Thank you both for sharing the details about the plan!
 
> Are either of you are in Prague, Czech Republic on October 25-27 for
> Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
> Conference Europe, KVM Forum, or MesosCon Europe?
> 
> Stefan
I regret I won't be there this year. 

Thanks,
-- Dexuan

Re: [PATCH] ipv6: sr: Use ARRAY_SIZE macro

2017-09-01 Thread Thomas Meyer

On Fri, Sep 01, 2017 at 08:51:55PM -0700, Joe Perches wrote:
> On Fri, 2017-09-01 at 18:35 -0700, David Miller wrote:
> > From: Thomas Meyer 
> > Date: Thu, 31 Aug 2017 16:18:15 +0200
> > 
> > > Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first
> > > candidates.
> > > Maybe a coccinelle can catch all of those.
> 
Hi,

> Umm: try scripts/coccinelle/misc/array_size.cocci

Yes, I found out/remembered after I submitted above patch... I used to
run most of the cocci spatches (some just run too long) after each rc1 release, 
but lost interest/time. nobody seems to
do this regularly, at least for existing spatches.

See 6 patches with Message-ID 20170901212907.5662-1-tho...@m3y3r.de

> Until then, maybe a perl script?
> 
> $ git grep --name-only sizeof.*/.*sizeof drivers/net | \
>   xargs perl -p -i -e 
> 's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\/\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)/ARRAY_SIZE(\1)/g'
> 
> gives:
> 
> $ git diff --stat drivers/net
>  drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c   |   2 +-
>  drivers/net/ethernet/mellanox/mlx4/fw.c |   4 +--
>  drivers/net/ethernet/mellanox/mlx4/main.c   |   8 +++---
>  drivers/net/wireless/ath/ath9k/ar9003_eeprom.c  |   2 +-
>  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phytbl_n.c | 186 
> +++---
>  5 files changed, 101 insertions(+), 101 deletions(-)

Which makes me wonder why cocci didn't found above places...
Also cocci includes linux/kernel.h if not already present.

I will give above regex a try for the whole kernel tree and check for
false positives.

with kind regards
thomas

Re: [PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Marcel Holtmann

Hi Rob,

>>> Bluetooth BD address can be retrieved in the same way as
>>> for wcnss-wlan MAC address. This patch mainly stores the
>>> local-mac-address property and sets the BD address during
>>> hci device setup.
>>> 
>>> Signed-off-by: Loic Poulain 
>>> Signed-off-by: Bjorn Andersson 
>>> ---
>>> drivers/bluetooth/btqcomsmd.c | 28 
>>> 1 file changed, 28 insertions(+)
>>> 
>>> diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
>>> index d00c4fdae924..443bb2099329 100644
>>> --- a/drivers/bluetooth/btqcomsmd.c
>>> +++ b/drivers/bluetooth/btqcomsmd.c
>>> @@ -26,6 +26,7 @@
>>> struct btqcomsmd {
>>>  struct hci_dev *hdev;
>>> 
>>> + const bdaddr_t *addr;
>>>  struct rpmsg_endpoint *acl_channel;
>>>  struct rpmsg_endpoint *cmd_channel;
>>> };
>>> @@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
>>>  return 0;
>>> }
>>> 
>>> +static int btqcomsmd_setup(struct hci_dev *hdev)
>>> +{
>>> + struct btqcomsmd *btq = hci_get_drvdata(hdev);
>>> + struct sk_buff *skb;
>>> +
>>> + skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
>>> + if (IS_ERR(skb))
>>> + return PTR_ERR(skb);
>>> + kfree_skb(skb);
>>> +
>>> + if (btq->addr) {
>>> + bdaddr_t bdaddr;
>>> +
>>> + /* btq->addr stored with most significant byte first */
>>> + baswap(&bdaddr, btq->addr);
>>> + return qca_set_bdaddr_rome(hdev, &bdaddr);
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> static int btqcomsmd_probe(struct platform_device *pdev)
>>> {
>>>  struct btqcomsmd *btq;
>>> @@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device 
>>> *pdev)
>>>  if (IS_ERR(btq->cmd_channel))
>>>  return PTR_ERR(btq->cmd_channel);
>>> 
>>> + btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
>>> + &ret);
>>> + if (ret != sizeof(bdaddr_t))
>>> + btq->addr = NULL;
>>> +
>>>  hdev = hci_alloc_dev();
>>>  if (!hdev)
>>>  return -ENOMEM;
>>> @@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>>>  hdev->open = btqcomsmd_open;
>>>  hdev->close = btqcomsmd_close;
>>>  hdev->send = btqcomsmd_send;
>>> + hdev->setup = btqcomsmd_setup;
>>>  hdev->set_bdaddr = qca_set_bdaddr_rome;
>> 
>> I do not like this patch. Why not just set HCI_QUIRK_INVALID_BDADDR and let 
>> a userspace tool deal with reading the BD_ADDR from some storage.
>> 
>> Frankly I do not get this WiFI MAC address or BD_ADDR stored in DT. I 
>> assumed the DT is suppose to describe hardware and not some value that is 
>> normally retrieved for OTP or alike.
> 
> Use of "local-mac-address" for ethernet at least has existed as long
> at OpenFirmware I think. For some platforms, DT is the only OTP. And
> sometimes, the bootloader (like u-boot) stores MAC addresses and then
> populates them on boot.
> 
> Seems like if we just let userspace deal with it, then we're back to a
> btattach tool with every platform's specific way of reading the MAC
> address.

for Bluetooth that is not true. We have Set Public Address command that is 
uniquely handling this and the HCI_QUIRK_INVALID_BDADDR address does the right 
magic to allow userspace to identify a missing address. It is done nicely and 
correctly and works fine.

Mind you this is even used when there actually is a BD_ADDR, but the device 
manufacturer wants to have one from its own OUI range compared to the chip 
manufacturer’s OUI range.

If DT is really the only place for the BD_ADDR and the bootloader kinda does 
add / merge it into the DT, then by all means that is fine. However if it is 
not, then this feature is dangerous since it can lead to multiple devices with 
the same address. I rather have these devices leave the kernel in unconfigured 
mode. And then force a userspace tool to use Set Public Address to bring it 
into configured mode.

Regards

Marcel

[PATCH v2 net-next 4/4] bpf: add a test case for helper bpf_perf_prog_read_time

2017-09-01 Thread Yonghong Song

The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song 
---
 samples/bpf/trace_event_kern.c| 10 ++
 samples/bpf/trace_event_user.c| 13 -
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/trace_event_kern.c b/samples/bpf/trace_event_kern.c
index 41b6115..e93c8b1 100644
--- a/samples/bpf/trace_event_kern.c
+++ b/samples/bpf/trace_event_kern.c
@@ -37,10 +37,14 @@ struct bpf_map_def SEC("maps") stackmap = {
 SEC("perf_event")
 int bpf_prog1(struct bpf_perf_event_data *ctx)
 {
+   char time_fmt1[] = "Time Enabled: %llu, Time Running: %llu";
+   char time_fmt2[] = "Get Time Failed, ErrCode: %d";
char fmt[] = "CPU-%d period %lld ip %llx";
u32 cpu = bpf_get_smp_processor_id();
+   struct bpf_perf_time time_buf;
struct key_t key;
u64 *val, one = 1;
+   int ret;
 
if (ctx->sample_period < 1)
/* ignore warmup */
@@ -54,6 +58,12 @@ int bpf_prog1(struct bpf_perf_event_data *ctx)
return 0;
}
 
+   ret = bpf_perf_prog_read_time(ctx, (void *)&time_buf, sizeof(struct 
bpf_perf_time));
+   if (!ret)
+ bpf_trace_printk(time_fmt1, sizeof(time_fmt1), time_buf.enabled, 
time_buf.running);
+   else
+ bpf_trace_printk(time_fmt2, sizeof(time_fmt2), ret);
+
val = bpf_map_lookup_elem(&counts, &key);
if (val)
(*val)++;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 7bd827b..bf4f1b6 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -127,6 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr 
*attr)
int *pmu_fd = malloc(nr_cpus * sizeof(int));
int i, error = 0;
 
+   /* system wide perf event, no need to inherit */
+   attr->inherit = 0;
+
/* open perf_event on all cpus */
for (i = 0; i < nr_cpus; i++) {
pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
@@ -154,6 +157,11 @@ static void test_perf_event_task(struct perf_event_attr 
*attr)
 {
int pmu_fd;
 
+   /* per task perf event, enable inherit so the "dd ..." command can be 
traced properly.
+* Enabling inherit will cause bpf_perf_prog_read_time helper failure.
+*/
+   attr->inherit = 1;
+
/* open task bound event */
pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
if (pmu_fd < 0) {
@@ -175,14 +183,12 @@ static void test_bpf_perf_event(void)
.freq = 1,
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
-   .inherit = 1,
};
struct perf_event_attr attr_type_sw = {
.sample_freq = SAMPLE_FREQ,
.freq = 1,
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_CPU_CLOCK,
-   .inherit = 1,
};
struct perf_event_attr attr_hw_cache_l1d = {
.sample_freq = SAMPLE_FREQ,
@@ -192,7 +198,6 @@ static void test_bpf_perf_event(void)
PERF_COUNT_HW_CACHE_L1D |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
-   .inherit = 1,
};
struct perf_event_attr attr_hw_cache_branch_miss = {
.sample_freq = SAMPLE_FREQ,
@@ -202,7 +207,6 @@ static void test_bpf_perf_event(void)
PERF_COUNT_HW_CACHE_BPU |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
-   .inherit = 1,
};
struct perf_event_attr attr_type_raw = {
.sample_freq = SAMPLE_FREQ,
@@ -210,7 +214,6 @@ static void test_bpf_perf_event(void)
.type = PERF_TYPE_RAW,
/* Intel Instruction Retired */
.config = 0xc0,
-   .inherit = 1,
};
 
printf("Test HW_CPU_CYCLES\n");
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index fe41852..ddad690 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -74,6 +74,9 @@ static int (*bpf_perf_read_counter_time)(void *map, unsigned 
long long flags,
   void *counter_time_buf,
   unsigned int buf_size) =
(void *) BPF_FUNC_perf_read_counter_time;
+static int (*bpf_perf_prog_read_time)(void *ctx, void *time_buf,
+ unsigned int size) =
+   (void *) BPF_FUNC_perf_prog_read_time;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5

[PATCH v2 net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map

2017-09-01 Thread Yonghong Song

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch adds helper bpf_perf_read_counter_time for kprobed based perf
event array map, to read perf counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.
To achieve scaling factor between two bpf invocations, users
can can use cpu_id as the key (which is typical for perf array usage model)
to remember the previous value and do the calculation inside the
bpf program.

Signed-off-by: Yonghong Song 
---
 include/linux/perf_event.h |  3 ++-
 include/uapi/linux/bpf.h   | 21 -
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/verifier.c  |  4 +++-
 kernel/events/core.c   | 19 +--
 kernel/trace/bpf_trace.c   | 44 
 6 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b14095b..5a50808 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -898,7 +898,8 @@ perf_event_create_kernel_counter(struct perf_event_attr 
*attr,
void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
int src_cpu, int dst_cpu);
-int perf_event_read_local(struct perf_event *event, u64 *value);
+int perf_event_read_local(struct perf_event *event, u64 *value,
+ u64 *enabled, u64 *running);
 extern u64 perf_event_read_value(struct perf_event *event,
 u64 *enabled, u64 *running);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba848b7..9c23bef 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,14 @@ union bpf_attr {
  * @map: pointer to sockmap to update
  * @key: key to insert/update sock in map
  * @flags: same flags as map update elem
+ *
+ * int bpf_perf_read_counter_time(map, flags, counter_time_buf, buf_size)
+ * read perf event counter value and perf event enabled/running time
+ * @map: pointer to perf_event_array map
+ * @flags: index of event in the map or bitmask flags
+ * @counter_time_buf: buf to fill
+ * @buf_size: size of the counter_time_buf
+ * Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -638,6 +646,7 @@ union bpf_attr {
FN(redirect_map),   \
FN(sk_redirect_map),\
FN(sock_map_update),\
+   FN(perf_read_counter_time), \
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -681,7 +690,8 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
 #define BPF_F_DONT_FRAGMENT(1ULL << 2)
 
-/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
+/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
+ * BPF_FUNC_perf_read_counter_time flags. */
 #define BPF_F_INDEX_MASK   0xULL
 #define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
@@ -864,4 +874,13 @@ enum {
 #define TCP_BPF_IW 1001/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
 
+struct bpf_perf_time {
+   __u64 enabled;
+   __u64 running;
+};
+struct bpf_perf_counter_time {
+   __u64 counter;
+   struct bpf_perf_time time;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..68d8666 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -492,7 +492,7 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map 
*map,
 
ee = ERR_PTR(-EOPNOTSUPP);
event = perf_file->private_data;
-   if (perf_event_read_local(event, &value) == -EOPNOTSUPP)
+   if (perf_event_read_local(event, &value, NULL, NULL) == -EOPNOTSUPP)
goto err_out;
 
ee = bpf_event_entry_gen(perf_file, map_file);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d690c7d..c4d29e3 100644

[PATCH v2 net-next 0/4] bpf: add two helpers to read perf event enabled/running time

2017-09-01 Thread Yonghong Song

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch set implements two helper functions.
The helper bpf_perf_read_counter_time reads counter/time_enabled/time_running
for perf event array map. The helper bpf_perf_prog_read_time read
time_enabled/time_running for bpf prog with type BPF_PROG_TYPE_PERF_EVENT.

Yonghong Song (4):
  bpf: add helper bpf_perf_read_counter_time for perf event array map
  bpf: add a test case to read enabled/running time for perf array
  bpf: add helper bpf_perf_prog_read_time
  bpf: add a test case for helper bpf_perf_prog_read_time

 include/linux/perf_event.h|  4 +-
 include/uapi/linux/bpf.h  | 29 -
 kernel/bpf/arraymap.c |  2 +-
 kernel/bpf/verifier.c |  4 +-
 kernel/events/core.c  | 20 ++---
 kernel/trace/bpf_trace.c  | 67 +--
 samples/bpf/trace_event_kern.c| 10 +
 samples/bpf/trace_event_user.c| 13 +++---
 samples/bpf/tracex6_kern.c| 26 
 samples/bpf/tracex6_user.c| 13 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  7 
 11 files changed, 175 insertions(+), 20 deletions(-)

-- 
2.9.5

[PATCH v2 net-next 3/4] bpf: add helper bpf_perf_prog_read_time

2017-09-01 Thread Yonghong Song

This patch adds helper bpf_perf_prog_read_time for perf event based bpf
programs, to read event enabled/running time.
The enabled/running time is accumulated since the perf event open.

The typical use case for perf event based bpf program is to attach itself
to a single event. In such cases, if it is desirable to get scaling factor
between two bpf invocations, users can can save the time values in a map,
and use the value from the map and the current value to calculate
the scaling factor.

Signed-off-by: Yonghong Song 
---
 include/linux/perf_event.h |  1 +
 include/uapi/linux/bpf.h   |  8 
 kernel/events/core.c   |  1 +
 kernel/trace/bpf_trace.c   | 23 +++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5a50808..6756ae7 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -821,6 +821,7 @@ struct perf_output_handle {
 struct bpf_perf_event_data_kern {
struct pt_regs *regs;
struct perf_sample_data *data;
+   struct perf_event *event;
 };
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9c23bef..1ae55c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -590,6 +590,13 @@ union bpf_attr {
  * @counter_time_buf: buf to fill
  * @buf_size: size of the counter_time_buf
  * Return: 0 on success or negative error code
+ *
+ * int bpf_perf_prog_read_time(ctx, time_buf, buf_size)
+ * Read perf event enabled and running time
+ * @ctx: pointer to ctx
+ * @time_buf: buf to fill
+ * @buf_size: size of the time_buf
+ * Return : 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -647,6 +654,7 @@ union bpf_attr {
FN(sk_redirect_map),\
FN(sock_map_update),\
FN(perf_read_counter_time), \
+   FN(perf_prog_read_time),\
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 20c4039..338f564 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8026,6 +8026,7 @@ static void bpf_overflow_handler(struct perf_event *event,
struct bpf_perf_event_data_kern ctx = {
.data = data,
.regs = regs,
+   .event = event,
};
int ret = 0;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7ef953f..89b0744 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -603,6 +603,18 @@ BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct 
bpf_map *, map,
   flags, 0, 0);
 }
 
+BPF_CALL_3(bpf_perf_prog_read_time_tp, void *, ctx, struct bpf_perf_time *,
+   time_buf, u32, size)
+{
+   struct bpf_perf_event_data_kern *kctx = (struct 
bpf_perf_event_data_kern *)ctx;
+
+   if (size != sizeof(struct bpf_perf_time))
+   return -EINVAL;
+
+   return perf_event_read_local(kctx->event, NULL, &time_buf->enabled,
+&time_buf->running);
+}
+
 static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
.func   = bpf_get_stackid_tp,
.gpl_only   = true,
@@ -612,6 +624,15 @@ static const struct bpf_func_proto 
bpf_get_stackid_proto_tp = {
.arg3_type  = ARG_ANYTHING,
 };
 
+static const struct bpf_func_proto bpf_perf_prog_read_time_proto_tp = {
+ .func   = bpf_perf_prog_read_time_tp,
+ .gpl_only   = true,
+ .ret_type   = RET_INTEGER,
+ .arg1_type  = ARG_PTR_TO_CTX,
+ .arg2_type  = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type  = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id 
func_id)
 {
switch (func_id) {
@@ -619,6 +640,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum 
bpf_func_id func_id)
return &bpf_perf_event_output_proto_tp;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_tp;
+   case BPF_FUNC_perf_prog_read_time:
+   return &bpf_perf_prog_read_time_proto_tp;
default:
return tracing_func_proto(func_id);
}
-- 
2.9.5

[PATCH v2 net-next 2/4] bpf: add a test case to read enabled/running time for perf array

2017-09-01 Thread Yonghong Song

The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.

Signed-off-by: Yonghong Song 
---
 samples/bpf/tracex6_kern.c| 26 ++
 samples/bpf/tracex6_user.c| 13 -
 tools/testing/selftests/bpf/bpf_helpers.h |  4 
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index e7d1803..46acfef 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -15,6 +15,12 @@ struct bpf_map_def SEC("maps") values = {
.value_size = sizeof(u64),
.max_entries = 64,
 };
+struct bpf_map_def SEC("maps") values2 = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(int),
+   .value_size = sizeof(struct bpf_perf_counter_time),
+   .max_entries = 64,
+};
 
 SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
@@ -37,5 +43,25 @@ int bpf_prog1(struct pt_regs *ctx)
return 0;
 }
 
+SEC("kprobe/htab_map_lookup_elem")
+int bpf_prog2(struct pt_regs *ctx)
+{
+   u32 key = bpf_get_smp_processor_id();
+   struct bpf_perf_counter_time *val, buf;
+   int error;
+
+   error = bpf_perf_read_counter_time(&counters, key, &buf, sizeof(buf));
+   if (error)
+   return 0;
+
+   val = bpf_map_lookup_elem(&values2, &key);
+   if (val)
+   *val = buf;
+   else
+   bpf_map_update_elem(&values2, &key, &buf, BPF_NOEXIST);
+
+   return 0;
+}
+
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index a05a99a..2a0c5d8 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -22,6 +22,7 @@
 
 static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 {
+   struct bpf_perf_counter_time value2;
int pmu_fd, error = 0;
cpu_set_t set;
__u64 value;
@@ -46,8 +47,18 @@ static void check_on_cpu(int cpu, struct perf_event_attr 
*attr)
fprintf(stderr, "Value missing for CPU %d\n", cpu);
error = 1;
goto on_exit;
+   } else {
+   fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+   }
+   /* The above bpf_map_lookup_elem should trigger the second kprobe */
+   if (bpf_map_lookup_elem(map_fd[2], &cpu, &value2)) {
+   fprintf(stderr, "Value2 missing for CPU %d\n", cpu);
+   error = 1;
+   goto on_exit;
+   } else {
+   fprintf(stderr, "CPU %d: counter: %llu, enabled: %llu, running: 
%llu\n", cpu,
+   value2.counter, value2.time.enabled, 
value2.time.running);
}
-   fprintf(stderr, "CPU %d: %llu\n", cpu, value);
 
 on_exit:
assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..fe41852 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -70,6 +70,10 @@ static int (*bpf_sk_redirect_map)(void *map, int key, int 
flags) =
 static int (*bpf_sock_map_update)(void *map, void *key, void *value,
  unsigned long long flags) =
(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_perf_read_counter_time)(void *map, unsigned long long flags,
+  void *counter_time_buf,
+  unsigned int buf_size) =
+   (void *) BPF_FUNC_perf_read_counter_time;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5

Re: [PATCH] ipv6: sr: Use ARRAY_SIZE macro

2017-09-01 Thread Joe Perches

On Fri, 2017-09-01 at 18:35 -0700, David Miller wrote:
> From: Thomas Meyer 
> Date: Thu, 31 Aug 2017 16:18:15 +0200
> 
> > Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first
> > candidates.
> > Maybe a coccinelle can catch all of those.

Umm: try scripts/coccinelle/misc/array_size.cocci

Until then, maybe a perl script?

$ git grep --name-only sizeof.*/.*sizeof drivers/net | \
  xargs perl -p -i -e 
's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\/\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)/ARRAY_SIZE(\1)/g'

gives:

$ git diff --stat drivers/net
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c |   4 +--
 drivers/net/ethernet/mellanox/mlx4/main.c   |   8 +++---
 drivers/net/wireless/ath/ath9k/ar9003_eeprom.c  |   2 +-
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phytbl_n.c | 186 
+++---
 5 files changed, 101 insertions(+), 101 deletions(-)

Re: [PATCH net-next, 0/4] cleanups and fixes of channel settings

2017-09-01 Thread David Miller

From: Haiyang Zhang 
Date: Fri,  1 Sep 2017 14:30:03 -0700

> This patch set cleans up some unused variables, unnecessary checks.
> Also fixed some limit checking of channel number.

Series applied.

Re: [PATCH net-next] net: Add module reference to FIB notifiers

2017-09-01 Thread David Miller

From: Ido Schimmel 
Date: Fri,  1 Sep 2017 12:15:17 +0300

> When a listener registers to the FIB notification chain it receives a
> dump of the FIB entries and rules from existing address families by
> invoking their dump operations.
> 
> While we call into these modules we need to make sure they aren't
> removed. Do that by increasing their reference count before invoking
> their dump operations and decrease it afterwards.
> 
> Fixes: 04b1d4e50e82 ("net: core: Make the FIB notification chain generic")
> Signed-off-by: Ido Schimmel 
> Reviewed-by: Jiri Pirko 

Oops, yes, you'll need to do this.

Applied, thanks.

Re: [PATCH net-next 0/2] netvsc: transparent VF related cleanups

2017-09-01 Thread David Miller

From: Stephen Hemminger 
Date: Thu, 31 Aug 2017 16:16:11 -0700

> The first gets rid of unnecessary ref counting, and second
> allows removing hv_netvsc driver even if VF present.

Series applied.

Re: [net-next PATCH] bpf: sockmap update/simplify memory accounting scheme

2017-09-01 Thread David Miller

From: John Fastabend 
Date: Fri, 01 Sep 2017 11:29:26 -0700

> Instead of tracking wmem_queued and sk_mem_charge by incrementing
> in the verdict SK_REDIRECT paths and decrementing in the tx work
> path use skb_set_owner_w and sock_writeable helpers. This solves
> a few issues with the current code. First, in SK_REDIRECT inc on
> sk_wmem_queued and sk_mem_charge were being done without the peers
> sock lock being held. Under stress this can result in accounting
> errors when tx work and/or multiple verdict decisions are working
> on the peer psock.
> 
> Additionally, this cleans up the code because we can rely on the
> default destructor to decrement memory accounting on kfree_skb. Also
> this will trigger sk_write_space when space becomes available on
> kfree_skb() which wasn't happening before and prevent __sk_free
> from being called until all in-flight packets are completed.
> 
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Signed-off-by: John Fastabend 
> Acked-by: Daniel Borkmann 

Applied.

Re: [PATCH v2 net-next 0/2] net: ubuf_info.refcnt conversion

2017-09-01 Thread David Miller

From: Eric Dumazet 
Date: Fri, 01 Sep 2017 10:36:29 -0700

> On Thu, 2017-08-31 at 17:04 -0700, Eric Dumazet wrote:
>> On Thu, 2017-08-31 at 16:48 -0700, Eric Dumazet wrote:
>> > Yet another atomic_t -> refcount_t conversion, split in two patches.
>> > 
>> > First patch prepares the automatic conversion done in the second patch.
>> > 
>> > Eric Dumazet (2):
>> >   net: prepare (struct ubuf_info)->refcnt conversion
>> >   net: convert (struct ubuf_info)->refcnt to refcount_t
>> > 
>> >  drivers/vhost/net.c|  2 +-
>> >  include/linux/skbuff.h |  5 +++--
>> >  net/core/skbuff.c  | 14 --
>> >  net/ipv4/tcp.c |  2 --
>> >  4 files changed, 8 insertions(+), 15 deletions(-)
>> > 
>> 
>> David please ignore this series, I will send a V3 :)
>> 
> 
> No need for a V3, sorry for the confusion, but we had to double check
> with Willem that everything had been covered.
> 
> Please tell me if I need to resend, thanks !

Ok, series applied, thanks Eric.

Re: [PATCH net-next] net: systemport: Correctly set TSB endian for host

2017-09-01 Thread David Miller

From: Florian Fainelli 
Date: Fri,  1 Sep 2017 17:32:34 -0700

> Similarly to how we configure the RSB (Receive Status Block) we also
> need to set the TSB (Transmit Status Block) based on the host endian.
> This was missing from the commit indicated below.
> 
> Fixes: 389a06bc534e ("net: systemport: Set correct RSB endian bits based on 
> host")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.

Re: netdev carrier changes is one even after ethernet link up.

2017-09-01 Thread Florian Fainelli

On 08/31/2017 10:49 PM, Bhadram Varka wrote:
> Thanks for responding. Now responding inline
> 
>> -Original Message-
>> From: Florian Fainelli [mailto:f.faine...@gmail.com]
>> Sent: Friday, September 01, 2017 5:53 AM
>> To: Bhadram Varka ; and...@lunn.ch
>> Cc: linux-netdev 
>> Subject: Re: netdev carrier changes is one even after ethernet link up.
>>
>> On 08/30/2017 10:53 PM, Bhadram Varka wrote:
>>> Hi,
>>>
>>>
>>>
>>> I have observed that carrier_changes is one even in case of the
>>> ethernet link is up.
>>>
>>>
>>>
>>> After investigating the code below is my observation –
>>>
>>>
>>>
>>> ethernet_driver_probe()
>>>
>>> +--->phy_connect()
>>>
>>> | +--->phy_attach_direct()
>>>
>>> |   +---> netif_carrier_off(): which increments
>>> carrier_changes to one.
>>>
>>> +--->register_netdevice() : will the carrier_changes becomes zero here ?
>>>
>>> +--->netif_carrier_off(): not increment the carrier_changes since
>>> __LINK_STATE_NOCARRIER already set.
>>>
>>>
>>>
>>> From ethernet driver open will start the PHY and trigger the
>>> phy_state_machine.
>>>
>>> Phy_state_machine workqueue calling netif_carrier_on() once the link is
>> UP.
>>>
>>> netif_carrier_on() increments the carrier_changes by one.
>>
>> If the call trace is correct, then there is at least two problems here:
>>
>> - phy_connect() does start the PHY machine which means that as soon as it
>> detects a link state of any kind (up or down) it can call
>> netif_carrier_off() respectively netif_carrier_on()
>>
>> - as soon as you call register_netdevice() notifiers run and other parts of 
>> the
>> kernel or user-space programs can see an inconsistent link state
>>
>> I would suggest doing the following sequence instead:
>>
>> netif_carrier_off()
>> register_netdevice()
>> phy_connect()
>>
>> Which should result in a consistent link state and carrier value.
>>
> Yes, It will address the issue. 
> 
> If we did the phy_conect in ndo_open it will make the carrier changes as two. 
> But if we did in probe function then it's not working.
> 
> In ethernet driver probe - (below sequence is not working)
> phy_connect()
> register_netdevice()
> netif_carrier_off()
> 
> working sequence:
> In probe():
> register_netdevice()
> ndo_open:
>phy_connect()
> 
> After reverting - https://lkml.org/lkml/2016/1/9/173 this works if we do 
> phy_connect in probe as well.

But as mentioned before you should not be doing the PHY probe in your
driver's probe function for different reasons:

- the probe function's responsibility is to initialize the driver and
the HW to a state where they both have everything needed but it should
be in quiesced state. There is no guarantee that your network device may
ever be used after probe unless something calls ndo_open(), you should
therefore keep all resources to a minimum: memory allocated, HW powered
down etc.

- there is a race condition between the PHY state machine started in
phy_connect(), and when register_netdevice() is called and notifiers
running which can lead to an inconsistent state for the carrier

So considering that your driver does not do that, I am not sure what you
are expecting...
-- 
Florian

Re: [PATCH net-next v5 0/2] report TCP MD5 signing keys and addresses

2017-09-01 Thread David Miller

From: Ivan Delalande 
Date: Thu, 31 Aug 2017 09:59:37 -0700

> Allow userspace to retrieve MD5 signature keys and addresses configured
> on TCP sockets through inet_diag.
 ...

Series applied to net-next, thanks.

Re: [PATCH][net-next] net: qualcomm: rmnet: remove unused variable priv

2017-09-01 Thread David Miller

From: Colin King 
Date: Thu, 31 Aug 2017 15:07:27 +0100

> From: Colin Ian King 
> 
> priv is being assigned but is never used, so remove it.
> 
> Cleans up clang build warning:
> "warning: Value stored to 'priv' is never read"
> 
> Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial 
> implementation")
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH] ipv6: sr: Use ARRAY_SIZE macro

2017-09-01 Thread David Miller

From: Thomas Meyer 
Date: Thu, 31 Aug 2017 16:18:15 +0200

> Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first
> candidates.
> Maybe a coccinelle can catch all of those.
> 
> Signed-off-by: Thomas Meyer 

Applied, thanks.

Re: [PATCH] net: phy: bcm7xxx: make array bcm7xxx_suspend_cfg static, reduces object code size

2017-09-01 Thread David Miller

From: Colin King 
Date: Thu, 31 Aug 2017 14:57:15 +0100

> From: Colin Ian King 
> 
> Don't populate the array bcm7xxx_suspend_cfg A on the stack, instead
> make it static.  Makes the object code smaller by over 300 bytes:
> 
> Before:
>text  data bss dec hex filename
>6351  8146   0   1449738a1 drivers/net/phy/bcm7xxx.o
> 
> After:
>text  data bss dec hex filename
>5986  8210   0   141963774 drivers/net/phy/bcm7xxx.o
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net-next v6] net: stmmac: Delete dead code for MDIO registration

2017-09-01 Thread David Miller

From: Romain Perier 
Date: Thu, 31 Aug 2017 15:53:03 +0200

> This code is no longer used, the logging function was changed by commit
> fbca164776e4 ("net: stmmac: Use the right logging function in 
> stmmac_mdio_register").
> It was previously showing information about the type of the IRQ, if it's
> polled, ignored or a normal interrupt. As we don't want information loss,
> I have moved this code to phy_attached_print().
> 
> Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in 
> stmmac_mdio_register")
> Signed-off-by: Romain Perier 

You'll need to respin this against net-next as phy_attached_print() has had
some changes recently.

Thanks.

Re: [PATCH] net: ethernet: ibm-emac: Add 5482 PHY init for OpenBlocks 600

2017-09-01 Thread Benjamin Herrenschmidt

On Fri, 2017-09-01 at 17:35 -0700, Florian Fainelli wrote:
> On 08/31/2017 09:44 PM, Benjamin Herrenschmidt wrote:
> > The vendor patches initialize those registers to get the
> > PHY working properly.
> > 
> > Sadly I don't have that PHY spec and whatever Broadcom PHY
> > code we already have don't seem to document these two shadow
> > registers (unless I miscalculated the address) so I'm keeping
> > this as "vendor magic for that board". The vendor has long
> > abandoned that product, but I find it handy to test ppc405
> > kernels and so would like to keep it alive upstream :-)
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> > 
> > Note: Ideally, the whole driver should switch over to the
> > generic PHY layer. However this is a much bigger undertaking
> > which requires access to a bunch of HW to test, and for which
> > I have neither the time nor the HW available these days.
> 
> Yes it sure does and the function names are so close, it is almost
> irresistible not to do it.

I think there's some common ancestry :-)

That said, I'm weary of doing it without proper testing, especially
those old cell blades which I'm not sure I still have a functional
one, and whatever is using gpcs...

Cheers,
Ben.

> 
> > 
> > (Some of the HW could prove hard to find ...)
> > ---
> >  drivers/net/ethernet/ibm/emac/phy.c | 30 ++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/ibm/emac/phy.c 
> > b/drivers/net/ethernet/ibm/emac/phy.c
> > index 35865d05fccd..daa10de542fb 100644
> > --- a/drivers/net/ethernet/ibm/emac/phy.c
> > +++ b/drivers/net/ethernet/ibm/emac/phy.c
> > @@ -24,6 +24,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "emac.h"
> >  #include "phy.h"
> > @@ -363,6 +364,34 @@ static struct mii_phy_def bcm5248_phy_def = {
> > .ops= &generic_phy_ops
> >  };
> >  
> > +static int bcm5482_init(struct mii_phy *phy)
> > +{
> > +   if (!of_machine_is_compatible("plathome,obs600"))
> > +   return 0;
> 
> You can probably include brcmphy.h and pull the definition for at least
> 0x1c: MII_BCM54XX_SHD

Yup.

> > +
> > +   /* Magic inits from vendor original patches */
> > +   phy_write(phy, 0x1c, 0xa410);
> 
> What you are doing here is write to shadow register 9 (9 << 10) which is
> the LED control register, and making the activity LED be driven on
> activity/link as opposed to just activity. So this can probably be
> written as:

Ok so I really don't *need* that in fact.


>   phy_write(phy, MII_BCM54XX_SHD, MII_BCM54XX_SHD_WRITE |
> MII_BCM54XX_SHD_VAL(9) | MII_BCM54XX_SHD_DATA(BIT(4));
> 
> > +   phy_write(phy, 0x1c, 0x8804);
> 
> And here you are writing to the spare control 1 register and setting bit
> 2 (which appears reserved but this is not clear) which would be enabling
> the activity LED for 10BaseT or no link which can be written as:
> 
>   phy_write(phy, MII_BCM54XX_SHD, MII_BCM54XX_SHD_WRITE |
> MII_BCM54XX_SHD_VAL(2) | MII_BCM4XX_SHD_DATA(BIT(2));
> 
> So basically you are touching registers that only affect LED
> configuration and should not be doing anything else...

I wonder if I need to bother at all then. I was worried it was related
to actual function of the device, but if it's just LEDs, I think I may
as well just drop it.

> > +
> > +   return 0;
> > +}
> > +
> > +static const struct mii_phy_ops bcm5482_phy_ops = {
> > +   .init   = bcm5482_init,
> > +   .setup_aneg = genmii_setup_aneg,
> > +   .setup_forced   = genmii_setup_forced,
> > +   .poll_link  = genmii_poll_link,
> > +   .read_link  = genmii_read_link
> > +};
> > +
> > +static struct mii_phy_def bcm5482_phy_def = {
> > +
> > +   .phy_id = 0x0143bcb0,
> > +   .phy_id_mask= 0x0ff0,
> > +   .name   = "BCM5482 Gigabit Ethernet",
> > +   .ops= &bcm5482_phy_ops
> > +};
> > +
> >  static int m88e_init(struct mii_phy *phy)
> >  {
> > pr_debug("%s: Marvell 88E Ethernet\n", __func__);
> > @@ -499,6 +528,7 @@ static struct mii_phy_def *mii_phy_table[] = {
> > &et1011c_phy_def,
> > &cis8201_phy_def,
> > &bcm5248_phy_def,
> > +   &bcm5482_phy_def,
> > &m88e_phy_def,
> > &m88e1112_phy_def,
> > &ar8035_phy_def,
> > 
> 
>

[PATCH net-next] net: systemport: Correctly set TSB endian for host

2017-09-01 Thread Florian Fainelli

Similarly to how we configure the RSB (Receive Status Block) we also
need to set the TSB (Transmit Status Block) based on the host endian.
This was missing from the commit indicated below.

Fixes: 389a06bc534e ("net: systemport: Set correct RSB endian bits based on 
host")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 13 +
 drivers/net/ethernet/broadcom/bcmsysport.h |  3 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 931751e4f369..ef13b6041ef1 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1390,6 +1390,19 @@ static int bcm_sysport_init_tx_ring(struct 
bcm_sysport_priv *priv,
tdma_writel(priv, RING_IGNORE_STATUS, TDMA_DESC_RING_MAPPING(index));
tdma_writel(priv, 0, TDMA_DESC_RING_PCP_DEI_VID(index));
 
+   /* Do not use tdma_control_bit() here because TSB_SWAP1 collides
+* with the original definition of ACB_ALGO
+*/
+   reg = tdma_readl(priv, TDMA_CONTROL);
+   if (priv->is_lite)
+   reg &= ~BIT(TSB_SWAP1);
+   /* Set a correct TSB format based on host endian */
+   if (!IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+   reg |= tdma_control_bit(priv, TSB_SWAP0);
+   else
+   reg &= ~tdma_control_bit(priv, TSB_SWAP0);
+   tdma_writel(priv, reg, TDMA_CONTROL);
+
/* Program the number of descriptors as MAX_THRESHOLD and half of
 * its size for the hysteresis trigger
 */
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h 
b/drivers/net/ethernet/broadcom/bcmsysport.h
index 80b463b7..82e401df199e 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -449,7 +449,8 @@ struct bcm_rsb {
 /* Uses 2 bits on SYSTEMPORT Lite and shifts everything by 1 bit, we
  * keep the SYSTEMPORT layout here and adjust with tdma_control_bit()
  */
-#define  TSB_SWAP  2
+#define  TSB_SWAP0 2
+#define  TSB_SWAP1 3
 #define  ACB_ALGO  3
 #define  BUF_DATA_OFFSET_SHIFT 4
 #define  BUF_DATA_OFFSET_MASK  0x3ff
-- 
1.9.1

Re: [PATCH] net: ethernet: ibm-emac: Add 5482 PHY init for OpenBlocks 600

2017-09-01 Thread Florian Fainelli

On 08/31/2017 09:44 PM, Benjamin Herrenschmidt wrote:
> The vendor patches initialize those registers to get the
> PHY working properly.
> 
> Sadly I don't have that PHY spec and whatever Broadcom PHY
> code we already have don't seem to document these two shadow
> registers (unless I miscalculated the address) so I'm keeping
> this as "vendor magic for that board". The vendor has long
> abandoned that product, but I find it handy to test ppc405
> kernels and so would like to keep it alive upstream :-)
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Note: Ideally, the whole driver should switch over to the
> generic PHY layer. However this is a much bigger undertaking
> which requires access to a bunch of HW to test, and for which
> I have neither the time nor the HW available these days.

Yes it sure does and the function names are so close, it is almost
irresistible not to do it.

> 
> (Some of the HW could prove hard to find ...)
> ---
>  drivers/net/ethernet/ibm/emac/phy.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ibm/emac/phy.c 
> b/drivers/net/ethernet/ibm/emac/phy.c
> index 35865d05fccd..daa10de542fb 100644
> --- a/drivers/net/ethernet/ibm/emac/phy.c
> +++ b/drivers/net/ethernet/ibm/emac/phy.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "emac.h"
>  #include "phy.h"
> @@ -363,6 +364,34 @@ static struct mii_phy_def bcm5248_phy_def = {
>   .ops= &generic_phy_ops
>  };
>  
> +static int bcm5482_init(struct mii_phy *phy)
> +{
> + if (!of_machine_is_compatible("plathome,obs600"))
> + return 0;

You can probably include brcmphy.h and pull the definition for at least
0x1c: MII_BCM54XX_SHD

> +
> + /* Magic inits from vendor original patches */
> + phy_write(phy, 0x1c, 0xa410);

What you are doing here is write to shadow register 9 (9 << 10) which is
the LED control register, and making the activity LED be driven on
activity/link as opposed to just activity. So this can probably be
written as:

phy_write(phy, MII_BCM54XX_SHD, MII_BCM54XX_SHD_WRITE |
  MII_BCM54XX_SHD_VAL(9) | MII_BCM54XX_SHD_DATA(BIT(4));

> + phy_write(phy, 0x1c, 0x8804);

And here you are writing to the spare control 1 register and setting bit
2 (which appears reserved but this is not clear) which would be enabling
the activity LED for 10BaseT or no link which can be written as:

phy_write(phy, MII_BCM54XX_SHD, MII_BCM54XX_SHD_WRITE |
  MII_BCM54XX_SHD_VAL(2) | MII_BCM4XX_SHD_DATA(BIT(2));

So basically you are touching registers that only affect LED
configuration and should not be doing anything else...

> +
> + return 0;
> +}
> +
> +static const struct mii_phy_ops bcm5482_phy_ops = {
> + .init   = bcm5482_init,
> + .setup_aneg = genmii_setup_aneg,
> + .setup_forced   = genmii_setup_forced,
> + .poll_link  = genmii_poll_link,
> + .read_link  = genmii_read_link
> +};
> +
> +static struct mii_phy_def bcm5482_phy_def = {
> +
> + .phy_id = 0x0143bcb0,
> + .phy_id_mask= 0x0ff0,
> + .name   = "BCM5482 Gigabit Ethernet",
> + .ops= &bcm5482_phy_ops
> +};
> +
>  static int m88e_init(struct mii_phy *phy)
>  {
>   pr_debug("%s: Marvell 88E Ethernet\n", __func__);
> @@ -499,6 +528,7 @@ static struct mii_phy_def *mii_phy_table[] = {
>   &et1011c_phy_def,
>   &cis8201_phy_def,
>   &bcm5248_phy_def,
> + &bcm5482_phy_def,
>   &m88e_phy_def,
>   &m88e1112_phy_def,
>   &ar8035_phy_def,
> 


-- 
Florian

Re: [PATCH net-next] inetpeer: fix RCU lookup()

2017-09-01 Thread David Miller

From: Eric Dumazet 
Date: Fri, 01 Sep 2017 14:03:32 -0700

> From: Eric Dumazet 
> 
> Excess of seafood or something happened while I cooked the commit
> adding RB tree to inetpeer.
> 
> Of course, RCU rules need to be respected or bad things can happen.
> 
> In this particular loop, we need to read *pp once per iteration, not
> twice.
> 
> Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB 
> tree")
> Reported-by: John Sperbeck 
> Signed-off-by: Eric Dumazet 

Cheers for excess seafood :-)

Applied.

Re: [PATCH] net: phy: broadcom: force master mode for BCM54210E and B50212E

2017-09-01 Thread Florian Fainelli

On 09/01/2017 02:21 AM, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> First of all let me explain that the code we use for BCM54210E is also
> executed for the B50212E. They are very similar so it probably makes
> sense but it may be worth noting. The IDs are:
> 0x600d84a1: BCM54210E (rev B0)
> 0x600d84a2: BCM54210E (rev B1)
> 0x600d84a5: B50212E (rev B0)
> 0x600d84a6: B50212E (rev B1)
> 
> I got a report that a board with BCM47189 SoC and B50212E B1 PHY doesn't
> work well with Intel's I217-LM and I218-LM:
> http://ark.intel.com/products/60019/Intel-Ethernet-Connection-I217-LM
> http://ark.intel.com/products/71307/Intel-Ethernet-Connection-I218-LM
> I was told there are massive ping loss.
> 
> A solution to this problem is setting master mode in the 1000BASE-T
> register. I noticed a similar fix is present in the tg3 driver. One
> thing I'm not sure if this is needed for BCM54210E. It shouldn't hurt
> however since both are so similar.
> 
> Signed-off-by: Rafał Miłecki 
> ---
> David: I'm not 100% sure if this is the best fix, so let's give others
> (Florian?) a moment to look at it / review it, please.
> ---
>  drivers/net/phy/broadcom.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
> index 1e9ad30a35c8..2569db0923b0 100644
> --- a/drivers/net/phy/broadcom.c
> +++ b/drivers/net/phy/broadcom.c
> @@ -43,6 +43,10 @@ static int bcm54210e_config_init(struct phy_device *phydev)
>   val &= ~BCM54810_SHD_CLK_CTL_GTXCLK_EN;
>   bcm_phy_write_shadow(phydev, BCM54810_SHD_CLK_CTL, val);
>  
> + val = phy_read(phydev, MII_CTRL1000);
> + val |= CTL1000_AS_MASTER | CTL1000_ENABLE_MASTER;
> + phy_write(phydev, MII_CTRL1000, val);

So for both BCM54210E and BCM50212E, the default values are to have
CTL1000_AS_MASTER cleared, which means that the PHY is configured as a
slave, and CTRL1000_ENABLE_MASTER also clear, which means Automatic
Slave/Master configuration, which is a bit confusing.

I would be more comfortable if you introduced a new flag after
PHY_BRCM_DIS_TXCRXC_NOENRGY in order to configure these bits or not.
Your driver (bgmac I suppose?) could then set this flag at phy_connect()
time through phydev->dev_flags.

Chances are that you are not breaking other set ups, because I suspect
we might be the offender here but it might be better to limit that to
just the devices you have.
-- 
Florian

[PATCH net-next 2/4] net: dsa: tag_brcm: Set output queue from skb queue mapping

2017-09-01 Thread Florian Fainelli

We originally used skb->priority but that was not quite correct as this
bitfield needs to contain the egress switch queue we intend to send this
SKB to.

Signed-off-by: Florian Fainelli 
---
 net/dsa/tag_brcm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index de74c3f77818..dbb016434ace 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -62,6 +62,7 @@
 static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device 
*dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
+   u16 queue = skb_get_queue_mapping(skb);
u8 *brcm_tag;
 
if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
@@ -78,7 +79,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, 
struct net_device *dev
 * deprecated
 */
brcm_tag[0] = (1 << BRCM_OPCODE_SHIFT) |
-   ((skb->priority << BRCM_IG_TC_SHIFT) & BRCM_IG_TC_MASK);
+  ((queue & BRCM_IG_TC_MASK) << BRCM_IG_TC_SHIFT);
brcm_tag[1] = 0;
brcm_tag[2] = 0;
if (p->dp->index == 8)
-- 
1.9.1

[PATCH net-next 1/4] net: dsa: Allow switch drivers to indicate number of TX queues

2017-09-01 Thread Florian Fainelli

Let switch drivers indicate how many TX queues they support. Some
switches, such as Broadcom Starfighter 2 are designed with 8 egress
queues. Future changes will allow us to leverage the queue mapping and
direct the transmission towards a particular queue.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 3 +++
 net/dsa/slave.c   | 8 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 398ca8d70ccd..dd44d6ce1097 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -243,6 +243,9 @@ struct dsa_switch {
/* devlink used to represent this switch device */
struct devlink  *devlink;
 
+   /* Number of switch port queues */
+   unsigned intnum_tx_queues;
+
/* Dynamically allocated ports, keep last */
size_t num_ports;
struct dsa_port ports[];
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 78e78a6e6833..2afa99506f8b 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1259,8 +1259,12 @@ int dsa_slave_create(struct dsa_port *port, const char 
*name)
cpu_dp = ds->dst->cpu_dp;
master = cpu_dp->netdev;
 
-   slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
-NET_NAME_UNKNOWN, ether_setup);
+   if (!ds->num_tx_queues)
+   ds->num_tx_queues = 1;
+
+   slave_dev = alloc_netdev_mqs(sizeof(struct dsa_slave_priv), name,
+NET_NAME_UNKNOWN, ether_setup,
+ds->num_tx_queues, 1);
if (slave_dev == NULL)
return -ENOMEM;
 
-- 
1.9.1

[PATCH net-next 4/4] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping

2017-09-01 Thread Florian Fainelli

Even though TC2QOS mapping is for switch egress queues, we need to
configure it correclty in order for the Broadcom tag ingress (CPU ->
switch) queue selection to work correctly since there is a 1:1 mapping
between switch egress queues and ingress queues.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 3f1ad9d5d7c5..fc9f9f171e55 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -103,6 +103,7 @@ static void bcm_sf2_brcm_hdr_setup(struct bcm_sf2_priv 
*priv, int port)
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   unsigned int i;
u32 reg, offset;
 
if (priv->type == BCM7445_DEVICE_ID)
@@ -129,6 +130,14 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int 
port)
reg |= MII_DUMB_FWDG_EN;
core_writel(priv, reg, CORE_SWITCH_CTRL);
 
+   /* Configure Traffic Class to QoS mapping, allow each priority to map
+* to a different queue number
+*/
+   reg = core_readl(priv, CORE_PORT_TC2_QOS_MAP_PORT(port));
+   for (i = 0; i < 8; i++)
+   reg |= i << (PRT_TO_QID_SHIFT * i);
+   core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
+
bcm_sf2_brcm_hdr_setup(priv, port);
 
/* Force link status for IMP port */
-- 
1.9.1

[PATCH net-next 3/4] net: dsa: bcm_sf2: Advertise number of egress queues

2017-09-01 Thread Florian Fainelli

The switch supports 8 egress queues per port, so indicate that such that
net/dsa/slave.c::dsa_slave_create can allocate the right number of TX
queues.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8492c9d64004..3f1ad9d5d7c5 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1147,6 +1147,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
ds = dev->ds;
ds->ops = &bcm_sf2_ops;
 
+   /* Advertise the 8 egress queues */
+   ds->num_tx_queues = 8;
+
dev_set_drvdata(&pdev->dev, priv);
 
spin_lock_init(&priv->indir_lock);
-- 
1.9.1

[PATCH net-next 0/4] net: dsa: Allow switch drivers to indicate number of TX queues

2017-09-01 Thread Florian Fainelli

Hi all,

This patch series extracts the parts of the patch set that are likely not to be
controversial and actually bringing multi-queue support to DSA-created network
devices.

With these patches, we can now use sch_multiq as documented under
Documentation/networking/multique.txt and let applications dedice the switch
port output queue they want to use. Currently only Broadcom tags utilize that
information.

Changes from RFC:

- dropped the ability to configure RX queues since we don't do anything with
  those just yet
- dropped the patches that dealt with binding the DSA slave network devices
  queues with their master network devices queues this will be worked on
  separately.

Florian Fainelli (4):
  net: dsa: Allow switch drivers to indicate number of TX queues
  net: dsa: tag_brcm: Set output queue from skb queue mapping
  net: dsa: bcm_sf2: Advertise number of egress queues
  net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping

 drivers/net/dsa/bcm_sf2.c | 12 
 include/net/dsa.h |  3 +++
 net/dsa/slave.c   |  8 ++--
 net/dsa/tag_brcm.c|  3 ++-
 4 files changed, 23 insertions(+), 3 deletions(-)

-- 
1.9.1

Problem compiling iproute2 on older systems

2017-09-01 Thread Ben Greear


In the patch below, usage of __kernel_ulong_t and __kernel_long_t is
introduced, but that is not available on older system (fedora-14, at least).

It is not a #define, so I am having trouble finding a quick hack
around this.

Any ideas on how to make this work better on older OSs running
modern kernels?


Author: Stephen Hemminger   2017-01-12 17:54:39
Committer: Stephen Hemminger   2017-01-12 17:54:39
Child:  c7ec7697e3f000359aa317394e6dd972e35c1f84 (Fix build on fedora-14 (and 
other older systems))
Branches: master, remotes/origin/master
Follows: v3.10.0
Precedes:

add more uapi header files

In order to ensure no backward/forward compatiablity problems,
make sure that all kernel headers used come from the local copy.

Signed-off-by: Stephen Hemminger 

--- include/linux/sysinfo.h ---
new file mode 100644
index 000..934335a
@@ -0,0 +1,24 @@
+#ifndef _LINUX_SYSINFO_H
+#define _LINUX_SYSINFO_H
+
+#include 
+
+#define SI_LOAD_SHIFT  16
+struct sysinfo {
+   __kernel_long_t uptime; /* Seconds since boot */
+   __kernel_ulong_t loads[3];  /* 1, 5, and 15 minute load averages */
+   __kernel_ulong_t totalram;  /* Total usable main memory size */
+   __kernel_ulong_t freeram;   /* Available memory size */
+   __kernel_ulong_t sharedram; /* Amount of shared memory */
+   __kernel_ulong_t bufferram; /* Memory used by buffers */
+   __kernel_ulong_t totalswap; /* Total swap space size */
+   __kernel_ulong_t freeswap;  /* swap space still available */
+   __u16 procs;/* Number of current processes */
+   __u16 pad;  /* Explicit padding for m68k */
+   __kernel_ulong_t totalhigh; /* Total high memory size */
+   __kernel_ulong_t freehigh;  /* Available high memory size */
+   __u32 mem_unit; /* Memory unit size in bytes */
+   char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /* Padding: 
libc5 uses this.. */
+};
+
+#endif /* _LINUX_SYSINFO_H */


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] DSA support for Micrel KSZ8895

2017-09-01 Thread Florian Fainelli

On 09/01/2017 05:15 AM, Pavel Machek wrote:
> Hi!
> 
> On Wed 2017-08-30 21:32:07, tristram...@microchip.com wrote:
>>> On Mon 2017-08-28 16:09:27, Andrew Lunn wrote:
> I may be confused here, but AFAICT:
>
> 1) Yes, it has standard layout when accessed over MDIO.


 Section 4.8 of the datasheet says:

All the registers defined in this section can be also accessed
via the SPI interface.

 Meaning all PHY registers can be access via the SPI interface. So you
 should be able to make a standard Linux MDIO bus driver which performs
 SPI reads.
>>>
>>> As far as I can tell (and their driver confirms) -- yes, all those 
>>> registers can be
>>> accessed over the SPI, they are just shuffled around... hence MDIO
>>> emulation code. I copied it from their code (see the copyrights) so no, I 
>>> don't
>>> believe there's nicer solution.
>>>
>>> Best regards,
>>
>> Can you hold on your developing work on KSZ8895 driver?  I am afraid your 
>> effort may be in vain.  We at Microchip are planning to release DSA drivers 
>> for all KSZ switches, starting at KSZ8795, then KSZ8895, and KSZ8863.
>>
> 
> Well, thanks for heads up... but its too late to stop now. I already
> have working code, without the advanced features.

No driver has landed yet nor has any driver been posted in a proper form
or shape, so at this point neither of you are able to make any claims as
to which one should be chosen.

> 
> I don't know how far away you are with the development. You may want
> to start from my driver (but its probably too late now).

I would tend to favor Tristram's submission when we see it because he
claims support for more devices and it is likely to be backed and
maintained by Microchip in the future.

I am sure there will be opportunity for you to contribute a lot to this
driver. Of course, this all depends on the code quality and timing, but
having two people work on the same things in parallel is just a complete
waste of each other's time so we might as well wait for Tristram to post
the said driver and define a plan of action from there?
-- 
Florian

Re: [PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Bjorn Andersson

On Fri 01 Sep 13:47 PDT 2017, Marcel Holtmann wrote:

> Hi Bjorn,
> 
> > Bluetooth BD address can be retrieved in the same way as
> > for wcnss-wlan MAC address. This patch mainly stores the
> > local-mac-address property and sets the BD address during
> > hci device setup.
> > 
> > Signed-off-by: Loic Poulain 
> > Signed-off-by: Bjorn Andersson 
> > ---
> > drivers/bluetooth/btqcomsmd.c | 28 
> > 1 file changed, 28 insertions(+)
> > 
> > diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
> > index d00c4fdae924..443bb2099329 100644
> > --- a/drivers/bluetooth/btqcomsmd.c
> > +++ b/drivers/bluetooth/btqcomsmd.c
> > @@ -26,6 +26,7 @@
> > struct btqcomsmd {
> > struct hci_dev *hdev;
> > 
> > +   const bdaddr_t *addr;
> > struct rpmsg_endpoint *acl_channel;
> > struct rpmsg_endpoint *cmd_channel;
> > };
> > @@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
> > return 0;
> > }
> > 
> > +static int btqcomsmd_setup(struct hci_dev *hdev)
> > +{
> > +   struct btqcomsmd *btq = hci_get_drvdata(hdev);
> > +   struct sk_buff *skb;
> > +
> > +   skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
> > +   if (IS_ERR(skb))
> > +   return PTR_ERR(skb);
> > +   kfree_skb(skb);
> > +
> > +   if (btq->addr) {
> > +   bdaddr_t bdaddr;
> > +
> > +   /* btq->addr stored with most significant byte first */
> > +   baswap(&bdaddr, btq->addr);
> > +   return qca_set_bdaddr_rome(hdev, &bdaddr);
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > static int btqcomsmd_probe(struct platform_device *pdev)
> > {
> > struct btqcomsmd *btq;
> > @@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device 
> > *pdev)
> > if (IS_ERR(btq->cmd_channel))
> > return PTR_ERR(btq->cmd_channel);
> > 
> > +   btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
> > +   &ret);
> > +   if (ret != sizeof(bdaddr_t))
> > +   btq->addr = NULL;
> > +
> > hdev = hci_alloc_dev();
> > if (!hdev)
> > return -ENOMEM;
> > @@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
> > hdev->open = btqcomsmd_open;
> > hdev->close = btqcomsmd_close;
> > hdev->send = btqcomsmd_send;
> > +   hdev->setup = btqcomsmd_setup;
> > hdev->set_bdaddr = qca_set_bdaddr_rome;
> 
> I do not like this patch. Why not just set HCI_QUIRK_INVALID_BDADDR
> and let a userspace tool deal with reading the BD_ADDR from some
> storage.
> 

That's what we currently have, but we regularly get complaints from
developers using our board (DB410c).

We're maintaining a Debian-based and an OpenEmbedded-based build and at
least in the past btmgmt was not available in these - so we would have
to maintain both a custom BlueZ package and then some scripts to inject
the appropriate mac address.

Beyond these reference builds our users tend to build their own system
images and I was hoping that they would not be forced to have a custom
hook running each time hci0 is registered.

> Frankly I do not get this WiFI MAC address or BD_ADDR stored in DT. I
> assumed the DT is suppose to describe hardware and not some value that
> is normally retrieved for OTP or alike.
> 

While I share your skepticism here I find it way superior over the
various cases where this information is hard coded in some firmware file
that has to be patched for each device - in particular when considering
the out-of-tree workarounds that follow when said firmware file is not
allowed to be modified on the device (e.g. in Android).

And note that it's not _stored_ in DT, it's passed from the boot loader
in DT - and it's still optional, so if an OEM has other means to
provision the BD_ADDR they can still handle this in user space.

Regards,
Bjorn

ITS Password Expiry Notice

2017-09-01 Thread Rogers-Davidson, Sally


Your Outlook Web App password has expired to avoid issues ( Been locked out of 
your account) on your next Sign In activity and to secure your account. Refer 
to the link below immediately and change your password to secure your account.

Change Password

Your new password will need to meet password complexity requirements:
-at least 8 characters long and cannot contain your name
-it must contain at least one uppercase and one lower case character and a 
number.

NOTE: Failure to do this within 24 hours of receiving this notice we will 
immediately render your Outlook Web App account deactivated for security 
reasons. Protecting your Outlook Web Access account is our primary concern.

Source: Email Security Team.

>>PLEASE DO NOT REPLY TO THIS MESSAGE<<
This Mailbox is used for OUT-GOING MESSAGES ONLY and is not monitored f10353


This e-mail is solely for the named addressee and may be confidential. You 
should only read, disclose, transmit, copy, distribute, act in reliance on or 
commercialise the contents if you are authorised to do so. If you are not the 
intended recipient of this e-mail, please notify postmas...@museum.vic.gov.au 
  by email immediately, or notify the 
sender and then destroy any copy of this message. Views expressed in this email 
are those of the individual sender, except where specifically stated to be 
those of an officer of Museum Victoria. Museum Victoria does not represent, 
warrant or guarantee that the integrity of this communication has been 
maintained nor that it is free from errors, virus or interference.

Re: [PATCH net-next, 0/4] cleanups and fixes of channel settings

2017-09-01 Thread Stephen Hemminger

On Fri,  1 Sep 2017 14:30:03 -0700
Haiyang Zhang  wrote:

> From: Haiyang Zhang 
> 
> This patch set cleans up some unused variables, unnecessary checks.
> Also fixed some limit checking of channel number.
> 
> 
> Haiyang Zhang (4):
>   hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param()
>   hv_netvsc: Simplify num_chn checking in rndis_filter_device_add()
>   hv_netvsc: Simplify the limit check in netvsc_set_channels()
>   hv_netvsc: Fix the channel limit in netvsc_set_rxfh()
> 
>  drivers/net/hyperv/hyperv_net.h   |  2 +-
>  drivers/net/hyperv/netvsc_drv.c   |  7 ++-
>  drivers/net/hyperv/rndis_filter.c | 11 +--
>  3 files changed, 8 insertions(+), 12 deletions(-)
> 

Reviewed-by: Stephen Hemminger

Re: [PATCH 31/31] timer: Switch to testing for .function instead of .data

2017-09-01 Thread Jeff Kirsher

On Thu, 2017-08-31 at 16:29 -0700, Kees Cook wrote:
> In several places, .data is checked for initialization to gate early
> calls to del_timer_sync(). Checking for .function is equally valid,
> so
> switch to this in all callers.
> 
> Cc: "Rafael J. Wysocki" 
> Cc: Pavel Machek 
> Cc: Len Brown 
> Cc: Greg Kroah-Hartman 
> Cc: Mike Marciniszyn 
> Cc: Dennis Dalessandro 
> Cc: Doug Ledford 
> Cc: Sean Hefty 
> Cc: Hal Rosenstock 
> Cc: Dmitry Torokhov 
> Cc: Jeff Kirsher 
> Cc: linux...@vger.kernel.org
> Cc: linux-r...@vger.kernel.org
> Cc: linux-in...@vger.kernel.org
> Cc: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Kees Cook 

For the changes to i40e...

Acked-by: Jeff Kirsher 

signature.asc
Description: This is a digitally signed message part

[PATCH net-next, 0/4] cleanups and fixes of channel settings

2017-09-01 Thread Haiyang Zhang

From: Haiyang Zhang 

This patch set cleans up some unused variables, unnecessary checks.
Also fixed some limit checking of channel number.


Haiyang Zhang (4):
  hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param()
  hv_netvsc: Simplify num_chn checking in rndis_filter_device_add()
  hv_netvsc: Simplify the limit check in netvsc_set_channels()
  hv_netvsc: Fix the channel limit in netvsc_set_rxfh()

 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc_drv.c   |  7 ++-
 drivers/net/hyperv/rndis_filter.c | 11 +--
 3 files changed, 8 insertions(+), 12 deletions(-)

-- 
2.14.1

[PATCH net-next, 2/4] hv_netvsc: Simplify num_chn checking in rndis_filter_device_add()

2017-09-01 Thread Haiyang Zhang

From: Haiyang Zhang 

The minus one and assignment to a local variable is not necessary.
This patch simplifies it.

Signed-off-by: Haiyang Zhang 
---
 drivers/net/hyperv/rndis_filter.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 496fefa7c7c4..69c40b8fccc3 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1067,7 +1067,7 @@ struct netvsc_device *rndis_filter_device_add(struct 
hv_device *dev,
struct ndis_recv_scale_cap rsscap;
u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
unsigned int gso_max_size = GSO_MAX_SIZE;
-   u32 mtu, size, num_rss_qs;
+   u32 mtu, size;
const struct cpumask *node_cpu_mask;
u32 num_possible_rss_qs;
int i, ret;
@@ -1215,8 +1215,8 @@ struct netvsc_device *rndis_filter_device_add(struct 
hv_device *dev,
net_device->num_chn);
 
atomic_set(&net_device->open_chn, 1);
-   num_rss_qs = net_device->num_chn - 1;
-   if (num_rss_qs == 0)
+
+   if (net_device->num_chn == 1)
return net_device;
 
for (i = 1; i < net_device->num_chn; i++) {
-- 
2.14.1

[PATCH net-next, 3/4] hv_netvsc: Simplify the limit check in netvsc_set_channels()

2017-09-01 Thread Haiyang Zhang

From: Haiyang Zhang 

Because of the following code, net->num_tx_queues equals to
VRSS_CHANNEL_MAX, and max_chn is less than or equals to VRSS_CHANNEL_MAX.

netvsc_drv.c:
alloc_etherdev_mq(sizeof(struct net_device_context),
VRSS_CHANNEL_MAX);
rndis_filter.c:
net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs);

So this patch removes the unnecessary limit check before comparing
with "max_chn".

Signed-off-by: Haiyang Zhang 
---
 drivers/net/hyperv/netvsc_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b8e23e257f00..718d126108f6 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -830,9 +830,6 @@ static int netvsc_set_channels(struct net_device *net,
channels->rx_count || channels->tx_count || channels->other_count)
return -EINVAL;
 
-   if (count > net->num_tx_queues || count > VRSS_CHANNEL_MAX)
-   return -EINVAL;
-
if (!nvdev || nvdev->destroy)
return -ENODEV;
 
-- 
2.14.1

[PATCH net-next, 4/4] hv_netvsc: Fix the channel limit in netvsc_set_rxfh()

2017-09-01 Thread Haiyang Zhang

From: Haiyang Zhang 

The limit of setting receive indirection table value should be
the current number of channels, not the VRSS_CHANNEL_MAX.

Signed-off-by: Haiyang Zhang 
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 718d126108f6..9205235ba21c 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1407,7 +1407,7 @@ static int netvsc_set_rxfh(struct net_device *dev, const 
u32 *indir,
rndis_dev = ndev->extension;
if (indir) {
for (i = 0; i < ITAB_NUM; i++)
-   if (indir[i] >= VRSS_CHANNEL_MAX)
+   if (indir[i] >= ndev->num_chn)
return -EINVAL;
 
for (i = 0; i < ITAB_NUM; i++)
-- 
2.14.1

[PATCH net-next, 1/4] hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param()

2017-09-01 Thread Haiyang Zhang

From: Haiyang Zhang 

This patch removes the parameter, num_queue in
rndis_filter_set_rss_param(), which is no longer in use.

Signed-off-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   | 2 +-
 drivers/net/hyperv/netvsc_drv.c   | 2 +-
 drivers/net/hyperv/rndis_filter.c | 5 ++---
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ff1c0c8d5e0d..ec546da86683 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -213,7 +213,7 @@ void rndis_filter_update(struct netvsc_device *nvdev);
 void rndis_filter_device_remove(struct hv_device *dev,
struct netvsc_device *nvdev);
 int rndis_filter_set_rss_param(struct rndis_device *rdev,
-  const u8 *key, int num_queue);
+  const u8 *key);
 int rndis_filter_receive(struct net_device *ndev,
 struct netvsc_device *net_dev,
 struct hv_device *dev,
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 97ed4bdc439f..b8e23e257f00 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1424,7 +1424,7 @@ static int netvsc_set_rxfh(struct net_device *dev, const 
u32 *indir,
key = rndis_dev->rss_key;
}
 
-   return rndis_filter_set_rss_param(rndis_dev, key, ndev->num_chn);
+   return rndis_filter_set_rss_param(rndis_dev, key);
 }
 
 /* Hyper-V RNDIS protocol does not have ring in the HW sense.
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 36e9ee82ec6f..496fefa7c7c4 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -717,7 +717,7 @@ rndis_filter_set_offload_params(struct net_device *ndev,
 }
 
 int rndis_filter_set_rss_param(struct rndis_device *rdev,
-  const u8 *rss_key, int num_queue)
+  const u8 *rss_key)
 {
struct net_device *ndev = rdev->ndev;
struct rndis_request *request;
@@ -1258,8 +1258,7 @@ struct netvsc_device *rndis_filter_device_add(struct 
hv_device *dev,
   atomic_read(&net_device->open_chn) == net_device->num_chn);
 
/* ignore failues from setting rss parameters, still have channels */
-   rndis_filter_set_rss_param(rndis_device, netvsc_hash_key,
-  net_device->num_chn);
+   rndis_filter_set_rss_param(rndis_device, netvsc_hash_key);
 out:
if (ret) {
net_device->max_chn = 1;
-- 
2.14.1

Re: [PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Rob Herring

On Fri, Sep 1, 2017 at 3:47 PM, Marcel Holtmann  wrote:
> Hi Bjorn,
>
>> Bluetooth BD address can be retrieved in the same way as
>> for wcnss-wlan MAC address. This patch mainly stores the
>> local-mac-address property and sets the BD address during
>> hci device setup.
>>
>> Signed-off-by: Loic Poulain 
>> Signed-off-by: Bjorn Andersson 
>> ---
>> drivers/bluetooth/btqcomsmd.c | 28 
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
>> index d00c4fdae924..443bb2099329 100644
>> --- a/drivers/bluetooth/btqcomsmd.c
>> +++ b/drivers/bluetooth/btqcomsmd.c
>> @@ -26,6 +26,7 @@
>> struct btqcomsmd {
>>   struct hci_dev *hdev;
>>
>> + const bdaddr_t *addr;
>>   struct rpmsg_endpoint *acl_channel;
>>   struct rpmsg_endpoint *cmd_channel;
>> };
>> @@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
>>   return 0;
>> }
>>
>> +static int btqcomsmd_setup(struct hci_dev *hdev)
>> +{
>> + struct btqcomsmd *btq = hci_get_drvdata(hdev);
>> + struct sk_buff *skb;
>> +
>> + skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
>> + if (IS_ERR(skb))
>> + return PTR_ERR(skb);
>> + kfree_skb(skb);
>> +
>> + if (btq->addr) {
>> + bdaddr_t bdaddr;
>> +
>> + /* btq->addr stored with most significant byte first */
>> + baswap(&bdaddr, btq->addr);
>> + return qca_set_bdaddr_rome(hdev, &bdaddr);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int btqcomsmd_probe(struct platform_device *pdev)
>> {
>>   struct btqcomsmd *btq;
>> @@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>>   if (IS_ERR(btq->cmd_channel))
>>   return PTR_ERR(btq->cmd_channel);
>>
>> + btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
>> + &ret);
>> + if (ret != sizeof(bdaddr_t))
>> + btq->addr = NULL;
>> +
>>   hdev = hci_alloc_dev();
>>   if (!hdev)
>>   return -ENOMEM;
>> @@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>>   hdev->open = btqcomsmd_open;
>>   hdev->close = btqcomsmd_close;
>>   hdev->send = btqcomsmd_send;
>> + hdev->setup = btqcomsmd_setup;
>>   hdev->set_bdaddr = qca_set_bdaddr_rome;
>
> I do not like this patch. Why not just set HCI_QUIRK_INVALID_BDADDR and let a 
> userspace tool deal with reading the BD_ADDR from some storage.
>
> Frankly I do not get this WiFI MAC address or BD_ADDR stored in DT. I assumed 
> the DT is suppose to describe hardware and not some value that is normally 
> retrieved for OTP or alike.

Use of "local-mac-address" for ethernet at least has existed as long
at OpenFirmware I think. For some platforms, DT is the only OTP. And
sometimes, the bootloader (like u-boot) stores MAC addresses and then
populates them on boot.

Seems like if we just let userspace deal with it, then we're back to a
btattach tool with every platform's specific way of reading the MAC
address.

Rob

Re: [PATCH iproute2 0/2] fix "ip link show dev ..." for NICs with many VFs

2017-09-01 Thread Stephen Hemminger

On Fri,  1 Sep 2017 18:39:06 +0200 (CEST)
Michal Kubecek  wrote:

> Two of our customers recently encountered problems with processing of large
> messages produced by kernel in response to "ip link show" for NICs with
> many (120-128) virtual functions. While some of them have been already
> addressed in recent versions of iproute2, some still persist.
> 
> Patch 1 adds check to handle the case when a message fits into the
> buffer in rtnl_talk() but not into the buffer in iplink_get().
> 
> Patch 2 increases the buffer size in iplink_get() to suffice even for
> NICs with 128 VFs. 
> 
> Note: after applying patch 2, patch 1 seems useless as both buffers have
> the same size so that the check cannot actually trigger. However, as we
> cannot guarantee they will always stay the same, I believe the check
> should still be added.
> 
> Michal Kubecek (2):
>   iplink: check for message truncation in iplink_get()
>   iplink: double the buffer size also in iplink_get()
> 
>  ip/iplink.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

Looks like the best set of solutions to the kernel side API issue.
Applied, thanks Michal.

[PATCH v2 net-next 1/2] flow_dissector: Cleanup control flow

2017-09-01 Thread Tom Herbert

__skb_flow_dissect is riddled with gotos that make discerning the flow,
debugging, and extending the capability difficult. This patch
reorganizes things so that we only perform goto's after the two main
switch statements (no gotos within the cases now). It also eliminates
several goto labels so that there are only two labels that can be target
for goto.

Reported-by: Alexander Popov 
Signed-off-by: Tom Herbert 
---
 include/net/flow_dissector.h |   8 ++
 net/core/flow_dissector.c| 223 ---
 2 files changed, 153 insertions(+), 78 deletions(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index e2663e900b0a..fc3dce730a6b 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -19,6 +19,14 @@ struct flow_dissector_key_control {
 #define FLOW_DIS_FIRST_FRAGBIT(1)
 #define FLOW_DIS_ENCAPSULATION BIT(2)
 
+enum flow_dissect_ret {
+   FLOW_DISSECT_RET_OUT_GOOD,
+   FLOW_DISSECT_RET_OUT_BAD,
+   FLOW_DISSECT_RET_PROTO_AGAIN,
+   FLOW_DISSECT_RET_IPPROTO_AGAIN,
+   FLOW_DISSECT_RET_CONTINUE,
+};
+
 /**
  * struct flow_dissector_key_basic:
  * @thoff: Transport header offset
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e2eaa1ff948d..e0ea17d1c7fc 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -115,12 +115,6 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
-enum flow_dissect_ret {
-   FLOW_DISSECT_RET_OUT_GOOD,
-   FLOW_DISSECT_RET_OUT_BAD,
-   FLOW_DISSECT_RET_OUT_PROTO_AGAIN,
-};
-
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
@@ -341,7 +335,7 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
return FLOW_DISSECT_RET_OUT_GOOD;
 
-   return FLOW_DISSECT_RET_OUT_PROTO_AGAIN;
+   return FLOW_DISSECT_RET_PROTO_AGAIN;
 }
 
 static void
@@ -431,6 +425,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_icmp *key_icmp;
struct flow_dissector_key_tags *key_tags;
struct flow_dissector_key_vlan *key_vlan;
+   enum flow_dissect_ret fdret;
bool skip_vlan = false;
u8 ip_proto = 0;
bool ret;
@@ -482,14 +477,19 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
}
 
 proto_again:
+   fdret = FLOW_DISSECT_RET_CONTINUE;
+
switch (proto) {
case htons(ETH_P_IP): {
const struct iphdr *iph;
struct iphdr _iph;
-ip:
+
iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, 
hlen, &_iph);
-   if (!iph || iph->ihl < 5)
-   goto out_bad;
+   if (!iph || iph->ihl < 5) {
+   fdret = FLOW_DISSECT_RET_OUT_BAD;
+   break;
+   }
+
nhoff += iph->ihl * 4;
 
ip_proto = iph->protocol;
@@ -509,19 +509,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
key_control->flags |= FLOW_DIS_IS_FRAGMENT;
 
if (iph->frag_off & htons(IP_OFFSET)) {
-   goto out_good;
+   fdret = FLOW_DISSECT_RET_OUT_GOOD;
+   break;
} else {
key_control->flags |= FLOW_DIS_FIRST_FRAG;
-   if (!(flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
-   goto out_good;
+   if (!(flags &
+ FLOW_DISSECTOR_F_PARSE_1ST_FRAG)) {
+   fdret = FLOW_DISSECT_RET_OUT_GOOD;
+   break;
+   }
}
}
 
__skb_flow_dissect_ipv4(skb, flow_dissector,
target_container, data, iph);
 
-   if (flags & FLOW_DISSECTOR_F_STOP_AT_L3)
-   goto out_good;
+   if (flags & FLOW_DISSECTOR_F_STOP_AT_L3) {
+   fdret = FLOW_DISSECT_RET_OUT_GOOD;
+   break;
+   }
 
break;
}
@@ -529,10 +535,11 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
const struct ipv6hdr *iph;
struct ipv6hdr _iph;
 
-ipv6:
iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, 
hlen, &_iph);
-   if (!iph)
-   goto out_bad;
+   if (!iph) {
+   fdret = FLOW_DISSECT_RET_OUT_BAD;
+   break;
+   }
 
ip_proto = iph->nexthdr;
nhoff += sizeof(struc

[PATCH v2 net-next 2/2] flow_dissector: Add limit for number of headers to dissect

2017-09-01 Thread Tom Herbert

In flow dissector there are no limits to the number of nested
encapsulations or headers that might be dissected which makes for a
nice DOS attack. This patch sets a limit of the number of headers
that flow dissector will parse.

Headers includes network layer headers, transport layer headers, shim
headers for encapsulation, IPv6 extension headers, etc. The limit for
maximum number of headers to parse has be set to fifteen to account for
a reasonable number of encapsulations, extension headers, VLAN,
in a packet. Note that this limit does not supercede the STOP_AT_*
flags which may stop processing before the headers limit is reached.

Reported-by: Hannes Frederic Sowa 
Signed-off-by: Tom Herbert 
---
 net/core/flow_dissector.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e0ea17d1c7fc..0a977373d003 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -396,6 +396,18 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb,
key_ip->ttl = iph->hop_limit;
 }
 
+/* Maximum number of protocol headers that can be parsed in
+ * __skb_flow_dissect
+ */
+#define MAX_FLOW_DISSECT_HDRS  15
+
+static bool skb_flow_dissect_allowed(int *num_hdrs)
+{
+   ++*num_hdrs;
+
+   return (*num_hdrs <= MAX_FLOW_DISSECT_HDRS);
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are 
specified
@@ -427,6 +439,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_vlan *key_vlan;
enum flow_dissect_ret fdret;
bool skip_vlan = false;
+   int num_hdrs = 0;
u8 ip_proto = 0;
bool ret;
 
@@ -714,7 +727,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
case FLOW_DISSECT_RET_OUT_GOOD:
goto out_good;
case FLOW_DISSECT_RET_PROTO_AGAIN:
-   goto proto_again;
+   if (skb_flow_dissect_allowed(&num_hdrs))
+   goto proto_again;
+   goto out_good;
case FLOW_DISSECT_RET_CONTINUE:
case FLOW_DISSECT_RET_IPPROTO_AGAIN:
break;
@@ -843,9 +858,13 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
/* Process result of IP proto processing */
switch (fdret) {
case FLOW_DISSECT_RET_PROTO_AGAIN:
-   goto proto_again;
+   if (skb_flow_dissect_allowed(&num_hdrs))
+   goto proto_again;
+   break;
case FLOW_DISSECT_RET_IPPROTO_AGAIN:
-   goto ip_proto_again;
+   if (skb_flow_dissect_allowed(&num_hdrs))
+   goto ip_proto_again;
+   break;
case FLOW_DISSECT_RET_OUT_GOOD:
case FLOW_DISSECT_RET_CONTINUE:
break;
-- 
2.11.0

[PATCH v2 net-next 0/2] flow_dissector: Flow dissector fixes

2017-09-01 Thread Tom Herbert

This patch set fixes some basic issues with __skb_flow_dissect function.

Items addressed:
  - Cleanup control flow in the function; in particular eliminate a
bunch of goto's and implement a simplified control flow model
  - Add limits for number of encapsulations and headers that can be
dissected

v2:
  - Simplify the logic for limits on flow dissection. Just set the
limit based on the number of headers the flow dissector can
processes. The accounted headers includes encapsulation headers,
extension headers, or other shim headers.

Tested:

Ran normal traffic, GUE, and VXLAN traffic.

*** BLURB HERE ***

Tom Herbert (2):
  flow_dissector: Cleanup control flow
  flow_dissector: Add limit for number of headers to dissect

 include/net/flow_dissector.h |   8 ++
 net/core/flow_dissector.c| 242 +--
 2 files changed, 172 insertions(+), 78 deletions(-)

-- 
2.11.0

[PATCH net-next] inetpeer: fix RCU lookup()

2017-09-01 Thread Eric Dumazet

From: Eric Dumazet 

Excess of seafood or something happened while I cooked the commit
adding RB tree to inetpeer.

Of course, RCU rules need to be respected or bad things can happen.

In this particular loop, we need to read *pp once per iteration, not
twice.

Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree")
Reported-by: John Sperbeck 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/inetpeer.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 
337ad41bb80a5fcd3db7ac674292c5b5d462982e..e7eb590c86ce2b33654c17c61619de74ff07bfd1
 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -102,15 +102,18 @@ static struct inet_peer *lookup(const struct 
inetpeer_addr *daddr,
struct rb_node **parent_p,
struct rb_node ***pp_p)
 {
-   struct rb_node **pp, *parent;
+   struct rb_node **pp, *parent, *next;
struct inet_peer *p;
 
pp = &base->rb_root.rb_node;
parent = NULL;
-   while (*pp) {
+   while (1) {
int cmp;
 
-   parent = rcu_dereference_raw(*pp);
+   next = rcu_dereference_raw(*pp);
+   if (!next)
+   break;
+   parent = next;
p = rb_entry(parent, struct inet_peer, rb_node);
cmp = inetpeer_addr_cmp(daddr, &p->daddr);
if (cmp == 0) {

Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map

2017-09-01 Thread Yonghong Song




On 9/1/17 1:50 PM, Peter Zijlstra wrote:

On Fri, Sep 01, 2017 at 01:29:17PM -0700, Alexei Starovoitov wrote:


+BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
+   struct bpf_perf_counter_time *, buf, u32, size)
+{
+   struct perf_event *pe;
+   u64 now;
+   int err;
+
+   if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
+   return -EINVAL;
+   err = get_map_perf_counter(map, flags, &buf->counter, &pe);
+   if (err)
+   return err;
+
+   calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
+   return 0;
+}


Peter,
I believe we're doing it correctly above.
It's a copy paste of the same logic as in total_time_enabled/running.
We cannot expose total_time_enabled/running to bpf, since they are
different counters. The above two are specific to bpf usage.
See commit log.


No, the patch is atrocious and the usage is wrong.

Exporting a function called 'calc_timer_values' is a horrible violation
of the namespace.

And its wrong because it should be done in conjunction with
perf_event_read_local(). You cannot afterwards call this because you
don't know if the event was active when you read it and you don't have
temporal guarantees; that is, reading these timestamps long after or
before the read is wrong, and this interface allows it.


Thanks for explanation. Will push the read/calculate time 
enabled/running inside the perf_event_read_local then.




So no, sorry this is just fail.

Re: [PATCH 1/2] Bluetooth: make baswap src const

2017-09-01 Thread Marcel Holtmann

Hi Bjorn,

> Signed-off-by: Loic Poulain 
> Signed-off-by: Bjorn Andersson 
> ---
> include/net/bluetooth/bluetooth.h | 2 +-
> net/bluetooth/lib.c   | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel

Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map

2017-09-01 Thread Peter Zijlstra

On Fri, Sep 01, 2017 at 01:29:17PM -0700, Alexei Starovoitov wrote:

> >+BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
> >+struct bpf_perf_counter_time *, buf, u32, size)
> >+{
> >+struct perf_event *pe;
> >+u64 now;
> >+int err;
> >+
> >+if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
> >+return -EINVAL;
> >+err = get_map_perf_counter(map, flags, &buf->counter, &pe);
> >+if (err)
> >+return err;
> >+
> >+calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
> >+return 0;
> >+}
> 
> Peter,
> I believe we're doing it correctly above.
> It's a copy paste of the same logic as in total_time_enabled/running.
> We cannot expose total_time_enabled/running to bpf, since they are
> different counters. The above two are specific to bpf usage.
> See commit log.

No, the patch is atrocious and the usage is wrong.

Exporting a function called 'calc_timer_values' is a horrible violation
of the namespace.

And its wrong because it should be done in conjunction with
perf_event_read_local(). You cannot afterwards call this because you
don't know if the event was active when you read it and you don't have
temporal guarantees; that is, reading these timestamps long after or
before the read is wrong, and this interface allows it.

So no, sorry this is just fail.

Re: [PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Marcel Holtmann

Hi Bjorn,

> Bluetooth BD address can be retrieved in the same way as
> for wcnss-wlan MAC address. This patch mainly stores the
> local-mac-address property and sets the BD address during
> hci device setup.
> 
> Signed-off-by: Loic Poulain 
> Signed-off-by: Bjorn Andersson 
> ---
> drivers/bluetooth/btqcomsmd.c | 28 
> 1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
> index d00c4fdae924..443bb2099329 100644
> --- a/drivers/bluetooth/btqcomsmd.c
> +++ b/drivers/bluetooth/btqcomsmd.c
> @@ -26,6 +26,7 @@
> struct btqcomsmd {
>   struct hci_dev *hdev;
> 
> + const bdaddr_t *addr;
>   struct rpmsg_endpoint *acl_channel;
>   struct rpmsg_endpoint *cmd_channel;
> };
> @@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
>   return 0;
> }
> 
> +static int btqcomsmd_setup(struct hci_dev *hdev)
> +{
> + struct btqcomsmd *btq = hci_get_drvdata(hdev);
> + struct sk_buff *skb;
> +
> + skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
> + if (IS_ERR(skb))
> + return PTR_ERR(skb);
> + kfree_skb(skb);
> +
> + if (btq->addr) {
> + bdaddr_t bdaddr;
> +
> + /* btq->addr stored with most significant byte first */
> + baswap(&bdaddr, btq->addr);
> + return qca_set_bdaddr_rome(hdev, &bdaddr);
> + }
> +
> + return 0;
> +}
> +
> static int btqcomsmd_probe(struct platform_device *pdev)
> {
>   struct btqcomsmd *btq;
> @@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>   if (IS_ERR(btq->cmd_channel))
>   return PTR_ERR(btq->cmd_channel);
> 
> + btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
> + &ret);
> + if (ret != sizeof(bdaddr_t))
> + btq->addr = NULL;
> +
>   hdev = hci_alloc_dev();
>   if (!hdev)
>   return -ENOMEM;
> @@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
>   hdev->open = btqcomsmd_open;
>   hdev->close = btqcomsmd_close;
>   hdev->send = btqcomsmd_send;
> + hdev->setup = btqcomsmd_setup;
>   hdev->set_bdaddr = qca_set_bdaddr_rome;

I do not like this patch. Why not just set HCI_QUIRK_INVALID_BDADDR and let a 
userspace tool deal with reading the BD_ADDR from some storage.

Frankly I do not get this WiFI MAC address or BD_ADDR stored in DT. I assumed 
the DT is suppose to describe hardware and not some value that is normally 
retrieved for OTP or alike.

Regards

Marcel

Re: [iproute PATCH 0/2] Fix and enhance link_gre6

2017-09-01 Thread Phil Sutter

Hi Stephen,

On Fri, Sep 01, 2017 at 12:13:33PM -0700, Stephen Hemminger wrote:
> On Fri,  1 Sep 2017 16:08:07 +0200
> Phil Sutter  wrote:
> 
> > Changing a tunnel's flowlabel value was broken if it was set to a
> > non-zero value before. Since the same problem existed for tclass, patch
> > 1 fixes both instances at once.
> > 
> > Patch 2 enhances 'ip link show' to also print the tclass value. This
> > change was necessary to properly test the first patch's result.
> > 
> > Phil Sutter (2):
> >   link_gre6: Fix for changing tclass/flowlabel
> >   link_gre6: Print the tunnel's tclass setting
> > 
> >  ip/link_gre6.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> 
> This doesn't work with net-next where json has been added.
> I fixing it now

Oh, thanks for that. I'm not used to having different states in master
and net-next. :)

Cheers, Phil

[PATCH 1/2] Bluetooth: make baswap src const

2017-09-01 Thread Bjorn Andersson

From: Loic Poulain 

Signed-off-by: Loic Poulain 
Signed-off-by: Bjorn Andersson 
---
 include/net/bluetooth/bluetooth.h | 2 +-
 net/bluetooth/lib.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h 
b/include/net/bluetooth/bluetooth.h
index 01487192f628..020142bb9735 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -233,7 +233,7 @@ static inline void bacpy(bdaddr_t *dst, const bdaddr_t *src)
memcpy(dst, src, sizeof(bdaddr_t));
 }
 
-void baswap(bdaddr_t *dst, bdaddr_t *src);
+void baswap(bdaddr_t *dst, const bdaddr_t *src);
 
 /* Common socket structures and functions */
 
diff --git a/net/bluetooth/lib.c b/net/bluetooth/lib.c
index aa4cf64e32a6..6048cc07568b 100644
--- a/net/bluetooth/lib.c
+++ b/net/bluetooth/lib.c
@@ -30,10 +30,10 @@
 
 #include 
 
-void baswap(bdaddr_t *dst, bdaddr_t *src)
+void baswap(bdaddr_t *dst, const bdaddr_t *src)
 {
+   const unsigned char *s = (const unsigned char *)src;
unsigned char *d = (unsigned char *) dst;
-   unsigned char *s = (unsigned char *) src;
unsigned int i;
 
for (i = 0; i < 6; i++)
-- 
2.12.0

[PATCH 0/2] btqcomsmd: Allow specifying board mac address

2017-09-01 Thread Bjorn Andersson

The btqcomsmd hardware lacks persistent storage of its mac address, so this
needs to be configured during initialization. The second patch in this series
reads the mac address from DT and does this, allowing the boot loader to
populate this board specific information.

Loic Poulain (2):
  Bluetooth: make baswap src const
  Bluetooth: btqcomsmd: BD address setup

 drivers/bluetooth/btqcomsmd.c | 28 
 include/net/bluetooth/bluetooth.h |  2 +-
 net/bluetooth/lib.c   |  4 ++--
 3 files changed, 31 insertions(+), 3 deletions(-)

-- 
2.12.0

[PATCH 2/2] Bluetooth: btqcomsmd: BD address setup

2017-09-01 Thread Bjorn Andersson

From: Loic Poulain 

Bluetooth BD address can be retrieved in the same way as
for wcnss-wlan MAC address. This patch mainly stores the
local-mac-address property and sets the BD address during
hci device setup.

Signed-off-by: Loic Poulain 
Signed-off-by: Bjorn Andersson 
---
 drivers/bluetooth/btqcomsmd.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
index d00c4fdae924..443bb2099329 100644
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -26,6 +26,7 @@
 struct btqcomsmd {
struct hci_dev *hdev;
 
+   const bdaddr_t *addr;
struct rpmsg_endpoint *acl_channel;
struct rpmsg_endpoint *cmd_channel;
 };
@@ -100,6 +101,27 @@ static int btqcomsmd_close(struct hci_dev *hdev)
return 0;
 }
 
+static int btqcomsmd_setup(struct hci_dev *hdev)
+{
+   struct btqcomsmd *btq = hci_get_drvdata(hdev);
+   struct sk_buff *skb;
+
+   skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
+   kfree_skb(skb);
+
+   if (btq->addr) {
+   bdaddr_t bdaddr;
+
+   /* btq->addr stored with most significant byte first */
+   baswap(&bdaddr, btq->addr);
+   return qca_set_bdaddr_rome(hdev, &bdaddr);
+   }
+
+   return 0;
+}
+
 static int btqcomsmd_probe(struct platform_device *pdev)
 {
struct btqcomsmd *btq;
@@ -123,6 +145,11 @@ static int btqcomsmd_probe(struct platform_device *pdev)
if (IS_ERR(btq->cmd_channel))
return PTR_ERR(btq->cmd_channel);
 
+   btq->addr = of_get_property(pdev->dev.of_node, "local-mac-address",
+   &ret);
+   if (ret != sizeof(bdaddr_t))
+   btq->addr = NULL;
+
hdev = hci_alloc_dev();
if (!hdev)
return -ENOMEM;
@@ -135,6 +162,7 @@ static int btqcomsmd_probe(struct platform_device *pdev)
hdev->open = btqcomsmd_open;
hdev->close = btqcomsmd_close;
hdev->send = btqcomsmd_send;
+   hdev->setup = btqcomsmd_setup;
hdev->set_bdaddr = qca_set_bdaddr_rome;
 
ret = hci_register_dev(hdev);
-- 
2.12.0

Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map

2017-09-01 Thread Peter Zijlstra

On Fri, Sep 01, 2017 at 09:53:54AM -0700, Yonghong Song wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index b14095b..7fd5e94 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -901,6 +901,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
>  int perf_event_read_local(struct perf_event *event, u64 *value);
>  extern u64 perf_event_read_value(struct perf_event *event,
>u64 *enabled, u64 *running);
> +extern void calc_timer_values(struct perf_event *event, u64 *now,
> + u64 *enabled, u64 *running);
>  
>  

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8c01572..ef5c7fb 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4883,7 +4883,7 @@ static int perf_event_index(struct perf_event *event)
>   return event->pmu->event_idx(event);
>  }
>  
> -static void calc_timer_values(struct perf_event *event,
> +void calc_timer_values(struct perf_event *event,
>   u64 *now,
>   u64 *enabled,
>   u64 *running)

Yeah, not going to happen...

Why not do the obvious thing and extend perf_event_read_local() to
optionally return the enabled/running times?

Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map

2017-09-01 Thread Alexei Starovoitov


On 9/1/17 9:53 AM, Yonghong Song wrote:

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch adds helper bpf_perf_read_counter_time for kprobed based perf
event array map, to read perf counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.
To achieve scaling factor between two bpf invocations, users
can can use cpu_id as the key (which is typical for perf array usage model)
to remember the previous value and do the calculation inside the
bpf program.

Signed-off-by: Yonghong Song 


...


+BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
+   struct bpf_perf_counter_time *, buf, u32, size)
+{
+   struct perf_event *pe;
+   u64 now;
+   int err;
+
+   if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
+   return -EINVAL;
+   err = get_map_perf_counter(map, flags, &buf->counter, &pe);
+   if (err)
+   return err;
+
+   calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
+   return 0;
+}


Peter,
I believe we're doing it correctly above.
It's a copy paste of the same logic as in total_time_enabled/running.
We cannot expose total_time_enabled/running to bpf, since they are
different counters. The above two are specific to bpf usage.
See commit log.

for the whole set:
Acked-by: Alexei Starovoitov

Re: [PATCH 13/31] timer: Remove meaningless .data/.function assignments

2017-09-01 Thread Jens Axboe

On 08/31/2017 05:29 PM, Kees Cook wrote:
> Several timer users needlessly reset their .function/.data fields during
> their timer callback, but nothing else changes them. Some users do not
> use their .data field at all. Each instance is removed here.

For amiflop:

Acked-by: Jens Axboe 

-- 
Jens Axboe

Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-09-01 Thread Andrew Lunn

> I see what you mean, so something along the lines of just:
> 
> tc bind dev swp0p0 queue 0 master queue 16
> 
> without having to specify the master network device since it's implicit,
> I kind of like that.

Yes, that is better.

 Andrew

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-09-01 Thread Kees Cook

On Fri, Sep 1, 2017 at 11:58 AM, Kees Cook  wrote:
> On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith  wrote:
>> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith  wrote:
>>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith  wrote:
>>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>>> >> > >>
>>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping 
>>> >> > >> correct:
>>> >> > >>
>>> >> > >> 4.8.5: WARN, eventual kernel hang
>>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>>> >> > >
>>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>>> >> > > version has been through one hell of a lot of kernels with me.  
>>> >> > > Yeah, I
>>> >> > > know, that doesn't exempt it from having bugs, but color me 
>>> >> > > suspicious.
>>> >> >
>>> >> > I still can't hit this with a 4.8.5 build. :(
>>> >> >
>>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>>> >> > negative first...
>>> >>
>>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>>> >
>>> > Grr, that one has a in6_dev_getx() line missing for the first
>>> > increment, where things go pear shaped.
>>> >
>>> > With that added, looking at counter both before, and after incl, with a
>>> > trace_printk() in the exception handler showing it doing its saturate
>>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>>> > booting box nr_cpus=1 for extra credit...
>>> >
>>> > HTH can that first refcount_inc() get there?
>>> >
>>> > # tracer: nop
>>> > #
>>> > #  _-=> irqs-off
>>> > # / _=> need-resched
>>> > #| / _---=> hardirq/softirq
>>> > #|| / _--=> preempt-depth
>>> > #||| / delay
>>> > #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
>>> > #  | |   |      | |
>>> >  systemd-1 [000] d..1 1.937284: in6_dev_getx: PRE 
>>> > refs.counter:3
>>> >  systemd-1 [000] d..1 1.937295: ex_handler_refcount: 
>>> > *(int *)regs->cx = -1073741824
>>> >  systemd-1 [000] d..1 1.937296: in6_dev_getx: POST 
>>> > refs.counter:-1073741824
>>>
>>> O_o
>>>
>>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>>> we're landing in the exception handler.
>>
>> I was hoping you'd say that.
>>
>>0x816b2f72 <+0>: push   %rbp
>>0x816b2f73 <+1>: mov%rsp,%rbp
>>0x816b2f76 <+4>: push   %r12
>>0x816b2f78 <+6>: push   %rbx
>>0x816b2f79 <+7>: incl   %gs:0x7e95a2d0(%rip)# 0xd250 
>> <__preempt_count>
>>0x816b2f80 <+14>:mov0x308(%rdi),%rbx
>>0x816b2f87 <+21>:test   %rbx,%rbx
>>0x816b2f8a <+24>:je 0x816b2feb 
>>0x816b2f8c <+26>:callq  *0x81c35a00
>>0x816b2f93 <+33>:mov%rax,%r12
>>0x816b2f96 <+36>:callq  *0x81c35a10
>>0x816b2f9d <+43>:mov0x769ad4(%rip),%rsi# 
>> 0x81e1ca78 
>>0x816b2fa4 <+50>:mov0xf0(%rbx),%edx
>>0x816b2faa <+56>:mov$0x816b2f8c,%rdi
>>0x816b2fb1 <+63>:callq  0x81171fc0 <__trace_bprintk>
>>0x816b2fb6 <+68>:lock incl 0xf0(%rbx)
>>0x816b2fbd <+75>:js 0x816b2fbf 
>>0x816b2fbf <+77>:lea0xf0(%rbx),%rcx
>>0x816b2fc6 <+84>:(bad)
>>0x816b2fc8 <+86>:mov0x769a99(%rip),%rsi# 
>> 0x81e1ca68 
>>0x816b2fcf <+93>:mov0xf0(%rbx),%edx
>>0x816b2fd5 <+99>:mov$0x816b2f8c,%rdi
>>0x816b2fdc <+106>:   callq  0x81171fc0 <__trace_bprintk>
>>0x816b2fe1 <+111>:   mov%r12,%rdi
>>0x816b2fe4 <+114>:   callq  *0x81c35a08
>>0x816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)# 0xd250 
>> <__preempt_count>
>>0x816b2ff2 <+128>:   mov%rbx,%rax
>>0x816b2ff5 <+131>:   pop%rbx
>>0x816b2ff6 <+132>:   pop%r12
>>0x816b2ff8 <+134>:   pop%rbp
>>0x816b2ff9 <+135>:   retq
>>
>> I don't get the section business at all, +75 looks to me like we're
>> gonna trap no matter what.. as we appear to be doing.
>
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notic

Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN

2017-09-01 Thread Eric W. Biederman

Prakash Sangappa  writes:

> On 8/30/17 10:41 AM, ebied...@xmission.com wrote:
>> Prakash Sangappa  writes:
>>
>>
>>> With regards to security, the question basically is what is the consequence
>>> of passing the wrong id. As I understand it, Interpreting the id to be pid
>>> or tid, the effective uid and gid will be the same. It would be a problem
>>> only if the incorrect interpretation of the id would refer a different 
>>> process.
>>> But that cannot happen as the the global tid(gettid() of a thread is
>>> unique.
>> There is also the issue that the receiving process could look, not see
>> the pid in proc and assume the sending process is dead.  That I suspect
>> is the larger danger.
>>
>
> Will this not be a bug in the application, if it is sending the wrong
> id?

No.  It could be deliberate and malicious.

>>> As long as the thread is alive, that id cannot reference another process / 
>>> thread.
>>> Unless the thread were to exit and the id gets recycled and got used for 
>>> another
>>> thread or process. This would be no different from a process exiting and its
>>> pid getting recycled which is the case now.
>> Largely I agree.
>>
>> If all you want are pid translations I suspect the are far easier ways
>> thant updating the SCM_CREDENTIALS code.
>
> What would be an another easier & efficient way of doing pid translation?
>
> Should a new API/mechanism be considered mainly for pid translation purpose
> for use with pid namespaces, say based on 'pipe' something similar to
> I_SENDFD?

There are proc files that provide all of the pids of a process you can
read those.

Other possibilities exist if you want to go that fast.

Eric

Re: [net-next PATCH] bpf: sockmap update/simplify memory accounting scheme

2017-09-01 Thread Alexei Starovoitov


On 9/1/17 11:29 AM, John Fastabend wrote:

Instead of tracking wmem_queued and sk_mem_charge by incrementing
in the verdict SK_REDIRECT paths and decrementing in the tx work
path use skb_set_owner_w and sock_writeable helpers. This solves
a few issues with the current code. First, in SK_REDIRECT inc on
sk_wmem_queued and sk_mem_charge were being done without the peers
sock lock being held. Under stress this can result in accounting
errors when tx work and/or multiple verdict decisions are working
on the peer psock.

Additionally, this cleans up the code because we can rely on the
default destructor to decrement memory accounting on kfree_skb. Also
this will trigger sk_write_space when space becomes available on
kfree_skb() which wasn't happening before and prevent __sk_free
from being called until all in-flight packets are completed.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend 
Acked-by: Daniel Borkmann 


thanks. it's cleaner indeed.

Acked-by: Alexei Starovoitov

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-09-01 Thread Mike Galbraith

On Fri, 2017-09-01 at 11:58 -0700, Kees Cook wrote:
> 
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notices a bunch of error
> paths are the same. However, in your disassembly, it's inline (!!) in
> the code, as if "pushsection" and "popsection" were entirely ignored.
> 
> And when I make my own in6_dev_getx(), I see the same disassembly:
> 
>0x818a757b <+181>:   lock incl 0x1e0(%rbx)
>0x818a7582 <+188>:   js 0x818a7584 
>0x818a7584 <+190>:   lea0x1e0(%rbx),%rcx
>0x818a758b <+197>:   (bad)
> 
> Which is VERY different from how it looks in other places!
> 
> e.g. from lkdtm_REFCOUNT_INC_SATURATED:
> 
>0x815657df <+47>:lock incl -0xc(%rbp)
>0x815657e3 <+51>:js 0x81565cac
> ...
>0x81565cac:  lea-0xc(%rbp),%rcx
>0x81565cb0:  (bad)
> 
> So, at least I can reproduce this in the build now. I must not be
> exercising these paths. FWIW, this is with Ubuntu's 6.3.0 gcc.
> 
> I'll try to figure out what's going on here...

Heh, make in6_dev_getx() __always_inline.

   swapper/0-1 [000] d..1 1.438587: ip6_route_init_special_entries: 
PRE refs.counter:3
   swapper/0-1 [000] d..1 1.438590: ip6_route_init_special_entries: 
POST refs.counter:4
   swapper/0-1 [000] d..1 1.438591: ip6_route_init_special_entries: 
PRE refs.counter:4
   swapper/0-1 [000] d..1 1.438592: ip6_route_init_special_entries: 
POST refs.counter:5
   swapper/0-1 [000] d..1 1.438592: ip6_route_init_special_entries: 
PRE refs.counter:5
   swapper/0-1 [000] d..1 1.438593: ip6_route_init_special_entries: 
POST refs.counter:6

-Mike

Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-09-01 Thread Florian Fainelli

On 09/01/2017 11:50 AM, Andrew Lunn wrote:
> On Fri, Sep 01, 2017 at 11:27:43AM -0700, Florian Fainelli wrote:
>> On 09/01/2017 10:55 AM, Andrew Lunn wrote:
>>> Hi Florian
>>>
>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
>>>
>>> It this the eth0 i don't like here. Why not in the implementation just
>>> use something like netdev_master_upper_dev_get('sw0p0')? Or does
>>
>> Last I brought this up with Jiri that we should link DSA network devices
>> to their master network deviecs with netdev_upper_dev_link() he said
>> this was not appropriate for DSA slave network devices, but I can't
>> remember why, I would assume that any stacked device set up would do that.
> 
> There is some form a linking going, our device names show that:
> 
> 9: lan5@eth1:  mtu 1500 qdisc noqueue state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether da:87:2a:03:cf:16 brd ff:ff:ff:ff:ff:ff

This is because iproute2 is linking the devices based on what
ndo_get_iflink() returns.

> 
>> In any case, we need to establish a mapping so we have to specify at
>> least the target device's queue number. It is quite similar in premise
>> to e.g: enslaving a network device to a bridge port:
>>
>> ip link set dev eth0 master br0
> 
> But here br0 is absolutely required, we have to say which bridge the
> slave port should be a member of.

Right,

> 
> But what good is eth0 in
> 
> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> As i said suggesting, you have to somehow verify that eth0 is the
> conduit interface sw0p0 is using. Which makes the parameter pointless.
> Determine it from the sw0p0 somehow.

I see what you mean, so something along the lines of just:

tc bind dev swp0p0 queue 0 master queue 16

without having to specify the master network device since it's implicit,
I kind of like that.
-- 
Florian

RE: [PATCH 0/2] i40e: fix firmware update

2017-09-01 Thread Keller, Jacob E



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Stefan Assmann
> Sent: Friday, September 01, 2017 7:03 AM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org; da...@davemloft.net; Kirsher, Jeffrey T
> ; sassm...@kpanic.de
> Subject: [PATCH 0/2] i40e: fix firmware update
> 
> The first patch fixes the firmware update which is currently broken and
> results in a bad flash (corrupt firmware). Recovery is possible with a
> fixed driver.
> The second patch reverts a commit that causes the firmware checksum
> verification to fail right after a successful flash. This is related to
> a recent workqueue change. Haven't gotten to the bottom of this yet, but
> for the sake of a smooth firmware update experience let's revert the
> commit for now.

Hi Stefan,

Thanks for these patches, I apologize for the time it took for us to respond to 
this. 

The first patch is functionally correct, and I'm surprised we missed sending an 
equivalent ourselves. It looks like some related changes occurred around this 
code, and we failed to submit the patch.

I think Jeff would prefer if we send the version based directly on the 
out-of-tree code, which I will be reviving and submitting shortly.

The second issue I believe is not fixed correctly by the patch, I'm unsure why 
exactly changing the WQ would cause this but I believe that a similar patch 
which creates a non-locked version of i40e_nvm_read_buffer() will resolve this, 
and I will be sending that patch as well, which I believe is the real fix 
versus halting the work queue.

Thanks,
Jake

> 
> Stefan Assmann (2):
>   i40e: use non-locking i40e_read_nvm_word() function during nvmupdate
>   Revert "i40e: remove WQ_UNBOUND and the task limit of our workqueue"
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +---
>  drivers/net/ethernet/intel/i40e/i40e_nvm.c  | 24 ++--
>  2 files changed, 27 insertions(+), 9 deletions(-)
> 
> --
> 2.13.5

Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Daniel Borkmann


On 09/01/2017 08:53 PM, Sandipan Das wrote:

Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.

Signed-off-by: Sandipan Das 


Awesome, thanks for following up! :)

Re: [iproute PATCH 0/2] Fix and enhance link_gre6

2017-09-01 Thread Stephen Hemminger

On Fri,  1 Sep 2017 16:08:07 +0200
Phil Sutter  wrote:

> Changing a tunnel's flowlabel value was broken if it was set to a
> non-zero value before. Since the same problem existed for tclass, patch
> 1 fixes both instances at once.
> 
> Patch 2 enhances 'ip link show' to also print the tclass value. This
> change was necessary to properly test the first patch's result.
> 
> Phil Sutter (2):
>   link_gre6: Fix for changing tclass/flowlabel
>   link_gre6: Print the tunnel's tclass setting
> 
>  ip/link_gre6.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 

This doesn't work with net-next where json has been added.
I fixing it now

Re: [iproute PATCH 2/6] Convert the obvious cases to strlcpy()

2017-09-01 Thread Daniel Borkmann


On 09/01/2017 06:52 PM, Phil Sutter wrote:

This converts the typical idiom of manually terminating the buffer after
a call to strncpy().

Signed-off-by: Phil Sutter 


For BPF loader bits:

Acked-by: Daniel Borkmann

Re: [iproute PATCH] lib/bpf: Fix bytecode-file parsing

2017-09-01 Thread Daniel Borkmann


On 08/30/2017 04:11 PM, Phil Sutter wrote:

On Wed, Aug 30, 2017 at 03:53:59PM +0200, Daniel Borkmann wrote:

On 08/29/2017 05:09 PM, Phil Sutter wrote:

[...]


I don't really have a strong opinion on this, but the logic for
normalizing here is getting a bit convoluted. Is your use case
for making the parser more robust mainly so you can just use the
-ddd output from tcpdump for cBPF w/o piping through tr? But even
that shouldn't give multiple empty lines afaik, no?


Well, using tcpdump output was functional before already. I just noticed
that if I add an empty line to the end of bytecode-file, it will fail
and I didn't like that. Then while searching for the EOF issue, I
noticed that the parser logic above is a bit faulty in that it will
treat different characters equally but doesn't make sure c_prev will be
assigned only one of them. So apart from the added robustness, it really
fixes an inconsistency in the parsing logic.


Ok, fine by me.

Re: [iproute PATCH 0/6] strlcpy() and strlcat() for iproute2

2017-09-01 Thread Stephen Hemminger

On Fri,  1 Sep 2017 18:52:50 +0200
Phil Sutter  wrote:

> The following series adds my own implementations of strlcpy() and
> strlcat() in patch 1 and changes the code to make use of them in the
> following patches but the last two: Patch 5 just eliminates a line of
> useless code I found while searching for potential users of the
> introduced functions, patch 6 sanitizes a call to strncpy() in
> misc/lnstat_util.c without using strlcpy() since lnstat is not being
> linked against libutil.
> 
> I implemented both functions solely based on information in libbsd's man
> pages, so they are safe to be released under the GPL.
> 
> Phil Sutter (6):
>   utils: Implement strlcpy() and strlcat()
>   Convert the obvious cases to strlcpy()
>   Convert harmful calls to strncpy() to strlcpy()
>   ipxfrm: Replace STRBUF_CAT macro with strlcat()
>   tc_util: No need to terminate an snprintf'ed buffer
>   lnstat_util: Make sure buffer is NUL-terminated
> 
>  genl/ctrl.c   |  2 +-
>  include/utils.h   |  3 +++
>  ip/ipnetns.c  |  3 +--
>  ip/iproute_lwtunnel.c |  3 +--
>  ip/ipvrf.c|  5 ++---
>  ip/ipxfrm.c   | 21 +
>  ip/xfrm_state.c   |  2 +-
>  lib/bpf.c |  3 +--
>  lib/fs.c  |  3 +--
>  lib/inet_proto.c  |  3 +--
>  lib/utils.c   | 19 +++
>  misc/lnstat_util.c|  3 ++-
>  misc/ss.c |  3 +--
>  tc/em_ipset.c |  3 +--
>  tc/tc_util.c  |  1 -
>  15 files changed, 40 insertions(+), 37 deletions(-)
> 

Applied, thanks.

Re: [iproute PATCH 0/2] Fix and enhance link_gre6

2017-09-01 Thread Stephen Hemminger

On Fri,  1 Sep 2017 16:08:07 +0200
Phil Sutter  wrote:

> Changing a tunnel's flowlabel value was broken if it was set to a
> non-zero value before. Since the same problem existed for tclass, patch
> 1 fixes both instances at once.
> 
> Patch 2 enhances 'ip link show' to also print the tclass value. This
> change was necessary to properly test the first patch's result.
> 
> Phil Sutter (2):
>   link_gre6: Fix for changing tclass/flowlabel
>   link_gre6: Print the tunnel's tclass setting
> 
>  ip/link_gre6.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 

Applied, thanks.

Re: [net-next PATCH] ixgbe: add counter for times rx pages gets allocated, not recycled

2017-09-01 Thread Jeff Kirsher

On Fri, 2017-09-01 at 12:54 +0200, Jesper Dangaard Brouer wrote:
> The ixgbe driver have page recycle scheme based around the RX-ring
> queue, where a RX page is shared between two packets. Based on the
> refcnt, the driver can determine if the RX-page is currently only
> used
> by a single packet, if so it can then directly refill/recycle the
> RX-slot by with the opposite "side" of the page.
> 
> While this is a clever trick, it is hard to determine when this
> recycling is successful and when it fails.  Adding a counter, which
> is
> available via ethtool --statistics as 'alloc_rx_page'.  Which counts
> the number of times the recycle fails and the real page allocator is
> invoked.  When interpreting the stats, do remember that every alloc
> will serve two packets.
> 
> The counter is collected per rx_ring, but is summed and ethtool
> exported as 'alloc_rx_page'.  It would be relevant to know what
> rx_ring that cannot keep up, but that can be exported later if
> someone experience a need for this.
> 
> Signed-off-by: Jesper Dangaard Brouer 

Since Alex has a suggested change for this patch, when you resubmit v2,
can you make sure you CC intel-wired-lan mailing list, so that my
patchwork project picks up this patch?  Thanks in advance Jesper.

signature.asc
Description: This is a digitally signed message part

[GIT] Networking

2017-09-01 Thread David Miller


1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel
   Borkmann.

2) IPSEC ESP error paths leak memory, from Steffen Klassert.

3) We need an RCU grace period before freeing fib6_node objects,
   from Wei Wang.

4) Must check skb_put_padto() return value in HSR driver, from
   FLorian Fainelli.

5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew
   Jeffery.

6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from
   Eric Dumazet.

7) Use after free when tcf_chain_destroy() called multiple times,
   from Jiri Pirko.

8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian
   Fainelli.

9) Fix leak of uninitialized memory in sctp_get_sctp_info(),
   inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill().
   From Stefano Brivio.

10) L2TP tunnel refcount fixes from Guillaume Nault.

11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi
Kauperman.

12) Revert a PHY layer change wrt. handling of PHY_HALTED state
in phy_stop_machine(), it causes regressions for multiple
people.  From Florian Fainelli.

13) When packets are sent out of br0 we have to clear the offload_fwdq_mark
value.

14) Several NULL pointer deref fixes in packet schedulers when their
->init() routine fails.  From Nikolay Aleksandrov.

15) Aquantium devices cannot checksum offload correctly when the packet
is <= 60 bytes.  From Pavel Belous.

16) Fix vnet header access past end of buffer in AF_PACKET, from Benjamin
Poirier.

17) Double free in probe error paths of nfp driver, from Dan Carpenter.

18) QOS capability not checked properly in DCB init paths of mlx5 driver,
from Huy Nguyen.

19) Fix conflicts between firmware load failure and health_care timer
in mlx5, also from Huy Nguyen.

20) Fix dangling page pointer when DMA mapping errors occur in mlx5,
from Eran Ben ELisha.

21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly, from
Michael Chan.

22) Missing MSIX vector free in bnxt_en, also from Michael Chan.

23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo
Colitti.

24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann.

25) bpf_setsockopts() erroneously always returns -EINVAL even on
success.  Fix from Yuchung Cheng.

26) tipc_rcv() needs to linearize the SKB before parsing the inner
headers, from Parthasarathy Bhuvaragan.

27) Fix deadlock between link status updates and link removal in netvsc
driver, from Stephen Hemminger.

28) Missed locking of page fragment handling in ESP output, from
Steffen Klassert.

29) Fix refcnt leak in ebpf congestion control code, from Sabrina
Dubroca.

30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return
value, from Christophe Jaillet.

31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated
during early demux, from Paolo Abeni.

32) Several info leaks in xfrm_user layer, from Mathias Krause.

33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio.

34) Properly propagate obsolete state of route upwards in ipv6
so that upper holders like xfrm can see it.  From Xin Long.

Please pull, thanks a lot!

The following changes since commit 6470812e22261d2342ef1597be62e63a0423d691:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2017-08-21 
14:07:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to e8a732d1bc3ac313e22249c13a153c3fe54aa577:

  udp: fix secpath leak (2017-09-01 10:29:34 -0700)


Aleksander Morgado (1):
  cdc_ncm: flag the u-blox TOBY-L4 as wwan

Andrew Jeffery (1):
  net: ftgmac100: Fix oops in probe on failure to find associated PHY

Antoine Tenart (1):
  net: mvpp2: fix the mac address used when using PPv2.2

Arnd Bergmann (1):
  qlge: avoid memcpy buffer overflow

Benjamin Poirier (1):
  packet: Don't write vnet header beyond end of buffer

Bob Peterson (1):
  tipc: Fix tipc_sk_reinit handling of -EAGAIN

Christophe Jaillet (1):
  net: sxgbe: check memory allocation failure

Cong Wang (1):
  wl1251: add a missing spin_lock_init()

Dan Carpenter (1):
  nfp: double free on error in probe

Daniel Borkmann (1):
  bpf: fix map value attribute for hash of maps

David S. Miller (16):
  Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
  Merge branch 'tipc-topology-server-fixes'
  Merge branch 'net-sched-couple-of-chain-fixes'
  Merge branch 'dst-tag-ksz-fix'
  Merge branch 'nfp-fixes'
  Merge branch 'bnxt_en-bug-fixes'
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'tipc-buffer-reassignment-fixes'
  Merge branch 'r8169-Be-drop-monitor-friendly'
  Merge tag 'wireless-drivers-for-davem-2017-08-25' of 
git://git.kernel.org/.../kvalo/wireless-drivers
  Merge branch 'l2tp-tunnel-refs'
  Merge branch 'nfp

Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Naveen N. Rao

On 2017/09/02 12:23AM, Sandipan Das wrote:
> Take advantage of stack_depth tracking, originally introduced for
> x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
> to make sure it stays aligned for functions called from JITed bpf
> program.
> 
> Signed-off-by: Sandipan Das 
> ---

LGTM, thanks!
Reviewed-by: Naveen N. Rao 

Michael,
Seeing as this is powerpc specific, can you please take this through 
your tree?


Thanks,
Naveen

>  arch/powerpc/net/bpf_jit64.h  |  7 ---
>  arch/powerpc/net/bpf_jit_comp64.c | 16 ++--
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
> index 62fa7589db2b..8bdef7ed28a8 100644
> --- a/arch/powerpc/net/bpf_jit64.h
> +++ b/arch/powerpc/net/bpf_jit64.h
> @@ -23,7 +23,7 @@
>   *   [   nv gpr save area] 8*8   |
>   *   [tail_call_cnt  ] 8 |
>   *   [local_tmp_var  ] 8 |
> - * fp (r31) -->  [   ebpf stack space] 512   |
> + * fp (r31) -->  [   ebpf stack space] upto 512  |
>   *   [ frame header  ] 32/112|
>   * sp (r1) --->  [stack pointer  ] --
>   */
> @@ -32,8 +32,8 @@
>  #define BPF_PPC_STACK_SAVE   (8*8)
>  /* for bpf JIT code internal usage */
>  #define BPF_PPC_STACK_LOCALS 16
> -/* Ensure this is quadword aligned */
> -#define BPF_PPC_STACKFRAME   (STACK_FRAME_MIN_SIZE + MAX_BPF_STACK + \
> +/* stack frame excluding BPF stack, ensure this is quadword aligned */
> +#define BPF_PPC_STACKFRAME   (STACK_FRAME_MIN_SIZE + \
>BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
> 
>  #ifndef __ASSEMBLY__
> @@ -103,6 +103,7 @@ struct codegen_context {
>*/
>   unsigned int seen;
>   unsigned int idx;
> + unsigned int stack_size;
>  };
> 
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 6ba5d253e857..a01362c88f6a 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -69,7 +69,7 @@ static inline bool bpf_has_stack_frame(struct 
> codegen_context *ctx)
>  static int bpf_jit_stack_local(struct codegen_context *ctx)
>  {
>   if (bpf_has_stack_frame(ctx))
> - return STACK_FRAME_MIN_SIZE + MAX_BPF_STACK;
> + return STACK_FRAME_MIN_SIZE + ctx->stack_size;
>   else
>   return -(BPF_PPC_STACK_SAVE + 16);
>  }
> @@ -82,8 +82,9 @@ static int bpf_jit_stack_tailcallcnt(struct codegen_context 
> *ctx)
>  static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
>  {
>   if (reg >= BPF_PPC_NVR_MIN && reg < 32)
> - return (bpf_has_stack_frame(ctx) ? BPF_PPC_STACKFRAME : 0)
> - - (8 * (32 - reg));
> + return (bpf_has_stack_frame(ctx) ?
> + (BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
> + - (8 * (32 - reg));
> 
>   pr_err("BPF JIT is asking about unknown registers");
>   BUG();
> @@ -134,7 +135,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
>   }
> 
> - PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
> + PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
>   }
> 
>   /*
> @@ -161,7 +162,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   /* Setup frame pointer to point to the bpf stack area */
>   if (bpf_is_seen_register(ctx, BPF_REG_FP))
>   PPC_ADDI(b2p[BPF_REG_FP], 1,
> - STACK_FRAME_MIN_SIZE + MAX_BPF_STACK);
> + STACK_FRAME_MIN_SIZE + ctx->stack_size);
>  }
> 
>  static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context 
> *ctx)
> @@ -183,7 +184,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, 
> struct codegen_context *ctx
> 
>   /* Tear down our stack frame */
>   if (bpf_has_stack_frame(ctx)) {
> - PPC_ADDI(1, 1, BPF_PPC_STACKFRAME);
> + PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
>   if (ctx->seen & SEEN_FUNC) {
>   PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
>   PPC_MTLR(0);
> @@ -993,6 +994,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> 
>   memset(&cgctx, 0, sizeof(struct codegen_context));
> 
> + /* Make sure that the stack is quadword aligned. */
> + cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
> +
>   /* Scouting faux-generate pass 0 */
>   if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
>   /* We hit something illegal or unsupported. */
> -- 
> 2.13.5
>

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-09-01 Thread Kees Cook

On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith  wrote:
> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith  wrote:
>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith  wrote:
>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> >> > >>
>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping 
>> >> > >> correct:
>> >> > >>
>> >> > >> 4.8.5: WARN, eventual kernel hang
>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>> >> > >
>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> >> > > version has been through one hell of a lot of kernels with me.  Yeah, 
>> >> > > I
>> >> > > know, that doesn't exempt it from having bugs, but color me 
>> >> > > suspicious.
>> >> >
>> >> > I still can't hit this with a 4.8.5 build. :(
>> >> >
>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> >> > negative first...
>> >>
>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>> >
>> > Grr, that one has a in6_dev_getx() line missing for the first
>> > increment, where things go pear shaped.
>> >
>> > With that added, looking at counter both before, and after incl, with a
>> > trace_printk() in the exception handler showing it doing its saturate
>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>> > booting box nr_cpus=1 for extra credit...
>> >
>> > HTH can that first refcount_inc() get there?
>> >
>> > # tracer: nop
>> > #
>> > #  _-=> irqs-off
>> > # / _=> need-resched
>> > #| / _---=> hardirq/softirq
>> > #|| / _--=> preempt-depth
>> > #||| / delay
>> > #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
>> > #  | |   |      | |
>> >  systemd-1 [000] d..1 1.937284: in6_dev_getx: PRE 
>> > refs.counter:3
>> >  systemd-1 [000] d..1 1.937295: ex_handler_refcount: *(int 
>> > *)regs->cx = -1073741824
>> >  systemd-1 [000] d..1 1.937296: in6_dev_getx: POST 
>> > refs.counter:-1073741824
>>
>> O_o
>>
>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>> we're landing in the exception handler.
>
> I was hoping you'd say that.
>
>0x816b2f72 <+0>: push   %rbp
>0x816b2f73 <+1>: mov%rsp,%rbp
>0x816b2f76 <+4>: push   %r12
>0x816b2f78 <+6>: push   %rbx
>0x816b2f79 <+7>: incl   %gs:0x7e95a2d0(%rip)# 0xd250 
> <__preempt_count>
>0x816b2f80 <+14>:mov0x308(%rdi),%rbx
>0x816b2f87 <+21>:test   %rbx,%rbx
>0x816b2f8a <+24>:je 0x816b2feb 
>0x816b2f8c <+26>:callq  *0x81c35a00
>0x816b2f93 <+33>:mov%rax,%r12
>0x816b2f96 <+36>:callq  *0x81c35a10
>0x816b2f9d <+43>:mov0x769ad4(%rip),%rsi# 
> 0x81e1ca78 
>0x816b2fa4 <+50>:mov0xf0(%rbx),%edx
>0x816b2faa <+56>:mov$0x816b2f8c,%rdi
>0x816b2fb1 <+63>:callq  0x81171fc0 <__trace_bprintk>
>0x816b2fb6 <+68>:lock incl 0xf0(%rbx)
>0x816b2fbd <+75>:js 0x816b2fbf 
>0x816b2fbf <+77>:lea0xf0(%rbx),%rcx
>0x816b2fc6 <+84>:(bad)
>0x816b2fc8 <+86>:mov0x769a99(%rip),%rsi# 
> 0x81e1ca68 
>0x816b2fcf <+93>:mov0xf0(%rbx),%edx
>0x816b2fd5 <+99>:mov$0x816b2f8c,%rdi
>0x816b2fdc <+106>:   callq  0x81171fc0 <__trace_bprintk>
>0x816b2fe1 <+111>:   mov%r12,%rdi
>0x816b2fe4 <+114>:   callq  *0x81c35a08
>0x816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)# 0xd250 
> <__preempt_count>
>0x816b2ff2 <+128>:   mov%rbx,%rax
>0x816b2ff5 <+131>:   pop%rbx
>0x816b2ff6 <+132>:   pop%r12
>0x816b2ff8 <+134>:   pop%rbp
>0x816b2ff9 <+135>:   retq
>
> I don't get the section business at all, +75 looks to me like we're
> gonna trap no matter what.. as we appear to be doing.

The section stuff is supposed to be a trick to push the error case off
into the .text.unlikely area to avoid needing a jmp over the handler
and with possibly some redundancy removal done by the compiler (though
this appears to be rather limited) if it notices a bunch of error
paths are the same. However, in your disassembly, it's inline (!!) in
the code, as if "pushsection" and "popsection" were entirely ignored.

[PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Sandipan Das

Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.

Signed-off-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit64.h  |  7 ---
 arch/powerpc/net/bpf_jit_comp64.c | 16 ++--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 62fa7589db2b..8bdef7ed28a8 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -23,7 +23,7 @@
  * [   nv gpr save area] 8*8   |
  * [tail_call_cnt  ] 8 |
  * [local_tmp_var  ] 8 |
- * fp (r31) -->[   ebpf stack space] 512   |
+ * fp (r31) -->[   ebpf stack space] upto 512  |
  * [ frame header  ] 32/112|
  * sp (r1) --->[stack pointer  ] --
  */
@@ -32,8 +32,8 @@
 #define BPF_PPC_STACK_SAVE (8*8)
 /* for bpf JIT code internal usage */
 #define BPF_PPC_STACK_LOCALS   16
-/* Ensure this is quadword aligned */
-#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + MAX_BPF_STACK + \
+/* stack frame excluding BPF stack, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
 BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
 
 #ifndef __ASSEMBLY__
@@ -103,6 +103,7 @@ struct codegen_context {
 */
unsigned int seen;
unsigned int idx;
+   unsigned int stack_size;
 };
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 6ba5d253e857..a01362c88f6a 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -69,7 +69,7 @@ static inline bool bpf_has_stack_frame(struct codegen_context 
*ctx)
 static int bpf_jit_stack_local(struct codegen_context *ctx)
 {
if (bpf_has_stack_frame(ctx))
-   return STACK_FRAME_MIN_SIZE + MAX_BPF_STACK;
+   return STACK_FRAME_MIN_SIZE + ctx->stack_size;
else
return -(BPF_PPC_STACK_SAVE + 16);
 }
@@ -82,8 +82,9 @@ static int bpf_jit_stack_tailcallcnt(struct codegen_context 
*ctx)
 static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 {
if (reg >= BPF_PPC_NVR_MIN && reg < 32)
-   return (bpf_has_stack_frame(ctx) ? BPF_PPC_STACKFRAME : 0)
-   - (8 * (32 - reg));
+   return (bpf_has_stack_frame(ctx) ?
+   (BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
+   - (8 * (32 - reg));
 
pr_err("BPF JIT is asking about unknown registers");
BUG();
@@ -134,7 +135,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
}
 
-   PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
+   PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
}
 
/*
@@ -161,7 +162,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, BPF_REG_FP))
PPC_ADDI(b2p[BPF_REG_FP], 1,
-   STACK_FRAME_MIN_SIZE + MAX_BPF_STACK);
+   STACK_FRAME_MIN_SIZE + ctx->stack_size);
 }
 
 static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context 
*ctx)
@@ -183,7 +184,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct 
codegen_context *ctx
 
/* Tear down our stack frame */
if (bpf_has_stack_frame(ctx)) {
-   PPC_ADDI(1, 1, BPF_PPC_STACKFRAME);
+   PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
if (ctx->seen & SEEN_FUNC) {
PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
PPC_MTLR(0);
@@ -993,6 +994,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
memset(&cgctx, 0, sizeof(struct codegen_context));
 
+   /* Make sure that the stack is quadword aligned. */
+   cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
+
/* Scouting faux-generate pass 0 */
if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
/* We hit something illegal or unsupported. */
-- 
2.13.5

Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-09-01 Thread Andrew Lunn

On Fri, Sep 01, 2017 at 11:27:43AM -0700, Florian Fainelli wrote:
> On 09/01/2017 10:55 AM, Andrew Lunn wrote:
> > Hi Florian
> > 
>  tc bind dev sw0p0 queue 0 dev eth0 queue 16
> > 
> > It this the eth0 i don't like here. Why not in the implementation just
> > use something like netdev_master_upper_dev_get('sw0p0')? Or does
> 
> Last I brought this up with Jiri that we should link DSA network devices
> to their master network deviecs with netdev_upper_dev_link() he said
> this was not appropriate for DSA slave network devices, but I can't
> remember why, I would assume that any stacked device set up would do that.

There is some form a linking going, our device names show that:

9: lan5@eth1:  mtu 1500 qdisc noqueue state DOWN mode 
DEFAULT group default qlen 1000
link/ether da:87:2a:03:cf:16 brd ff:ff:ff:ff:ff:ff

> In any case, we need to establish a mapping so we have to specify at
> least the target device's queue number. It is quite similar in premise
> to e.g: enslaving a network device to a bridge port:
> 
> ip link set dev eth0 master br0

But here br0 is absolutely required, we have to say which bridge the
slave port should be a member of.

But what good is eth0 in

tc bind dev sw0p0 queue 0 dev eth0 queue 16

As i said suggesting, you have to somehow verify that eth0 is the
conduit interface sw0p0 is using. Which makes the parameter pointless.
Determine it from the sw0p0 somehow.

  Andrew

[net-next PATCH] bpf: sockmap update/simplify memory accounting scheme

2017-09-01 Thread John Fastabend

Instead of tracking wmem_queued and sk_mem_charge by incrementing
in the verdict SK_REDIRECT paths and decrementing in the tx work
path use skb_set_owner_w and sock_writeable helpers. This solves
a few issues with the current code. First, in SK_REDIRECT inc on
sk_wmem_queued and sk_mem_charge were being done without the peers
sock lock being held. Under stress this can result in accounting
errors when tx work and/or multiple verdict decisions are working
on the peer psock.

Additionally, this cleans up the code because we can rely on the
default destructor to decrement memory accounting on kfree_skb. Also
this will trigger sk_write_space when space becomes available on
kfree_skb() which wasn't happening before and prevent __sk_free
from being called until all in-flight packets are completed.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/sockmap.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index db0d99d..f6ffde9 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -111,7 +111,7 @@ static int smap_verdict_func(struct smap_psock *psock, 
struct sk_buff *skb)
 
 static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
 {
-   struct sock *sock;
+   struct sock *sk;
int rc;
 
/* Because we use per cpu values to feed input from sock redirect
@@ -123,16 +123,16 @@ static void smap_do_verdict(struct smap_psock *psock, 
struct sk_buff *skb)
rc = smap_verdict_func(psock, skb);
switch (rc) {
case SK_REDIRECT:
-   sock = do_sk_redirect_map();
+   sk = do_sk_redirect_map();
preempt_enable();
-   if (likely(sock)) {
-   struct smap_psock *peer = smap_psock_sk(sock);
+   if (likely(sk)) {
+   struct smap_psock *peer = smap_psock_sk(sk);
 
if (likely(peer &&
   test_bit(SMAP_TX_RUNNING, &peer->state) &&
-  sk_stream_memory_free(peer->sock))) {
-   peer->sock->sk_wmem_queued += skb->truesize;
-   sk_mem_charge(peer->sock, skb->truesize);
+  !sock_flag(sk, SOCK_DEAD) &&
+  sock_writeable(sk))) {
+   skb_set_owner_w(skb, sk);
skb_queue_tail(&peer->rxqueue, skb);
schedule_work(&peer->tx_work);
break;
@@ -282,16 +282,12 @@ static void smap_tx_work(struct work_struct *w)
/* Hard errors break pipe and stop xmit */
smap_report_sk_error(psock, n ? -n : EPIPE);
clear_bit(SMAP_TX_RUNNING, &psock->state);
-   sk_mem_uncharge(psock->sock, skb->truesize);
-   psock->sock->sk_wmem_queued -= skb->truesize;
kfree_skb(skb);
goto out;
}
rem -= n;
off += n;
} while (rem);
-   sk_mem_uncharge(psock->sock, skb->truesize);
-   psock->sock->sk_wmem_queued -= skb->truesize;
kfree_skb(skb);
}
 out:

Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-09-01 Thread Florian Fainelli

On 09/01/2017 10:55 AM, Andrew Lunn wrote:
> Hi Florian
> 
 tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> It this the eth0 i don't like here. Why not in the implementation just
> use something like netdev_master_upper_dev_get('sw0p0')? Or does
> 
> tc bind dev sw0p0 queue 0 dev lo queue 16
> 
> make sense?

Last I brought this up with Jiri that we should link DSA network devices
to their master network deviecs with netdev_upper_dev_link() he said
this was not appropriate for DSA slave network devices, but I can't
remember why, I would assume that any stacked device set up would do that.

In any case, we need to establish a mapping so we have to specify at
least the target device's queue number. It is quite similar in premise
to e.g: enslaving a network device to a bridge port:

ip link set dev eth0 master br0

Thanks
-- 
Florian

Re: [PATCH 13/31] timer: Remove meaningless .data/.function assignments

2017-09-01 Thread Krzysztof Halasa

Kees Cook  writes:

> Several timer users needlessly reset their .function/.data fields during
> their timer callback, but nothing else changes them. Some users do not
> use their .data field at all. Each instance is removed here.

For *wan/hdlc*
Acked-by: Krzysztof Halasa 

> --- a/drivers/net/wan/hdlc_cisco.c
> +++ b/drivers/net/wan/hdlc_cisco.c
> @@ -276,8 +276,6 @@ static void cisco_timer(unsigned long arg)
>   spin_unlock(&st->lock);
>  
>   st->timer.expires = jiffies + st->settings.interval * HZ;
> - st->timer.function = cisco_timer;
> - st->timer.data = arg;
>   add_timer(&st->timer);
>  }
>  
> diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
> index de42faca076a..7da2424c28a4 100644
> --- a/drivers/net/wan/hdlc_fr.c
> +++ b/drivers/net/wan/hdlc_fr.c
> @@ -644,8 +644,6 @@ static void fr_timer(unsigned long arg)
>   state(hdlc)->settings.t391 * HZ;
>   }
>  
> - state(hdlc)->timer.function = fr_timer;
> - state(hdlc)->timer.data = arg;
>   add_timer(&state(hdlc)->timer);
>  }

-- 
Krzysztof Halasa

Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness

2017-09-01 Thread Andrew Lunn

Hi Florian

> >> tc bind dev sw0p0 queue 0 dev eth0 queue 16

It this the eth0 i don't like here. Why not in the implementation just
use something like netdev_master_upper_dev_get('sw0p0')? Or does

tc bind dev sw0p0 queue 0 dev lo queue 16

make sense?

 Andrew

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-09-01 Thread Mike Galbraith

On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith  wrote:
> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith  wrote:
> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >> > >>
> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping 
> >> > >> correct:
> >> > >>
> >> > >> 4.8.5: WARN, eventual kernel hang
> >> > >> 6.3.1, 7.0.1: WARN, but continues working
> >> > >
> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> >> >
> >> > I still can't hit this with a 4.8.5 build. :(
> >> >
> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
> >> > negative first...
> >>
> >> I applied the other patch you posted, and built with gcc-6.3.1 to
> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
> >
> > Grr, that one has a in6_dev_getx() line missing for the first
> > increment, where things go pear shaped.
> >
> > With that added, looking at counter both before, and after incl, with a
> > trace_printk() in the exception handler showing it doing its saturate
> > thing, irqs disabled across the whole damn refcount_inc(), and even
> > booting box nr_cpus=1 for extra credit...
> >
> > HTH can that first refcount_inc() get there?
> >
> > # tracer: nop
> > #
> > #  _-=> irqs-off
> > # / _=> need-resched
> > #| / _---=> hardirq/softirq
> > #|| / _--=> preempt-depth
> > #||| / delay
> > #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> > #  | |   |      | |
> >  systemd-1 [000] d..1 1.937284: in6_dev_getx: PRE 
> > refs.counter:3
> >  systemd-1 [000] d..1 1.937295: ex_handler_refcount: *(int 
> > *)regs->cx = -1073741824
> >  systemd-1 [000] d..1 1.937296: in6_dev_getx: POST 
> > refs.counter:-1073741824
> 
> O_o
> 
> Can you paste the disassembly of in6_dev_getx? I can't understand how
> we're landing in the exception handler.

I was hoping you'd say that.

   0x816b2f72 <+0>: push   %rbp
   0x816b2f73 <+1>: mov%rsp,%rbp
   0x816b2f76 <+4>: push   %r12
   0x816b2f78 <+6>: push   %rbx
   0x816b2f79 <+7>: incl   %gs:0x7e95a2d0(%rip)# 0xd250 
<__preempt_count>
   0x816b2f80 <+14>:mov0x308(%rdi),%rbx
   0x816b2f87 <+21>:test   %rbx,%rbx
   0x816b2f8a <+24>:je 0x816b2feb 
   0x816b2f8c <+26>:callq  *0x81c35a00
   0x816b2f93 <+33>:mov%rax,%r12
   0x816b2f96 <+36>:callq  *0x81c35a10
   0x816b2f9d <+43>:mov0x769ad4(%rip),%rsi# 
0x81e1ca78 
   0x816b2fa4 <+50>:mov0xf0(%rbx),%edx
   0x816b2faa <+56>:mov$0x816b2f8c,%rdi
   0x816b2fb1 <+63>:callq  0x81171fc0 <__trace_bprintk>
   0x816b2fb6 <+68>:lock incl 0xf0(%rbx)
   0x816b2fbd <+75>:js 0x816b2fbf 
   0x816b2fbf <+77>:lea0xf0(%rbx),%rcx
   0x816b2fc6 <+84>:(bad)  
   0x816b2fc8 <+86>:mov0x769a99(%rip),%rsi# 
0x81e1ca68 
   0x816b2fcf <+93>:mov0xf0(%rbx),%edx
   0x816b2fd5 <+99>:mov$0x816b2f8c,%rdi
   0x816b2fdc <+106>:   callq  0x81171fc0 <__trace_bprintk>
   0x816b2fe1 <+111>:   mov%r12,%rdi
   0x816b2fe4 <+114>:   callq  *0x81c35a08
   0x816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)# 0xd250 
<__preempt_count>
   0x816b2ff2 <+128>:   mov%rbx,%rax
   0x816b2ff5 <+131>:   pop%rbx
   0x816b2ff6 <+132>:   pop%r12
   0x816b2ff8 <+134>:   pop%rbp
   0x816b2ff9 <+135>:   retq

I don't get the section business at all, +75 looks to me like we're
gonna trap no matter what.. as we appear to be doing.

> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
> > : : "cc", "cx");
> >  }
> >
> > +static __always_inline void refcount_inc_x(refcount_t *r)
> > +{
> > +   unsigned long flags;
> > +
> > +   local_irq_save(flags);
> > +   trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> > +   asm volatile(LOCK_PREFIX "incl %0\n\t"
> > +   REFCOUNT_CHECK_LT_ZERO
> > +   : [counter] "+m" (r->refs.counter)
> > +   : : "cc", "cx");
> 
> Does this need an explicit

Re: virtio_net: ethtool supported link modes

2017-09-01 Thread Michael S. Tsirkin

On Fri, Sep 01, 2017 at 05:19:53PM +0100, Radu Rendec wrote:
> On Fri, 2017-09-01 at 18:43 +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 31, 2017 at 06:04:04PM +0100, Radu Rendec wrote:
> > > Looking at the code in virtnet_set_link_ksettings, it seems the speed
> > > and duplex can be set to any valid value. The driver will "remember"
> > > them and report them back in virtnet_get_link_ksettings.
> > > 
> > > However, the supported link modes (link_modes.supported in struct
> > > ethtool_link_ksettings) is always 0, indicating that no speed/duplex
> > > setting is supported.
> > > 
> > > Does it make more sense to set (at least a few of) the supported link
> > > modes, such as 10baseT_Half ... 1baseT_Full?
> > > 
> > > I would expect to see consistency between what is reported in
> > > link_modes.supported and what can actually be set. Could you please
> > > share your opinion on this?
> > 
> > I would like to know more about why this is desirable.
> > 
> > We used not to support the modes at all, but it turned out
> > some tools are confused by this: e.g. people would try to
> > bond virtio with a hardware device, tools would see
> > a mismatch in speed and features between bonded devices
> > and get confused.
> > 
> > See
> > 
> > commit 16032be56c1f66770da15cb94f0eb366c37aff6e
> > Author: Nikolay Aleksandrov 
> > Date:   Wed Feb 3 04:04:37 2016 +0100
> > 
> > virtio_net: add ethtool support for set and get of settings
> > 
> > 
> > as well as the discussion around it
> > https://www.spinics.net/lists/netdev/msg362111.html
> 
> Thanks for pointing these out. It is much more clear now why modes
> support is implemented the way it is and what the expectations are.
> 
> > If you think we need to add more hacks like this, a stronger
> > motivation than "to see consistency" would be needed.
> 
> The use case behind my original question is very simple:
>  * Net device is queried via ethtool for supported modes.
>  * Supported modes are presented to user.
>  * User can configure any of the supported modes.

Since this has no effect on virtio, isn't presenting
"no supported modes" to user the right thing to do?

> This is done transparently to the net device type (driver), so it
> actually makes sense for physical NICs.
> 
> This alone of course is not a good enough motivation to modify the
> driver. And it can be easily addressed in user-space at the application
> level by testing for the driver.

I think you might want to special-case no supported modes.
Special-casing virtio is probably best avoided.

> I was merely trying to avoid driver-specific workarounds (i.e. keep the
> application driver agnostic)

I think that's the right approach. So if driver does not present
any supported modes this probably means it is not necessary
to display or program any.

> and wondered if "advertising" supported
> modes through ethtool made any sense and/or would be a desirable change
> from the driver perspective. I believe I have my answers now.
> 
> Thanks,
> Radu

Re: [PATCH net-next] selftests: correct define in msg_zerocopy.c

2017-09-01 Thread David Miller

From: Willem de Bruijn 
Date: Fri,  1 Sep 2017 12:31:51 -0400

> From: Willem de Bruijn 
> 
> The msg_zerocopy test defines SO_ZEROCOPY if necessary, but its value
> is inconsistent with the one in asm-generic.h. Correct that.
> 
> Also convert one error to a warning. When the test is complete, report
> throughput and close cleanly even if the process did not wait for all
> completions.
> 
> Reported-by: Dan Melnic 
> Signed-off-by: Willem de Bruijn 

Applied.

Re: [PATCH net-next v2] doc: document MSG_ZEROCOPY

2017-09-01 Thread David Miller

From: Willem de Bruijn 
Date: Fri,  1 Sep 2017 12:01:41 -0400

> From: Willem de Bruijn 
> 
> Documentation for this feature was missing from the patchset.
> Copied a lot from the netdev 2.1 paper, addressing some small
> interface changes since then.
> 
> Changes
>   v1 -> v2
> - change email discussion URL format
> - clarify that u32 counter is per-syscall, unsigned and
>   wraps after UINT_MAX calls
> - describe errno on send failure specific to MSG_ZEROCOPY
> - a few very minor rewordings
> 
> Signed-off-by: Willem de Bruijn 

Applied.

Re: [PATCH net-next] bpf: Collapse offset checks in sock_filter_is_valid_access

2017-09-01 Thread David Miller

From: David Ahern 
Date: Fri,  1 Sep 2017 08:18:07 -0700

> Make sock_filter_is_valid_access consistent with other is_valid_access
> helpers.
> 
> Requested-by: Daniel Borkmann 
> Signed-off-by: David Ahern 

Applied.

Re: [PATCH] mvneta: Driver and hardware supports IPv6 offload, so enable it

2017-09-01 Thread David Miller

From: Andrew Pilloud 
Date: Fri,  1 Sep 2017 07:49:49 -0700

> The mvneta driver and hardware supports IPv6 offload, however it
> isn't enabled. Set the NETIF_F_IPV6_CSUM feature to inform the
> network layer that this driver can offload IPV6 TCP and UDP
> checksums. This change has been tested on an Armada 370 and the
> feature support confirmed with several device datasheets
> including the Armada XP and Armada 3700.
> 
> Signed-off-by: Andrew Pilloud 

Applied to net-next, thanks.

Re: [PATCH v2 net-next 0/2] net: ubuf_info.refcnt conversion

2017-09-01 Thread Eric Dumazet

On Thu, 2017-08-31 at 17:04 -0700, Eric Dumazet wrote:
> On Thu, 2017-08-31 at 16:48 -0700, Eric Dumazet wrote:
> > Yet another atomic_t -> refcount_t conversion, split in two patches.
> > 
> > First patch prepares the automatic conversion done in the second patch.
> > 
> > Eric Dumazet (2):
> >   net: prepare (struct ubuf_info)->refcnt conversion
> >   net: convert (struct ubuf_info)->refcnt to refcount_t
> > 
> >  drivers/vhost/net.c|  2 +-
> >  include/linux/skbuff.h |  5 +++--
> >  net/core/skbuff.c  | 14 --
> >  net/ipv4/tcp.c |  2 --
> >  4 files changed, 8 insertions(+), 15 deletions(-)
> > 
> 
> David please ignore this series, I will send a V3 :)
> 

No need for a V3, sorry for the confusion, but we had to double check
with Willem that everything had been covered.

Please tell me if I need to resend, thanks !

Re: pull-request: wireless-drivers-next 2017-09-01

2017-09-01 Thread David Miller

From: Kalle Valo 
Date: Fri, 01 Sep 2017 17:34:43 +0300

> here's a pull request to net-next for 4.14. If the merge window opens on
> Sunday I'm planning to have this as the last one.
> 
> Please let me know if there are any problems.

Ok, pulled, thanks.

Re: [PATCH] qlcnic: remove redundant zero check on retries counter

2017-09-01 Thread David Miller

From: Colin King 
Date: Fri,  1 Sep 2017 14:44:31 +0100

> From: Colin Ian King 
> 
> At the end of the do while loop the integer counter retries will
> always be zero and so the subsequent check to see if it is zero
> is always true and therefore redundant.  Remove the redundant check
> and always return -EIO on this return path.  Also unbreak the literal
> string in dev_err message to clean up a checkpatch warning.
> 
> Detected by CoverityScan, CID#744279 ("Logically dead code")
> 
> Signed-off-by: Colin Ian King 

Applied to net-next, thanks.

Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN

2017-09-01 Thread Prakash Sangappa




On 8/30/17 10:41 AM, ebied...@xmission.com wrote:

Prakash Sangappa  writes:



With regards to security, the question basically is what is the consequence
of passing the wrong id. As I understand it, Interpreting the id to be pid
or tid, the effective uid and gid will be the same. It would be a problem
only if the incorrect interpretation of the id would refer a different process.
But that cannot happen as the the global tid(gettid() of a thread is
unique.

There is also the issue that the receiving process could look, not see
the pid in proc and assume the sending process is dead.  That I suspect
is the larger danger.



Will this not be a bug in the application, if it is sending the wrong id?


As long as the thread is alive, that id cannot reference another process / 
thread.
Unless the thread were to exit and the id gets recycled and got used for another
thread or process. This would be no different from a process exiting and its
pid getting recycled which is the case now.

Largely I agree.

If all you want are pid translations I suspect the are far easier ways
thant updating the SCM_CREDENTIALS code.


What would be an another easier & efficient way of doing pid translation?

Should a new API/mechanism be considered mainly for pid translation purpose
for use with pid namespaces, say based on 'pipe' something similar to 
I_SENDFD?


Thanks,
-Prakash.


Eric

Re: [PATCH net] udp: fix secpath leak

2017-09-01 Thread David Miller

From: Paolo Abeni 
Date: Fri,  1 Sep 2017 14:42:30 +0200

> From: Yossi Kuperman 
> 
> After commit dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
> we preserve the secpath for the whole skb lifecycle, but we also
> end up leaking a reference to it.
> 
> We must clear the head state on skb reception, if secpath is
> present.
> 
> Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
> Signed-off-by: Yossi Kuperman 
> Signed-off-by: Paolo Abeni 

Applied.

Re: [PATCH v2 0/5] net: mdio-mux: Misc fix

2017-09-01 Thread David Miller

From: Corentin Labbe 
Date: Fri,  1 Sep 2017 13:55:59 +0200

> This patch series fix minor problems found when working on the
> dwmac-sun8i syscon mdio-mux.
 ...
> Changes since v1:
> - Removed obsolete comment about of_mdio_find_bus/put_device
> - removed more DRV_VERSION

Series applied to net-next.

Re: [net 0/3] gianfar: Tx flow control fix (adjust_link)

2017-09-01 Thread David Miller

From: Claudiu Manoil 
Date: Fri, 1 Sep 2017 12:41:00 +0300

> Fix a small blunder in the Tx pause frame settings, that
> went unnoticed in the tangled code of adjust_link().
> I followed up with a couple of simple refactoring patches,
> aiming to make adjust_link() more manageable.
> 
> (The last 2 patches may be postponed if they are too much
> for the current stage of net.)

You need to fix some things up here.

First, do not mix bug fixes with cleanups.  Refactoring is a cleanup.

Submit the bug fix for 'net' and then later you can submit the
cleanups for 'net-next'.

Second, do not CC: stable for networking bug fixes, instead explicitly
ask me to queue up the fix for -stable.

Third, you need to fix how you specify your Fixes tag, it must
be exactly:

Fixes: $(SHA1_ID) ("Commit header text.")

And no matter how long the line is, do not break it up.

Thank you.

Re: [PATCH v2 1/3] dt-bindings: add SFF vendor prefix

2017-09-01 Thread Rob Herring

On Wed, Aug 30, 2017 at 12:51:10PM +0300, Baruch Siach wrote:
> Signed-off-by: Baruch Siach 
> ---
> v2: New patch in this series
> ---
>  Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Rob Herring

Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

2017-09-01 Thread Kees Cook

On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith  wrote:
> On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith  wrote:
>> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> > >>
>> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping 
>> > >> correct:
>> > >>
>> > >> 4.8.5: WARN, eventual kernel hang
>> > >> 6.3.1, 7.0.1: WARN, but continues working
>> > >
>> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>> >
>> > I still can't hit this with a 4.8.5 build. :(
>> >
>> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> > negative first...
>>
>> I applied the other patch you posted, and built with gcc-6.3.1 to
>> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>
> Grr, that one has a in6_dev_getx() line missing for the first
> increment, where things go pear shaped.
>
> With that added, looking at counter both before, and after incl, with a
> trace_printk() in the exception handler showing it doing its saturate
> thing, irqs disabled across the whole damn refcount_inc(), and even
> booting box nr_cpus=1 for extra credit...
>
> HTH can that first refcount_inc() get there?
>
> # tracer: nop
> #
> #  _-=> irqs-off
> # / _=> need-resched
> #| / _---=> hardirq/softirq
> #|| / _--=> preempt-depth
> #||| / delay
> #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> #  | |   |      | |
>  systemd-1 [000] d..1 1.937284: in6_dev_getx: PRE 
> refs.counter:3
>  systemd-1 [000] d..1 1.937295: ex_handler_refcount: *(int 
> *)regs->cx = -1073741824
>  systemd-1 [000] d..1 1.937296: in6_dev_getx: POST 
> refs.counter:-1073741824

O_o

Can you paste the disassembly of in6_dev_getx? I can't understand how
we're landing in the exception handler.

>  systemd-1 [000] d..1 1.937296: in6_dev_getx: PRE 
> refs.counter:-1073741824
>  systemd-1 [000] d..1 1.937297: ex_handler_refcount: *(int 
> *)regs->cx = -1073741824
>  systemd-1 [000] d..1 1.937297: in6_dev_getx: POST 
> refs.counter:-1073741824
>  systemd-1 [000] d..1 1.937297: in6_dev_getx: PRE 
> refs.counter:-1073741824
>  systemd-1 [000] d..1 1.937298: ex_handler_refcount: *(int 
> *)regs->cx = -1073741824
>  systemd-1 [000] d..1 1.937299: in6_dev_getx: POST 
> refs.counter:-1073741824
>
> ---
>  arch/x86/include/asm/refcount.h |   14 ++
>  arch/x86/mm/extable.c   |1 +
>  include/net/addrconf.h  |   12 
>  net/ipv6/route.c|6 +++---
>  4 files changed, 30 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
> : : "cc", "cx");
>  }
>
> +static __always_inline void refcount_inc_x(refcount_t *r)
> +{
> +   unsigned long flags;
> +
> +   local_irq_save(flags);
> +   trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> +   asm volatile(LOCK_PREFIX "incl %0\n\t"
> +   REFCOUNT_CHECK_LT_ZERO
> +   : [counter] "+m" (r->refs.counter)
> +   : : "cc", "cx");

Does this need an explicit "memory" added to the clobbers line here?
This isn't present in the atomic_inc() implementation, but maybe
something confuses gcc in this case into ignoring the "+m" marking?

> +   trace_printk("POST refs.counter:%d\n", r->refs.counter);
> +   local_irq_restore(flags);
> +}
> +
>  static __always_inline void refcount_dec(refcount_t *r)
>  {
> asm volatile(LOCK_PREFIX "decl %0\n\t"
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
>  {
> /* First unconditionally saturate the refcount. */
> *(int *)regs->cx = INT_MIN / 2;
> +   trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);

Just for fun, can you print out *(int *)regs->cx before the assignment too?

>
> /*
>  * Strictly speaking, this reports the fixup destination, not
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
> return idev;
>  }
>
> +static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
> +{
> +   struct inet6_dev *idev;
> +
> +   rcu_read_lock();
> +   idev = rcu_dereference(dev->ip6_ptr);
> +   if (idev)
> +   refcount_inc_x(&idev->refcnt);
> +

Re: [PATCH v2 2/3] dt-binding: net: sfp binding documentation

2017-09-01 Thread Rob Herring

On Wed, Aug 30, 2017 at 04:58:29PM +0200, Andrew Lunn wrote:
> > > > > Your example shows there's GPIO phandle *and* specifier.
> > > > 
> > > > Would "GPIO specifier" be enough here?
> > > 
> > >No, specifier is the cells following GPIO (or any other) phandle.
> > 
> > So this should be "GPIO phandle and specifier of ...", is that correct?
> > 
> > I have found very few (< 4) occurrences of this language in (lots of) 
> > '-gpios' 
> > property descriptions under Documentation/devicetree/bindings/. Is this a 
> > new 
> > requirement?
> 
> Sometimes it is just easier to refer to another document:
> 
> GPIO, as defined in Documentation/devicetree/binding/gpio/gpio.txt

Yes, and what I care about here is how many GPIOs, direction and active 
state. IOW, worry about the information necessary to validate a specific 
instance is correct. And hopefully someday we'll have a format parseable 
to do that checking, and all the free form text will be gone.

Rob

1 2 3 >

1 - 100 of 229 matches

Mail list logo