RE: A reduced Linux network stack for small systems

2014-05-06 Thread David Laight
From: Andi Kleen
 There has been a lot of interest recently to run Linux on very small systems,
 like Quark systems. These may have only 2-4MB memory. They are also limited
 by flash space.

I'm intrigued about the 2-4MB memory.
That is more that would typically be available on-chip in a DSP or FPGA.
It sounds like an expensive SRAM chip.
OTOH a single SDRAM gives 16MB and DDR a lot more - and are a lot cheaper
and lower power.
Most modern silicon can easily have SDRAM/DDR interfaces.

You may want some size reduction to run in 16MB, but it is not as problematic
as running in 2MB.

With that little memory I wouldn't want to run anything that relied on
dynamic memory allocation (after startup) - except for fixed size data
buffers.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH/RFC] usb: gadget: CONFIG_USB_S3C_HSOTG should depend on HAS_DMA

2014-05-06 Thread David Laight
From: Geert Uytterhoeven
 If NO_DMA=y:
 
 drivers/built-in.o: In function `s3c_hsotg_map_dma':
 s3c-hsotg.c:(.text+0x375b2c): undefined reference to `usb_gadget_map_request'
 drivers/built-in.o: In function `s3c_hsotg_unmap_dma':
 s3c-hsotg.c:(.text+0x376a32): undefined reference to 
 `usb_gadget_unmap_request'
 make[3]: *** [vmlinux] Error 1
 
 Note that all callers of s3c_hsotg_map_dma()/s3c_hsotg_unmap_dma() are
 protected by a call into the using_dma() checking function:
 
 static inline bool using_dma(struct s3c_hsotg *hsotg)
 {
 return false;   /* support is not complete */
 }
 
 but not all versions of gcc manage to optimize the check away, causing a
 link error.

A have seen gcc optimise away a call, but leave the undefined symbol
in the symbol table.
That might be what is happening here.

I don't know a generic solution, I used 'objcopy -N__udivdi3' to remove
the explicit unwanted reference from my driver.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/13] Refactor pci_is_brdige() to simplify code

2014-04-25 Thread David Laight
From: Yijing Wang
 This patchset rename the current pci_is_bridge() to pci_has_subordinate(),
 and introduce a new pci_is_bridge() which determine pci bridge by check
 dev-hdr_type. The new one is more accurate. PCIe Spec define the pci
 device is a bridge by the dev-hdr_type = 0x01 || 0x02.

That is a dangerous rename and is likely to cause difficult to
identify bugs in any code you've missed.

David



RE: [PATCH v3 2/8] DMA: Freescale: unify register access methods

2014-04-10 Thread David Laight
From: hongbo.zh...@freescale.com
 Methods of accessing DMA contorller registers are inconsistent, some registers
   ^^
 are accessed by DMA_IN/OUT directly, while others are accessed by functions
 get/set_* which are wrappers of DMA_IN/OUT, and even for the BCR register, it
 is read by get_bcr but written by DMA_OUT.
 This patch unifies the inconsistent methods, all registers are accessed by
 get/set_* now.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] tcp: fix compiler array bounds warning on selective_acks[]

2014-04-11 Thread David Laight
Bjorn Helgaas 
 With -Werror=array-bounds, gcc v4.8.x warns that in tcp_sack_remove(), a
 selective_acks[] array subscript is above array bounds.
 
 I don't understand how gcc figures this out, or why we don't see similar
 problems many other places, but this is the only fix I can figure out.
...
 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
 index 65cf90e063d5..65133b108236 100644
 --- a/net/ipv4/tcp_input.c
 +++ b/net/ipv4/tcp_input.c
 @@ -4047,7 +4047,8 @@ static void tcp_sack_remove(struct tcp_sock *tp)
 
   /* Zap this SACK, by moving forward any other SACKS. */
   for (i = this_sack+1; i  num_sacks; i++)
 - tp-selective_acks[i-1] = tp-selective_acks[i];
 + if (i  ARRAY_SIZE(tp-selective_acks))
 + tp-selective_acks[i-1] = 
 tp-selective_acks[i];
   num_sacks--;
   continue;
   }

You really shouldn't add that test every time around the loop.

Try changing the loop so the assignment is:
   tp-selective_acks[i] = tp-selective_acks[i + 1];

or the loop test to:
i = num_sacks - 1;

Or beat up the gcc developers :-)

David



RE: [PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-27 Thread David Laight
  It would also increase the latency of CPU-hotunplug operations.
 
  Is that a big deal?
 
  I thought that was the whole deal with this patchset - making cpu
  hotunplugs lighter and faster mostly for powersaving.  That said, just
  removing stop_machine call would be a pretty good deal and I don't
  know how meaningful reducing CPU hotunplug latency is.  Srivatsa?
 
 
 Keeping the hotunplug latency is important for suspend/resume, where
 we take all non-boot CPUs in a loop. That's an interesting use-case
 where intrusiveness doesn't matter much, but latency does. So yes,
 making CPU hotplug faster is also one of the goals of this patchset.

If you are removing all but one of the cpu, the you only need
one rcu cycle (remove everything from the list first).

I'd also guess that you can't suspend a cpu until you can sleep
the process that is running on it - so if a process has pre-emption
disabled you aren't going to complete suspend until the process
sleeps (this wouldn't be true if you suspended the cpu with its
current stack - but if suspend is removing the non-boot cpus first
it must be doing so from the scheduler idle loop).

If you are doing suspend for aggressive power saving, then all the
processes (and processors) will already be idle. However you
probably wouldn't want the memory accesses to determine this on
a large NUMA system with 1024+ processors.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data

2014-07-10 Thread David Laight
From: Olivier Sobrie
...
 The function put_rxbuf_data() is called from the urb completion handler.
 It puts the data of the urb transfer in the tty buffer with
 tty_insert_flip_string_flags() and schedules a work queue in order to
 push the data to the ldisc.
 Problem is that we are in a urb completion handler so we can't wait
 until there is room in the tty buffer.

Surely you can just keep the urb?
Resubmit it later when all the data has been transferred.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data

2014-07-10 Thread David Laight
From: One Thousand Gnomes
 On Thu, 10 Jul 2014 14:37:37 +
 David Laight david.lai...@aculab.com wrote:
 
  From: Olivier Sobrie
  ...
   The function put_rxbuf_data() is called from the urb completion handler.
   It puts the data of the urb transfer in the tty buffer with
   tty_insert_flip_string_flags() and schedules a work queue in order to
   push the data to the ldisc.
   Problem is that we are in a urb completion handler so we can't wait
   until there is room in the tty buffer.
 
 The tty provides the input queue, if the queue is full then just chuck
 the data in the bitbucket.  hso is trying to be far too clever.
 
 If hso is fast enough that the buffering isn't sufficient on the tty side
 then we need to fix the tty buffer size.

You really want to apply flow control back over the 'serial' link.
That may just cause data discards earlier on the local system.
But it is possible that not resubmitting the receive urb will cause the
modem to flow control back to the sender.
In which case there is some chance that data won't be lost.

David

 
 Arguably what we need are tty fastpaths for non N_TTY but thats rather
 more work. There's no fundamental reason that hso can't throw the buffer
 at buffer straight at the PPP ldisc and straight into the network stack -
 just that
 
 1. The tty mid layer glue is standing in the way
 2. The change of line discipline code has to lock against it
 
 Alan


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data

2014-07-11 Thread David Laight
From: Olivier Sobrie Olivier Sobrie
 Hi Alan and Davids,
 
 On Thu, Jul 10, 2014 at 04:50:03PM +0100, One Thousand Gnomes wrote:
  On Thu, 10 Jul 2014 14:37:37 +
  David Laight david.lai...@aculab.com wrote:
 
   From: Olivier Sobrie
   ...
The function put_rxbuf_data() is called from the urb completion handler.
It puts the data of the urb transfer in the tty buffer with
tty_insert_flip_string_flags() and schedules a work queue in order to
push the data to the ldisc.
Problem is that we are in a urb completion handler so we can't wait
until there is room in the tty buffer.
 
  The tty provides the input queue, if the queue is full then just chuck
  the data in the bitbucket.  hso is trying to be far too clever.
 
  If hso is fast enough that the buffering isn't sufficient on the tty side
  then we need to fix the tty buffer size.
 
 Ok I'll adapt the patch to drop the data that can't be put in the tty
 buffer. I test this and resend a new patch.

If you are going to drop data, then ideally you want to discard entire ppp
packets. Depending on exactly how the interface works it might be that
urb are likely to contain complete packets.

So discarding entire urb might work better than discarding a few bytes.
(But don't even think of scanning the data stream in the usb driver - except
for experiments.)

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/5] xhci: Use correct SLOT ID when handling a reset device command

2014-06-24 Thread David Laight
From: Of Mathias Nyman
 Command completion events normally include command completion status,
 SLOT_ID, and a pointer to the original command. Reset device command
 completion SLOT_ID may be zero according to xhci specs 4.6.11.
 
 VIA controllers set the SLOT_ID to zero, triggering a WARN_ON in the
 command completion handler.
 
 Use the SLOT ID found from the original command instead.
 
 This patch should be applied to stable kernels since 3.13 that contain
 the commit 20e7acb13ff48fbc884d5918c3697c27de63922a
 xhci: use completion event's slot id rather than dig it out of command
 
 Cc: sta...@vger.kernel.org # 3.13
 Reported-by: Saran Neti saran...@gmail.com
 Tested-by: Saran Neti saran...@gmail.com
 Signed-off-by: Mathias Nyman mathias.ny...@linux.intel.com
 ---
  drivers/usb/host/xhci-ring.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
 index d67ff71..71657d3 100644
 --- a/drivers/usb/host/xhci-ring.c
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -1433,8 +1433,11 @@ static void handle_cmd_completion(struct xhci_hcd 
 *xhci,
   xhci_handle_cmd_reset_ep(xhci, slot_id, cmd_trb, cmd_comp_code);
   break;
   case TRB_RESET_DEV:
 - WARN_ON(slot_id != TRB_TO_SLOT_ID(
 - le32_to_cpu(cmd_trb-generic.field[3])));
 + /* SLOT_ID field in reset device cmd completion event TRB is 0.

Minor nit...
Surely is would be better to say 'is undefined' or 'may be zero'.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-24 Thread David Laight
From: Alexey Kardashevskiy
...
  So IMHO we should either create new, generic iowrite helpers that don't
  do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
 
  I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
 
 
 I do not understand why @val is considered LE here and need to be converted
 to CPU. Really. I truly believe it should be cpu_to_le32().

Think about it carefully...

Apparently iowrite32() is writing a 'cpu' value out 'le'.
So if you have a 'le' value you need to convert it to 'cpu' first.

I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] USB: ehci-pci: USB host controller support for Intel Quark X1000

2014-06-25 Thread David Laight
From: Chen, Alvin
  On Tue, Jun 24, 2014 at 08:51:43AM -0700, Chen, Alvin wrote:
   diff --git a/drivers/usb/host/pci-quirks.c
   b/drivers/usb/host/pci-quirks.c index 00661d3..1ea8803 100644
   --- a/drivers/usb/host/pci-quirks.c
   +++ b/drivers/usb/host/pci-quirks.c
   @@ -823,6 +823,48 @@ static int handshake(void __iomem *ptr, u32 mask,
  u32 done,
 return -ETIMEDOUT;
}
  
   +#define PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC  0x0939
   +bool usb_is_intel_qrk(struct pci_dev *pdev) {
   + return pdev-vendor == PCI_VENDOR_ID_INTEL 
   + pdev-device == PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC;
   +
   +}
   +EXPORT_SYMBOL_GPL(usb_is_intel_qrk);
 
  There is no lack of vowels available to you, please use them.
 
 
 OK, I will change ' usb_is_intel_qrk ' to ' usb_is_intel_quark'.

Or even usb_is_intel_quark_x1000() ?

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] sched: Fix compiler warnings

2014-06-25 Thread David Laight
From: Guenter Roeck
 On 06/25/2014 07:49 AM, Uwe Kleine-Knig wrote:
  Hello Guenter,
 
  On Wed, Jun 25, 2014 at 07:27:47AM -0700, Guenter Roeck wrote:
  Maybe the author's intention was:
 
static inline int cpu_corepower_flags(void) __attribute__((const));
 
  ?
  This specifies that the function has no side effects and the return value
  only depends on the (here non-existing) function arguments.
 
 
  Possibly, but either I am missing something or this doesn't compile.
  You need to do a separate declaration:
 
  static inline int cpu_corepower_flags(void) __attribute__((const));
  static inline int cpu_corepower_flags(void)
  {
  ...
 
 Actually turns out one can use __attribute_const__, and it is
 
   static inline int __attribute_const__ cpu_corepower_flags(void)
 
 which turns out to be widely used.
 
 I'll change that and resubmit after testing.

You don't need to tell the compiler that for an inline function.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2] sched: Fix compiler warnings

2014-06-25 Thread David Laight
From: Uwe Kleine-König 
 Hello,
 
 On Wed, Jun 25, 2014 at 03:40:28PM +, David Laight wrote:
  From: Guenter Roeck
   Actually turns out one can use __attribute_const__, and it is
  
 static inline int __attribute_const__ cpu_corepower_flags(void)
  
   which turns out to be widely used.
  
   I'll change that and resubmit after testing.
 
  You don't need to tell the compiler that for an inline function.
 I didn't check for the functions in question here, but in general your
 statement is wrong.
 
 For example:
 
 static inline unsigned int __attribute_const__ read_cpuid_id(void)
 {
   return readl(BASEADDR_V7M_SCB + V7M_SCB_CPUID);
 }
 
 from arch/arm/include/asm/cputype.h. The V7M_SCB_CPUID register never
 changes, but there is no way gcc can deduce that.

Hmm... it all rather depends on the order of the optimisations and 
inlining.

I've tried to use 'restrict' on the parameters to an inline function
in an attempt to get 'noalias' - but the reverse inference never
seems to be applied.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 net-next 0/2] split BPF out of core networking

2014-06-23 Thread David Laight
From: Chema Gonzalez
...
 4.5. BPF_ST|BPF_MEM
 Operation: *(size *) (dst_reg + off16) = imm32
 
 This insn encodes 2 immediate values (the offset and the imm32 value)
 in the insn, and actually forces the sock_filter_int 64-bit struct to
 have both a 16-bit offset field and a 32-bit immediate field). In
 fact, it's the only instructions that uses .off and .imm at the same
 time (for all other instructions, at least one of the fields is always
 0).
 
 This did not exist in classic BPF (where BPF_ST|BPF_MEM actually did
 mem[pc-k] = A;). In fact, it's rare to find an ISA that allows
 encoding 2 immediate values in a single insn. My impression (after
 checking the x86 JIT implementation, which works on the eBPF code) is
 that this was added as an x86 optimization, because x86 allows
 encoding 2 values (offset and immediate) by using the displacement and
 immediate suffixes. I wonder whether the ISA would be more readable if
 we did this in 2 insn, one to put dst_reg+off16 in a temporary
 register, and the second a simpler BPF_STX|BPF_MEM. Then we could use
 the same space for the immediate and offset fields.

One option is to add code to the x86 JIT to detect the two instruction
sequence and generate a single instruction.

Thinks further, the JIT might be easier to write if there is a temporary
register that is defined to be only valid for the next instruction (or two).
Then the JIT can completely optimise away any assignments to it without
requiring a full analysis of the entire program.

David



RE: [PATCH net-next 1/2] net: filter: split filter.c into two files

2014-07-23 Thread David Laight
From: Varka Bhadram
 On 07/23/2014 11:31 AM, Alexei Starovoitov wrote:
  BPF is used in several kernel components. This split creates logical 
  boundary
  between generic eBPF core and the rest
 
  kernel/bpf/core.c: eBPF interpreter
 
  net/core/filter.c: classic-eBPF converter, classic verifiers, socket 
  filters
 
  This patch only moves functions.
 
 If we are moving the code also its good to do cleanup.

Except you need to do it as a separate patch.

David

 Run checkpatch.pl on this...
 
 
 --
 Regards,
 Varka Bhadram.
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] USB: ehci-pci: USB host controller support for Intel Quark X1000

2014-06-27 Thread David Laight
From: Jingoo Han

...
 /* The maximal threshold value is 0x80, which means 512 bytes */
 #define EHCI_THRESHOLD_512BYTES   0x80
 #define EHCI_THRESHOLD_508BYTES   0x79

It would be better to define these using expressions. So:
#define EHCI_THRESHOLD_512BYTES (512u / 8u)
#define EHCI_THRESHOLD_508BYTES (508u / 8u)

Then you might decide to use:
#define EHCI_THRESHOLD(size) ((size) / 8u)

Then realise that the names are not generic EHCI, so need some
driver-specific prefix (for namespace reasons).

And that the defines are probably limit values, and should
be named as such.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCHv2] usb: gadget: serial: replace hardcoded ttyGS with PREFIX

2014-06-30 Thread David Laight
From: Of Richard Leitner
 Replaces all hardcoded ttyGS strings with the PREFIX macro.
 Due to the fact the strings are spread over different source files the
 PREFIX definition is moved to u_serial.h

Lots of changes like:
 - DBG(cdev, acm ttyGS%d completion, err %d\n,
 - acm-port_num, req-status);
 + DBG(cdev, acm %s%d completion, err %d\n,
 + PREFIX, acm-port_num, req-status);

I'm not sure this improves the code.
Do you see a need to ever change PREFIX?

Even if it is a reasonable change, the name PREFIX is hopeless.
It would have to be something like ACM_TTYNAME_PREFIX.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCHv2] usb: gadget: serial: replace hardcoded ttyGS with PREFIX

2014-06-30 Thread David Laight
From: Richard Leitner 
 On Mon, 30 Jun 2014 08:41:18 +
 David Laight david.lai...@aculab.com wrote:
 
  From: Of Richard Leitner
   Replaces all hardcoded ttyGS strings with the PREFIX macro.
   Due to the fact the strings are spread over different source files the
   PREFIX definition is moved to u_serial.h
 
  Lots of changes like:
   - DBG(cdev, acm ttyGS%d completion, err %d\n,
   - acm-port_num, req-status);
   + DBG(cdev, acm %s%d completion, err %d\n,
   + PREFIX, acm-port_num, req-status);
 
  I'm not sure this improves the code.
 
 Maybe you're right, the readability of the code wouldn't be better afterwards,
...

Indeed it will make most attempts to grep for the error message fail.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC net-next 07/14] bpf: expand BPF syscall with program load/unload

2014-06-30 Thread David Laight
From: Alexei Starovoitov
 On Fri, Jun 27, 2014 at 5:19 PM, Andy Lutomirski l...@amacapital.net wrote:
  On Fri, Jun 27, 2014 at 5:05 PM, Alexei Starovoitov a...@plumgrid.com 
  wrote:
  eBPF programs are safe run-to-completion functions with load/unload
  methods from userspace similar to kernel modules.
 
  User space API:
 
  - load eBPF program
prog_id = bpf_prog_load(int prog_id, bpf_prog_type, struct nlattr *prog, 
  int len)
 
where 'prog' is a sequence of sections (currently TEXT and LICENSE)
TEXT - array of eBPF instructions
LICENSE - GPL compatible
  +
  +   err = -EINVAL;
  +   /* look for mandatory license string */
  +   if (!tb[BPF_PROG_LICENSE])
  +   goto free_attr;
  +
  +   /* eBPF programs must be GPL compatible */
  +   if (!license_is_gpl_compatible(nla_data(tb[BPF_PROG_LICENSE])))
  +   goto free_attr;
 
  Seriously?  My mind boggles.
 
 Yes. Quite a bit of logic can fit into one eBPF program. I don't think it's 
 wise
 to leave this door open for abuse. This check makes it clear that if you
 write a program in C, the source code must be available.

That seems utterly extreme.
Loadable kernel modules don't have to be GPL.

I can imagine that some people might not want to load code for which
they don't have the source - but in that case they probably want to
compile it themselves anyway.

I don't want to have to put a gpl licence on random pieces of test
code I might happen to write for my own use.

David



RE: [PATCH] vsprintf: Remove SPECIAL from pointer types

2014-07-07 Thread David Laight
From: Joe Perches
 Because gcc issues a complaint about any pointer format with %#p,
 remove the use of SPECIAL to prefix 0x to various pointer types.
 
 There are no uses in the kernel tree of %#p.

I know you guys don't really care about them, but there might
be uses in out of tree drivers.

With the change what is output for %#p ?

I know I've used %#p in some code (possibly userspace) that need
to run under multiple OS because 0x%p generates 0x0x on one of the OS.
(We might have removed them because of the gcc warning though.)

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data

2014-07-07 Thread David Laight
From: Olivier Sobrie
 When the module sends bursts of data, sometimes a deadlock happens in
 the hso driver when the tty buffer doesn't get the chance to be flushed
 quickly enough.
 
 To avoid this, first, we remove the endless while loop in
 put_rx_bufdata() which is the root cause of the deadlock.
 Secondly, when there is no room anymore in the tty buffer, we set up a
 timer of 100 msecs to give a chance to the upper layer to flush the tty
 buffer and make room for new data.

What is the timer for?
You need to get the sending code woken up by the urb completion.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data

2014-07-07 Thread David Laight
From: Olivier Sobrie 
 Hi David,
 
 On Mon, Jul 07, 2014 at 09:13:53AM +, David Laight wrote:
  From: Olivier Sobrie
   When the module sends bursts of data, sometimes a deadlock happens in
   the hso driver when the tty buffer doesn't get the chance to be flushed
   quickly enough.
  
   To avoid this, first, we remove the endless while loop in
   put_rx_bufdata() which is the root cause of the deadlock.
   Secondly, when there is no room anymore in the tty buffer, we set up a
   timer of 100 msecs to give a chance to the upper layer to flush the tty
   buffer and make room for new data.
 
  What is the timer for?
  You need to get the sending code woken up by the urb completion.
 
 In put_rxbuf_data() (which can be called under irq disabled),
 tty_flip_buffer_push() is called and schedules a push of the tty buffer.
 When the buffer is full, I give some time to the above layer in order
 to flush it.
 The timer is used to recall put_rxbuf_data_and_resubmit_bulk_urb()
 later in order to read the remaining data stored in
 urb-transfer_buffer and then to resubmit the urb to receive more data
 from the gsm module.
 I don't understand what you mean by getting the sending code woken up.
 Calling tty_port_tty_wakeup()?? We are in the receive path...

Sorry, it isn't at all clear from the general description whether you
are referring to the transmit or receive path.

'flush' can mean all sorts of things ...

In either case a 100ms delay to data doesn't seem right at all.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-08 Thread David Laight
From: Bjorn Helgaas
...
  Even if you do that, you ought to write valid interrupt information
  into the 4th slot (maybe replicating one of the earlier interrupts).
  Then, if the device does raise the 'unexpected' interrupt you don't
  get a write to a random kernel location.
 
  I might be missing something, but we are talking of MSI address space
  here, aren't we? I am not getting how we could end up with a 'write'
  to a random kernel location when a unclaimed MSI vector sent. We could
  only expect a spurious interrupt at worst, which is handled and reported.
 
 Yes, that's how I understand it.  With MSI, the OS specifies the a
 single Message Address, e.g., a LAPIC address, and a single Message
 Data value, e.g., a vector number that will be written to the LAPIC.
 The device is permitted to modify some low-order bits of the Message
 Data to send one of several vector numbers (the MME value tells the
 device how many bits it can modify).
 
 Bottom line, I think a spurious interrupt is the failure we'd expect
 if a device used more vectors than the OS expects it to.

So you need to tell the device where to write in order to raise the
'spurious interrupt'.

David



RE: [PATCH 1/2] net: cadence: macb: add support for the WOL

2014-07-14 Thread David Laight
From: Varka Bhadram
 On 07/14/2014 02:32 PM, Jongsung Kim wrote:
  This patch enables the ethtool utility to control the WOL function
  of the PHY connected to the GEM/MACB. (if supported)
...
  +static int macb_set_wol(struct net_device *netdev, struct ethtool_wolinfo 
  *wol)
  +{
  +   struct macb *bp = netdev_priv(netdev);
  +   struct phy_device *phydev = bp-phy_dev;
  +   int err = -ENODEV;
  +
  +   if (phydev)
  +   err = phy_ethtool_set_wol(phydev, wol);
  +
  +   return err;
  +}
  +
 
 I think we can do in this way:
 
 if (phydev)
   return phy_ethtool_set_wol(phydev, wol);
 else
   return -ENODEV;
 
 
 we can save err. What do you say ...?

I would do:
if (!phydev)
return -ENODEV;
return phy_ethtool_set_wol(phydev, wol);

Although it might even be worth moving the NULL test into the function.
(sort of depends on style and the number of callers who need to do the test.)

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC net-next 08/14] bpf: add eBPF verifier

2014-07-02 Thread David Laight
From: Alexei Starovoitov
...
  +#define _(OP) ({ int ret = OP; if (ret  0) return ret; })
...
  +   _(get_map_info(env, map_id, map));
 
  Nit: such macros should be removed, please.
 
 It may surely look unconventional, but alternative is to replace
 every usage of _ macro with:
 err =
 if (err)
   return err;
 
 and since this macro is used 38 times, it will add ~120 unnecessary
 lines that will only make code much harder to follow.
 I tried not using macro and results were not pleasing.

The problem is that they are hidden control flow.
As such they make flow analysis harder for the casual reader.

The extra lines really shouldn't matter.

David



RE: [PATCH][RFC] USB: zerocopy support for usbfs

2014-07-03 Thread David Laight
From: Stefan Klug
...
 Is there any way to check if the host controller supports arbitrary
 alignment?
 If I read the xhci spec correctly arbitrary alignment is explicitly
 permitted.

Not entirely.
The xhci spec has a few limits on the alignment of transfer buffer.
They seem to be designed to make life difficult for the kernel!
1) Transfer buffers cannot be longer than 64k.
2) Transfer buffers cannot cross 64k address boundaries.
3) The end of a ring segment must occur on a USB packet boundary.

The current xhci driver doesn't implement check 3 - which causes
certain devices to fail (notable the ax88179_178a usb3 ethernet).

David



RE: [PATCH RFC net-next 08/14] bpf: add eBPF verifier

2014-07-03 Thread David Laight
From: Alexei Starovoitov
  +#define _(OP) ({ int ret = OP; if (ret  0) return ret; })
  +1 to removing the _ macro. If you want to avoid the 3 lines (is there
  anything in the style guide against if ((err=OP)  0) ... ?), at
 
 assignment and function call inside 'if' ? I don't like such style.
 
  least use some meaningful macro name (DO_AND_CHECK, or something like
  that).

It would have to be RETURN_IF_NEGATIVE().
But even then it is skipped by searches for 'return'.

 Try replacing _ with any other name and see how bad it will look.
 I tried with MACRO_NAME and with 'if (err) goto' and with 'if (err) return',
 before I converged on _ macro.
 I think it's a hidden gem of this patch.

No, it is one of those things that 'seems like a good idea at the time',
but causes grief much later on.

Have you considered saving the error code into 'env' and making most of
the functions return if an error is set?
Then the calling code need not check the result of every function call.

David



RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-03 Thread David Laight
From: Bjorn Helgaas
 On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
  There are PCI devices that require a particular value written
  to the Multiple Message Enable (MME) register while aligned on
  power of 2 boundary value of actually used MSI vectors 'nvec'
  is a lesser of that MME value:
 
  roundup_pow_of_two(nvec)  'Multiple Message Enable'
 
  However the existing pci_enable_msi_block() interface is not
  able to configure such devices, since the value written to the
  MME register is calculated from the number of requested MSIs
  'nvec':
 
  'Multiple Message Enable' = roundup_pow_of_two(nvec)
 
 For MSI, software learns how many vectors a device requests by reading
 the Multiple Message Capable (MMC) field.  This field is encoded, so a
 device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
 for a device to request 3 vectors; it would have to round up that up
 to a power of two and request 4 vectors.
 
 Software writes similarly encoded values to MME to tell the device how
 many vectors have been allocated for its use.  For example, it's
 impossible to tell the device that it can use 3 vectors; the OS has to
 round that up and tell the device it can use 4 vectors.
 
 So if I understand correctly, the point of this series is to take
 advantage of device-specific knowledge, e.g., the device requests 4
 vectors via MMC, but we know the device is only capable of using 3.
 Moreover, we tell the device via MME that 4 vectors are available, but
 we've only actually set up 3 of them.
...

Even if you do that, you ought to write valid interrupt information
into the 4th slot (maybe replicating one of the earlier interrupts).
Then, if the device does raise the 'unexpected' interrupt you don't
get a write to a random kernel location.

Plausibly something similar should be done when a smaller number of
interrupts is assigned.

David



RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-04 Thread David Laight
From: Alexander Gordeev
...
  Even if you do that, you ought to write valid interrupt information
  into the 4th slot (maybe replicating one of the earlier interrupts).
  Then, if the device does raise the 'unexpected' interrupt you don't
  get a write to a random kernel location.
 
 I might be missing something, but we are talking of MSI address space
 here, aren't we? I am not getting how we could end up with a 'write'
 to a random kernel location when a unclaimed MSI vector sent. We could
 only expect a spurious interrupt at worst, which is handled and reported.
 
 Anyway, as I described in my reply to Bjorn, this is not a concern IMO.

I'm thinking of the following - which might be MSI-X ?
1) Hardware requests some interrupts and tells the host the BAR (and offset)
   where the 'vectors' should be written.
2) To raise an interrupt the hardware uses the 'vector' as the address
   of a normal PCIe write cycle.

So if the hardware requests 4 interrupts, but the driver (believing it
will only use 3) only write 3 vectors, and then the hardware uses the
4th vector it can write to a random location.

Debugging that would be hard!

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 3/7] drivers: usb: Include appropriate header file in hcd.h

2013-12-19 Thread David Laight
 From: Rashika Kheria
 Include header file include/linux/usb.h in include/linux/usb/hcd.h
 because structures usb_device, usb_host_config and usb_interface have
 their definitions in include/linux/usb.h.
 
 This eliminates the following warning in include/linux/usb/hcd.h:
 include/linux/usb/hcd.h:311:44: warning: ‘struct usb_device’ declared inside 
 parameter list [enabled
 by default]
 include/linux/usb/hcd.h:412:10: warning: ‘struct usb_host_config’ declared 
 inside parameter list
 [enabled by default]
 include/linux/usb/hcd.h:614:9: warning: ‘struct usb_interface’ declared 
 inside parameter list [enabled
 by default]

All it is necessary to do is add a declaration of the struct before the 
function definition.
There is no need to include the definition of the structure.
It is a shame that gcc doesn't defer this warning to any call site
(where an incorrect type would get passed).

David
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 1/7] drivers: usb: Include appropriate header file in hcd.c

2013-12-19 Thread David Laight
  These warning are non-default GCC warnings. These can be seen either
  by adding W=1 while running make (i.e. make W=1) or adding
  -Wmissing-prototypes in KBUILD_CFLAGS in the toplevel Makefile.
 
 By default, we don't care about 'W=1' warnings, as no one sees them, and
 they don't matter.

-Wmissing-prototypes really ought to be defined.

Some of the other warning are a little more pedantic, the most annoying
is -Wsign-compare.  OTOH a lot of code out there is clean enough to build
without any warnings and with a moderate amount of pedantry from the compiler.

OTOH just including extra headers isn't ideal - it can considerably
slow down the compilation time. There are many subsystems that don't
really separate their internal headers from their external ones.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 1/7] drivers: usb: Include appropriate header file in hcd.c

2013-12-20 Thread David Laight
 From: Josh Triplett
 On Thu, Dec 19, 2013 at 05:33:09PM -, David Laight wrote:
  OTOH just including extra headers isn't ideal - it can considerably
  slow down the compilation time. There are many subsystems that don't
  really separate their internal headers from their external ones.
 
 There's a benefit to doing so, though: it ensures that the prototypes in
 the header stay in sync with the definition.

I think you misunderstood what I was saying.
The header with the function prototypes has to be included when the
driver itself is built - there is a gcc warning for that as well.

The 'problem' is that if I only have:
void foo(struct foo *);
then the C language (uselessly) scopes the declaration of 'struct foo'
to the inside of the functions - making it almost impossible to call.

All you need is an outer declaration:
struct foo;
void foo(struct foo *);
what you don't need is the actual definition of 'struct foo'.
Source files that need to look at the members of the structure
should be directly including the header that defines the structure.

Adding nested includes just makes it more likely that code will fail
to directly include the headers that define the structures it uses.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: asix: fix bad header length bug

2014-02-06 Thread David Laight
From: Emil Goode
 The AX88772B occasionally send rx packets that cross urb boundaries
 and the remaining partial packet is sent with no header.
 When the buffer with a partial packet is of less number of octets
 than the value of hard_header_len the buffer is discarded by the
 usbnet module. This is causing dropped packages and error messages
 in dmesg.
 
 This can be reproduced by using ping with a packet size
 between 1965-1976.

I think this can affect other USB ethernet drivers.
Probably most of the ones that explicitly set rx_urb_len.

The ax88179_178a driver sets massive 20k receive urb.
I've seen over 10k of data in a single urb, dunno if it
can actually generate more than 20k - possibly if the usb3 link
is loaded with other traffic.
It would be much more efficient for it to use an aligned 4k urb
and then merge the fragment into skbs.

Once you've set:
+   dev-net-hard_header_len = 0; /* Partial packets have no header */
try setting the mtu to a multiple of 1k.
There is a very odd check in usbnet_change_mtu() that tries to stop the
receive urb_length being a multiple of the usb packet size.

This code looks as though it is hoping that the usb controller will discard
any full length bulk messages after finding a short buffer.
I suspect that might be just wishful thinking!

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: asix: fix bad header length bug

2014-02-06 Thread David Laight
From: Igor Gnatenko
 On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote:
  The AX88772B occasionally send rx packets that cross urb boundaries
  and the remaining partial packet is sent with no header.
  When the buffer with a partial packet is of less number of octets
  than the value of hard_header_len the buffer is discarded by the
  usbnet module. This is causing dropped packages and error messages
  in dmesg.
 
  This can be reproduced by using ping with a packet size
  between 1965-1976.
 
  The bug has been reported here:
 
  https://bugzilla.kernel.org/show_bug.cgi?id=29082
 
  Signed-off-by: Emil Goode emilgo...@gmail.com
 Reported-and-tested-by: Igor Gnatenko i.gnatenko.br...@gmail.com
  ---
   drivers/net/usb/asix_devices.c |1 +
   1 file changed, 1 insertion(+)
 
  diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
  index 9765a7d..120bb29 100644
  --- a/drivers/net/usb/asix_devices.c
  +++ b/drivers/net/usb/asix_devices.c
  @@ -455,6 +455,7 @@ static int ax88772_bind(struct usbnet *dev, struct 
  usb_interface *intf)
  dev-net-ethtool_ops = ax88772_ethtool_ops;
  dev-net-needed_headroom = 4; /* cf asix_tx_fixup() */
  dev-net-needed_tailroom = 4; /* cf asix_tx_fixup() */
  +   dev-net-hard_header_len = 0; /* Partial packets have no header */

That is the wrong place for the fix.

It should only be done when rx_urb_size is set to a multiple of the usb
packet size.
That is only done for some of the supported devices.

In fact, if the rx_urb_size is a multiple of the usb frame size (or 1k)
then maybe the usbnet code should assume that the driver is capable
of processing ethernet frames that cross usb packet boundaries and
not delete short packets at all - regardless of the hard_header_len.

David



RE: [PATCH 19/26] drivers: isdn: Move prototype declaration to header file platform.h from diva_didd.c

2014-02-07 Thread David Laight
From: Rashika Kheria
 Move prototype declarations of function to header file
 hardware/eicon/platform.h because they are used by more than one file.
 
 This eliminates the following warnings in hardware/eicon/diddfunc.c:
 drivers/isdn/hardware/eicon/diddfunc.c:95:12: warning: no previous prototype 
 for diddfunc_init [-
 Wmissing-prototypes]
 drivers/isdn/hardware/eicon/diddfunc.c:110:13: warning: no previous prototype 
 for diddfunc_finit [-
 Wmissing-prototypes]
...
 diff --git a/drivers/isdn/hardware/eicon/diva_didd.c 
 b/drivers/isdn/hardware/eicon/diva_didd.c
 index fab6ccf..56d32a7 100644
 --- a/drivers/isdn/hardware/eicon/diva_didd.c
 +++ b/drivers/isdn/hardware/eicon/diva_didd.c
 @@ -39,9 +39,6 @@ MODULE_LICENSE(GPL);
  #define DBG_MINIMUM  (DL_LOG + DL_FTL + DL_ERR)
  #define DBG_DEFAULT  (DBG_MINIMUM + DL_XLOG + DL_REG)
 
 -extern int diddfunc_init(void);
 -extern void diddfunc_finit(void);
 -
  extern void DIVA_DIDD_Read(void *, int);

You should move that one as well.
There really shouldn't be 'extern' definitions for any function in
any C files since you want the compiler to check they are correct
when the function itself is compiled.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] net: asix: fix bad header length bug

2014-02-07 Thread David Laight
From: Emil Goode 
 On Fri, Feb 07, 2014 at 10:38:04AM +0100, Bjørn Mork wrote:
  Emil Goode emilgo...@gmail.com writes:
   On Thu, Feb 06, 2014 at 03:28:13PM +, David Laight wrote:
   From: Igor Gnatenko
On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote:
 The AX88772B occasionally send rx packets that cross urb boundaries
 and the remaining partial packet is sent with no header.
 When the buffer with a partial packet is of less number of octets
 than the value of hard_header_len the buffer is discarded by the
 usbnet module. This is causing dropped packages and error messages
 in dmesg.
 
   I will do some more digging in the code, but the test of skb-len
   against hard_header_len is done already in the completion callback
   function passed to usb_fill_bulk_urb so it seems that buffers of less
   than hard_header_len number of octets will be dropped regardless.
 
  I am pretty sure you are right about this bug. And the exact same
  solution is already used by the cx82310_eth minidriver, so I don't see
  the problem.  Your fix is fine IMHO.  But you should apply it to all the
  devices using asix_rx_fixup_common(), not just the ax88772 ones.
 
  You could maybe make this a usbnet flag instead and create a generic
  solution in usbnet, but frankly I believe the number of flags and their
  meaning have exceeded drivers authors capabilities a long time ago.  At
  least mine, which are quite limited ;-)
 
  An example of that problem is another bloody obvious bug I noticed while
  looking at this driver: The 'struct driver_info ax88178_info' points to
  asix_rx_fixup_common without setting the FLAG_MULTI_PACKET.  This will
  result in usbnet rx_process() calling usbnet_skb_return() on skbs which
  are already consumed by the minidriver.  Not a big problem, but will
  give some odd results.  But if you allow skbs shorter than ETH_HLEN to
  slip through then it might go boom, so you should probably fix that as
  well.
 
 
  Bjørn
 
 Yes I believe the patch is necessary, but maybe it would be nice with
 a prettier solution rather than setting hard_header_len to 0 for all
 devices with this behaviour. Perhaps it would be better to let each
 driver that uses the usbnet module decide what skbs to discard?
 
 What David describes seems to be another bug, but I don't think it is
 related to this patch as I'm able to reproduce the bug without the patch
 beeing applied by setting the mtu to pretty much any value other than
 1500 and using ping with a larger packet size than that mtu value.

Yes - plenty of bugs if you just look for them!

I did a quick scan through the sub-drivers and although the usbnet code
seems to treat the 'hard_header_len' as a constant to add to the mtu when
allocating rx urb (when the driver doesn't set rx_urb_len), some of the
sub-drivers seem to have three length, the rx header, tx header and hard_header,
and set them separately (I've not just rechecked) - which may not exactly
match what the usbnet code does is the lengths are different.

The ax88772b driver seems to support several different bits of silicon.
Only some put multiple ethernet frames in a single urb, and only for these
does the driver set the rx_urb_length to 2048.
For the other silicon it relies on usbnet setting the rx urb size - so
the hard_header_len better not be set to zero.

Someone with some time to spare needs to modify usbnet to support page
aligned rx buffers (probably 4k urb) and then build correctly formatted
skb from them.

At the moment the ax179_178a driver allocates 20kB urb which end up
with an 0x40 byte offset into the page (so are probably 24k) and
then cause alignment issues in the xhci driver which currently doesn't
correctly handle non-aligned 64k address boundaries when the cross the
ring end.

David



RE: [PATCH 19/26] drivers: isdn: Move prototype declaration to header file platform.h from diva_didd.c

2014-02-07 Thread David Laight
From: Josh Triplett
 On Fri, Feb 07, 2014 at 01:33:46PM +, David Laight wrote:
  From: Rashika Kheria
   Move prototype declarations of function to header file
   hardware/eicon/platform.h because they are used by more than one file.
  
   This eliminates the following warnings in hardware/eicon/diddfunc.c:
   drivers/isdn/hardware/eicon/diddfunc.c:95:12: warning: no previous 
   prototype for diddfunc_init [-
   Wmissing-prototypes]
   drivers/isdn/hardware/eicon/diddfunc.c:110:13: warning: no previous 
   prototype for diddfunc_finit
 [-
   Wmissing-prototypes]
  ...
   diff --git a/drivers/isdn/hardware/eicon/diva_didd.c 
   b/drivers/isdn/hardware/eicon/diva_didd.c
   index fab6ccf..56d32a7 100644
   --- a/drivers/isdn/hardware/eicon/diva_didd.c
   +++ b/drivers/isdn/hardware/eicon/diva_didd.c
   @@ -39,9 +39,6 @@ MODULE_LICENSE(GPL);
#define DBG_MINIMUM  (DL_LOG + DL_FTL + DL_ERR)
#define DBG_DEFAULT  (DBG_MINIMUM + DL_XLOG + DL_REG)
  
   -extern int diddfunc_init(void);
   -extern void diddfunc_finit(void);
   -
extern void DIVA_DIDD_Read(void *, int);
 
  You should move that one as well.
  There really shouldn't be 'extern' definitions for any function in
  any C files since you want the compiler to check they are correct
  when the function itself is compiled.
 
 Absolutely, but as far as I can tell Rashika is doing this
 incrementally, organized more by header than by source file, so I'd
 expect a few externs in a source file to disappear at a time rather than
 all in one patch.

Unless any actual bugs are found, I'd have thought a single patch for
each driver would be enough, maybe even one for the whole lot - depending
on how they are maintained.
The 26 patches already posted are a little excessive.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFCv3 08/14] arm64: introduce aarch64_insn_gen_movewide()

2014-07-16 Thread David Laight
From: Will Deacon
...
  +   BUG_ON(imm  0 || imm  65535);
 
 Do this check with masking instead?

The compiler will convert that to a single unsigned comparison.

...
  +   BUG_ON(shift != 0  shift != 16  shift != 32 
  +  shift != 48);

OTOH I don't think it will convert that to:
BUG_ON(shift  ~48);

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFCv3 08/14] arm64: introduce aarch64_insn_gen_movewide()

2014-07-17 Thread David Laight
From: Will Deacon
...
+   BUG_ON(imm  0 || imm  65535);
  
   Do this check with masking instead?
 
  Ok, if you prefer, I can change it to:
 
  BUG_ON(imm  ~GENMASK(15, 0));

Gah - then anyone reading the code has to look up another define.
There isn't a prize for the most complicated method of defining
a constant that can never change.

 Sure, that or use a named constant for the upper-bound (SZ_64K - 1).

There is nothing wrong with the original code.
Maybe use 0x for those people (are there any) that don't know
their powers of two.

These are strong constants, they aren't going to be changed to any
other value, and it is unlikely that anyone will want to search for
their uses.

I presume that SZ_64K is defined somewhere to 0x1u.
But IMHO using it (instead of the literal) doesn't make code
any more readable.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [net-next PATCH v2 0/3] Broadcast/Multicast rate limit via Ethtool Coalesce

2014-07-17 Thread David Laight
 From: Mugunthan V N
 On Thursday 10 July 2014 05:14 AM, David Miller wrote:
  From: Mugunthan V N mugunthan...@ti.com
  Date: Wed, 9 Jul 2014 12:44:07 +0530
 
  A system/cpu can be loaded by a hacker with flooding of broadcast or
  multicast packets, to prevent this some Ethernet controllers like CPSW
  provide a mechanism to limit the broadcast/multicast packet rate via
  hardware limiters. This patch series enables this feature via
  Ethtool Coalesce.
  This is pretty bogus if you ask me.
 
  What is the difference from accepting a high rate of unicast packets?
  I say it is no different.
 
  Therefore, this feature makes no sense to me at all.
 
 Any packet storm can cause an endpoint some issues. Typically packet
 storms will cause the system CPU to thrash resulting is very low system
 performance.
 
 Unicast storms only target a single destination end station, it can be
 easily mitigated by the host adding a blocking entry in the LUT for each
 aggressor.
 
 Broadcast and multicast target multiple end stations, or an entire
 network, not only can it cause CPU thrashing, it can result in loss of
 other broadcast and multicast services. The rate limiting feature allow
 the broadcast and or multicast traffic to be dropped if the rates are
 too high. This eliminates the CPU thrashing issue. It also allows the
 system to analyze the aggressors and block them for future transgressions.

Rate limiting multicast traffic will definitely cause the loss of multicast
services.

My experience of broadcast storms is that many of the ethernet switches
(probably especially the cheap ones) end up using a much slower software?
path for broadcasts. In a broadcast storm they start discarding normal
traffic - to point where a single ssh session becomes unusable.
This is true even when isolated from the storm by a 10M hub.

Broadcast storms are probably mostly caused by network topology issues.
Especially is switches are sending traffic to 'all ports' while resetting
the spanning tree tables.

David




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next v9 3/9] net: nl802154 - make add_iface take name assign type

2014-07-18 Thread David Laight
From: David Miller
 From: Tom Gundersen t...@jklm.no
 Date: Thu, 17 Jul 2014 10:06:04 +0200
 
  @@ -192,8 +193,10 @@ int ieee802154_add_iface(struct sk_buff *skb, struct 
  genl_info *info)
  if (devname[nla_len(info-attrs[IEEE802154_ATTR_DEV_NAME]) - 1]
  != '\0')
  return -EINVAL; /* phy name should be null-terminated */
  +   name_assign_type = NET_NAME_USER;
  } else  {
  devname = wpan%d;
  +   name_assign_type = NET_NAME_ENUM;
  }
 
  if (strlen(devname) = IFNAMSIZ)
 
 Just wondering what should happen if %d appears in a user provided name.
 
 That would seem to be both USER and ENUM.

Is a training %d detected by special code?
Or is the string used as a printf format?
If the latter is true what happens if the user provides foo%s

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 2/9] mailbox: Add NVIDIA Tegra XUSB mailbox driver

2014-08-26 Thread David Laight
From: Thierry Reding
...
  Is _nocache required? I don't see other drivers using it. I assume there's
  nothing special about the mbox registers.
 
 Most drivers should be using devm_ioremap_resource() which will use the
 _nocache variant of devm_ioremap() when appropriate. Usually the region
 will not be marked cacheable (IORESOURCE_CACHEABLE) and therefore be
 remapped uncached.

A related question:
Is there any way for a driver to force that part of a PCIe BAR be mapped
through the data cache even when the BAR isn't actually marked cacheable?

Some hardware has address regions (which might not be an entire BAR)
that are actually memory and mapping through the data cache will
generate longer PCIe transfers [1].
Clearly the driver will have to be very careful about cache flushes
and invalidates to make this work.

[1] PCIe is high throughput and high latency, single word reads can
be much slower that PCI, much nearer x86 ISA bus speeds.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v6 2/4] net: moxa: replace build_skb() with netdev_alloc_skb_ip_align() / memcpy()

2014-08-26 Thread David Laight
From: Arnd Bergmann
 On Monday 25 August 2014 16:22:22 Jonas Jensen wrote:
  @@ -226,13 +226,15 @@ static int moxart_rx_poll(struct napi_struct *napi, 
  int budget)
  if (len  RX_BUF_SIZE)
  len = RX_BUF_SIZE;
 
  -   skb = build_skb(priv-rx_buf[rx_head], priv-rx_buf_size);
  +   skb = netdev_alloc_skb_ip_align(ndev, len);
  +
  if (unlikely(!skb)) {
  -   net_dbg_ratelimited(build_skb failed\n);
  +   net_dbg_ratelimited(netdev_alloc_skb_ip_align 
  failed\n);
  priv-stats.rx_dropped++;
  priv-stats.rx_errors++;
  }
 
  +   memcpy(skb-data, priv-rx_buf[rx_head], len);

Is this memcpy() aligned?
If the hardware can receive to a 4n+2 offset it is probably better to
copy the two bytes before the frame data and to round the length of to
a whole number of words.

  skb_put(skb, len);
  skb-protocol = eth_type_trans(skb, ndev);
  napi_gro_receive(priv-napi, skb);
 
 While this seems correct, I wonder why you don't do the normal approach of
 dequeuing the skb from the chain and adding a newly allocated skb to it to
 save the memcpy.

Because the receive buffer area isn't made of skbs.
Post-allocating the skb also reduces the 'true size' of the skb.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 4/8] staging: et131x: Use for loop to initialise contiguous macstat registers to zero

2014-08-21 Thread David Laight
From: Mark Einon
 Replace a long list of contiguous writel() calls with a for loop iterating
 over the same address values.
 
 Also remove redundant comments on the macstat registers, the variable names
 are good enough.
...
 - writel(0, macstat-txrx_0_64_byte_frames);
...
 - writel(0, macstat-carry_reg2);
 + /* initialize all the macstat registers to zero on the device  */
 + for (reg = macstat-txrx_0_64_byte_frames;
 +  reg = macstat-carry_reg2; reg++)
 + writel(0, reg);
...
  struct macstat_regs {/* Location: */
   u32 pad[32];/*  0x6000 - 607C */
 
 - /* Tx/Rx 0-64 Byte Frame Counter */
 + /* counters */
   u32 txrx_0_64_byte_frames;  /*  0x6080 */
 -
 - /* Tx/Rx 65-127 Byte Frame Counter */
   u32 txrx_65_127_byte_frames;/*  0x6084 */

I think it would be best to also convert the stats counters to an array.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] vhost: Add polling mode

2014-08-21 Thread David Laight
From: Razya Ladelsky
 Michael S. Tsirkin m...@redhat.com wrote on 20/08/2014 01:57:10 PM:
 
   Results:
  
   Netperf, 1 vm:
   The polling patch improved throughput by ~33% (1516 MB/sec - 2046 
   MB/sec).
   Number of exits/sec decreased 6x.
   The same improvement was shown when I tested with 3 vms running netperf
   (4086 MB/sec - 5545 MB/sec).
  
   filebench, 1 vm:
   ops/sec improved by 13% with the polling patch. Number of exits
   was reduced by 31%.
   The same experiment with 3 vms running filebench showed similar numbers.
  
   Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
  This really needs more thourough benchmarking report, including
  system data.  One good example for a related patch:
  http://lwn.net/Articles/551179/
  though for virtualization, we need data about host as well, and if you
  want to look at streaming benchmarks, you need to test different message
  sizes and measure packet size.
 
 
 Hi Michael,
 I have already tried running netperf with several message sizes:
 64,128,256,512,600,800...
 But the results are inconsistent even in the baseline/unpatched
 configuration.
 For smaller msg sizes, I get consistent numbers. However, at some point,
 when I increase the msg size
 I get unstable results. For example, for a 512B msg, I get two scenarios:
 vm utilization 100%, vhost utilization 75%, throughput ~6300
 vm utilization 80%, vhost utilization 13%, throughput ~9400 (line rate)
 
 I don't know why vhost is behaving that way for certain message sizes.
 Do you have any insight to why this is happening?

Have you tried looking at the actual ethernet packet sizes.
It may well jump between using small packets (the size of the writes)
and full sized ones.

If you are trying to measure ethernet packet 'cost' you need to use UDP.
However that probably uses different code paths.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC v4 net-next 00/26] BPF syscall, maps, verifier, samples, llvm

2014-08-13 Thread David Laight
From: Of Alexei Starovoitov
 one more RFC...
 
 Major difference vs previous set is a new 'load 64-bit immediate' eBPF insn.
 Which is first 16-byte instruction. It shows how eBPF ISA can be extended
 while maintaining backward compatibility, but mainly it cleans up eBPF
 program access to maps and improves run-time performance.

Wouldn't it be more sensible to follow the scheme used by a lot of cpus
and add a 'load high' instruction (follow with 'add' or 'or').
It still takes 16 bytes to load a 64bit immediate value, but the instruction
size remains constant.
There is nothing to stop any JIT software detecting the instruction pair.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] usb: xhci: Fix Set TR Dequeue Pointer cycle state for quirky xHCs

2014-08-11 Thread David Laight
 It's starting to get a bit too complicated.
 xhci find_new_dequeue_state() now has four places where it can toggle the 
 cycle bit
 in addition to the cycle toggle in find_trb_seg().
 
 In the end we really just want to do it max once.
 
 I'll see if I can simplify the whole cycle bit code somehow.

Have a look at the patches I submitted last year.
I remember simplifying a lot of that code.

David



RE: [PATCH RFC 1/4] virtio_net: pass vi around

2014-10-24 Thread David Laight
From: Michael S. Tsirkin
 
 Too many places poke at [rs]q-vq-vdev-priv just to get
 the the vi structure.  Let's just pass the pointer around: seems
 cleaner, and might even be faster.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/net/virtio_net.c | 36 +++-
  1 file changed, 19 insertions(+), 17 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 57cbc7d..36f3dfc 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
...
  static struct sk_buff *receive_big(struct net_device *dev,
 +struct virtnet_info *vi,

Do you need to pass 'dev' here?
Looks like it is obtainable from vi-dev (as below).

David

  struct receive_queue *rq,
  void *buf,
  unsigned int len)
  {
   struct page *page = buf;
 - struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
 + struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
...
 -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int 
 len)
 +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 + void *buf, unsigned int len)
  {
 - struct virtnet_info *vi = rq-vq-vdev-priv;
   struct net_device *dev = vi-dev;
...


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread David Laight
From: Of Michael S. Tsirkin
 On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
  Accumulate the sent packets and sent bytes in local variables and perform a
  single u64_stats_update_begin/end() after.
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
 
 Not sure how much it's worth but since Eric suggested it ...

Probably depends on the actual cost of u64_stats_update_begin/end
against the likely extra saving of the tx_bytes and tx_packets
values onto the stack across the call to dev_kfree_skb_any().
(Which depends on the number of caller saved registers.)

 Acked-by: Michael S. Tsirkin m...@redhat.com
 
  ---
   drivers/net/virtio_net.c |   12 
   1 files changed, 8 insertions(+), 4 deletions(-)
 
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 3d0ce44..a4d56b8 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
  unsigned int len;
  struct virtnet_info *vi = sq-vq-vdev-priv;
  struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
  +   u64 tx_bytes = 0, tx_packets = 0;

tx_packets need only be 'unsigned int'.
The same is almost certainly true of tx_bytes.

David

  while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
  pr_debug(Sent skb %p\n, skb);
 
  -   u64_stats_update_begin(stats-tx_syncp);
  -   stats-tx_bytes += skb-len;
  -   stats-tx_packets++;
  -   u64_stats_update_end(stats-tx_syncp);
  +   tx_bytes += skb-len;
  +   tx_packets++;
 
  dev_kfree_skb_any(skb);
  }
  +
  +   u64_stats_update_begin(stats-tx_syncp);
  +   stats-tx_bytes += tx_bytes;
  +   stats-tx_packets =+ tx_packets;
  +   u64_stats_update_end(stats-tx_syncp);
   }
 
   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
  --
  1.7.1
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread David Laight
From: Michael S. Tsirkin
 On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote:
  From: Of Michael S. Tsirkin
   On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
Accumulate the sent packets and sent bytes in local variables and 
perform a
single u64_stats_update_begin/end() after.
   
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
  
   Not sure how much it's worth but since Eric suggested it ...
 
  Probably depends on the actual cost of u64_stats_update_begin/end
  against the likely extra saving of the tx_bytes and tx_packets
  values onto the stack across the call to dev_kfree_skb_any().
  (Which depends on the number of caller saved registers.)
 
 Yea, some benchmark results would be nice to see.

I there are likely to be multiple skb on the queue the fastest
code would probably do one 'virtqueue_get_all()' that returned
a linked list of buffers, then follow the list to get the stats,
and follow it again to free the skb.

David

   Acked-by: Michael S. Tsirkin m...@redhat.com
  
---
 drivers/net/virtio_net.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)
   
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue 
*sq)
unsigned int len;
struct virtnet_info *vi = sq-vq-vdev-priv;
struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
+   u64 tx_bytes = 0, tx_packets = 0;
 
  tx_packets need only be 'unsigned int'.
  The same is almost certainly true of tx_bytes.
 
  David
 
while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
pr_debug(Sent skb %p\n, skb);
   
-   u64_stats_update_begin(stats-tx_syncp);
-   stats-tx_bytes += skb-len;
-   stats-tx_packets++;
-   u64_stats_update_end(stats-tx_syncp);
+   tx_bytes += skb-len;
+   tx_packets++;
   
dev_kfree_skb_any(skb);
}
+
+   u64_stats_update_begin(stats-tx_syncp);
+   stats-tx_bytes += tx_bytes;
+   stats-tx_packets =+ tx_packets;
+   u64_stats_update_end(stats-tx_syncp);
 }
   
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1
   --
   To unsubscribe from this list: send the line unsubscribe netdev in
   the body of a message to majord...@vger.kernel.org
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] virtio_net: fix use after free

2014-10-15 Thread David Laight
From: Michael S. Tsirkin
 commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
 net: Remove ndo_xmit_flush netdev operation, use signalling instead.
 
 added code that looks at skb-xmit_more after the skb has
 been put in TX VQ. Since some paths process the ring and free the skb
 immediately, this can cause use after free.
 
 Fix by storing xmit_more in a local variable.
 
 Cc: David S. Miller da...@davemloft.net
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 David, am I using the API correctly?
 Seems to work for me.
 You used __netif_subqueue_stopped but that seems to use
 a slightly more expensive test_bit internally.
 The reason I added a variable for the txq here is because it's handy for
 BQL patch later on.
 
 
  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 3d0ce44..13d0a8b 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
 net_device *dev)
   int qnum = skb_get_queue_mapping(skb);
   struct send_queue *sq = vi-sq[qnum];
   int err;
 + struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on stack for the entire call?
Looks like it is only needed when 'kick' is true.
I've not looked to see if saves both 'dev' and 'qnum' being kept.

In any case it isn't mentioned in the commit message.

David

 + bool kick = !skb-xmit_more;
 
   /* Free up any pending old buffers before queueing new ones. */
   free_old_xmit_skbs(sq);
 @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
 net_device *dev)
   }
   }
 
 - if (__netif_subqueue_stopped(dev, qnum) || !skb-xmit_more)
 + if (kick || netif_xmit_stopped(txq))
   virtqueue_kick(sq-vq);
 
   return NETDEV_TX_OK;
 --
 MST
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3 2/2] usb: dwc2: gadget: modify return statement

2014-10-17 Thread David Laight
From: Of Sudip Mukherjee
 modified the function to have a single return statement at the end
 instead of multiple return statement in the middle of the function
 to improve the readability of the code.

Many of us would disagree with you there.
Early returns actually make the code easier to read, certainly
better than a goto 'end of function'.

David

 This patch depends on the previous patch of the series.
 
 Signed-off-by: Sudip Mukherjee su...@vectorindia.org
 ---
  drivers/usb/dwc2/gadget.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
 index 7f25527..e8a8fc7 100644
 --- a/drivers/usb/dwc2/gadget.c
 +++ b/drivers/usb/dwc2/gadget.c
 @@ -2471,7 +2471,8 @@ static int s3c_hsotg_ep_enable(struct usb_ep *ep,
   dir_in = (desc-bEndpointAddress  USB_ENDPOINT_DIR_MASK) ? 1 : 0;
   if (dir_in != hs_ep-dir_in) {
   dev_err(hsotg-dev, %s: direction mismatch!\n, __func__);
 - return -EINVAL;
 + ret = -EINVAL;
 + goto error1;
   }
 
   mps = usb_endpoint_maxp(desc);
 @@ -2583,6 +2584,7 @@ static int s3c_hsotg_ep_enable(struct usb_ep *ep,
 
  error:
   spin_unlock_irqrestore(hsotg-lock, flags);
 +error1:
   return ret;
  }
 
 --
 1.8.1.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-usb in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] netlink: don't copy over empty attribute data

2014-10-22 Thread David Laight
From: Sasha Levin
 netlink uses empty data to seperate different levels. However, we still
 try to copy that data from a NULL ptr using memcpy, which is an undefined
 behaviour.
 
 Signed-off-by: Sasha Levin sasha.le...@oracle.com
 ---
  lib/nlattr.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/lib/nlattr.c b/lib/nlattr.c
 index 9c3e85f..6512b43 100644
 --- a/lib/nlattr.c
 +++ b/lib/nlattr.c
 @@ -430,7 +430,8 @@ void __nla_put(struct sk_buff *skb, int attrtype, int 
 attrlen,
   struct nlattr *nla;
 
   nla = __nla_reserve(skb, attrtype, attrlen);
 - memcpy(nla_data(nla), data, attrlen);
 + if (data)
 + memcpy(nla_data(nla), data, attrlen);

Were it even appropriate to add a conditional here, the correct
check would be against 'attrlen', not 'data'.

David

  }
  EXPORT_SYMBOL(__nla_put);
 
 --
 1.7.10.4
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: bit fields data tearing

2014-09-05 Thread David Laight
From: Paul E. McKenney
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
  Hi James,
 
  On 09/04/2014 10:11 PM, James Bottomley wrote:
   On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
   +And there are anti-guarantees:
   +
   + (*) These guarantees do not apply to bitfields, because compilers often
   + generate code to modify these using non-atomic read-modify-write
   + sequences.  Do not attempt to use bitfields to synchronize parallel
   + algorithms.
   +
   + (*) Even in cases where bitfields are protected by locks, all fields
   + in a given bitfield must be protected by one lock.  If two fields
   + in a given bitfield are protected by different locks, the 
   compiler's
   + non-atomic read-modify-write sequences can cause an update to one
   + field to corrupt the value of an adjacent field.
   +
   + (*) These guarantees apply only to properly aligned and sized scalar
   + variables.  Properly sized currently means int and long,
   + because some CPU families do not support loads and stores of
   + other sizes.  (Some CPU families is currently believed to
   + be only Alpha 21064.  If this is actually the case, a different
   + non-guarantee is likely to be formulated.)
  
   This is a bit unclear.  Presumably you're talking about definiteness of
   the outcome (as in what's seen after multiple stores to the same
   variable).
 
  No, the last conditions refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
 
   The guarantees are only for natural width on Parisc as well,
   so you would get a mess if you did byte stores to adjacent memory
   locations.
 
  For a simple test like:
 
  struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
  };
 
  void store_bc(struct x *p) {
  p-b = 1;
  p-c = 2;
  }
 
  on parisc, gcc generates separate byte stores
 
  void store_bc(struct x *p) {
 0:   34 1c 00 02 ldi 1,ret0
 4:   0f 5c 12 08 stb ret0,4(r26)
 8:   34 1c 00 04 ldi 2,ret0
 c:   e8 40 c0 00 bv r0(rp)
10:   0f 5c 12 0a stb ret0,5(r26)
 
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that is,
 
  CPU 0| CPU 1
   |
  p-b = 1 | p-c = 2
   |
 
  will result in p-b == 1  p-c == 2 (assume both values
  were 0 before the call to store_bc()).
 
 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.
 
 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

People with old Alphas can run NetBSD instead, along with those who have real 
VAXen :-)

I've seen gcc generate 32bit accesses for 16bit structure members on arm.
It does this because of the more limited range of the offsets for the 16bit 
access.
OTOH I don't know if it ever did this for writes - so it may be moot.

David



RE: bit fields data tearing

2014-09-05 Thread David Laight
From: Peter Hurley
 [ +cc linux-arm ]
 
 Hi David,
 
 On 09/05/2014 04:30 AM, David Laight wrote:
  I've seen gcc generate 32bit accesses for 16bit structure members on arm.
  It does this because of the more limited range of the offsets for the 16bit 
  access.
  OTOH I don't know if it ever did this for writes - so it may be moot.
 
 Can you recall the particulars, like what ARM config or what code?
 
 I tried an overly-simple test to see if gcc would bump up to the word load for
 the 12-bit offset mode, but it stuck with register offset rather than 
 immediate
 offset. [I used the compiler options for allmodconfig and a 4.8 
 cross-compiler.]
 
 Maybe the test doesn't generate enough register pressure on the compiler?

Dunno, I would have been using a much older version of the compiler.
It is possible that it doesn't do it any more.
It might only have done it for loads.

The compiler used to use misaligned 32bit loads for structure
members on large 4n+2 byte boundaries as well.
I'm pretty sure it doesn't do that either.

There have been a lot of compiler versions since I was compiling
anything for arm.

David

 Regards,
 Peter Hurley
 
 #define ARRAY_SIZE(x)  (sizeof(x)/sizeof((x)[0]))
 
 struct x {
   long unused[64];
   short b[12];
   int unused2[10];
   short c;
 };
 
 void store_c(struct x *p, short a[]) {
   int i;
 
   for (i = 0; i  ARRAY_SIZE(p-b); i++)
   p-b[i] = a[i];
   p-c = 2;
 }
 
 
 void store_c(struct x *p, short a[]) {
0: e1a0c00dmov ip, sp
4: e3a03000mov r3, #0
8: e92dd800push{fp, ip, lr, pc}
c: e24cb004sub fp, ip, #4
   int i;
 
   for (i = 0; i  ARRAY_SIZE(p-b); i++)
   p-b[i] = a[i];
   10: e191c0b3ldrhip, [r1, r3]
   14: e0802003add r2, r0, r3
   18: e2822c01add r2, r2, #256; 0x100
   1c: e2833002add r3, r3, #2
   20: e3530018cmp r3, #24
   24: e1c2c0b0strhip, [r2]
   28: 1af8bne 10 store_c+0x10
   p-c = 2;
   2c: e3a03d05mov r3, #320; 0x140
   30: e3a02002mov r2, #2
   34: e18020b3strhr2, [r0, r3]
   38: e89da800ldm sp, {fp, sp, pc}

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2] cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec

2014-09-01 Thread David Laight
From: Shilpa Bhat
 Hi Viresh,
 On Fri, 2014-08-29 at 05:33 +0530, Viresh Kumar wrote:
  On 28 August 2014 19:36, Shilpasri G Bhat
  shilpa.b...@linux.vnet.ibm.com wrote:
  
   Changes v1-v2:
   Invoke .target() driver callback to set the cpus to nominal frequency
   in reboot notifier, instead of calling cpufreq_suspend() as suggested
   by Viresh Kumar.
   Modified the commit message.
 
  This changelog will get commited, is this what you want?
 
   +   if (unlikely(rebooting)  new_index != get_nominal_index())
   +   return -EBUSY;
 
  Have you placed the unlikely only around 'rebooting' intentionally or
  should it cover whole if statement?
 
 
 Yes unlikely() should cover the whole if statement...

Actually it probably shouldn't.
You need to look at the generated code with each different set of 'unlikely()'
to see how gcc processes them.
In this case, if 'rebooting' is false you want to 'fall through' on a statically
predicted 'not taken' branch. You don't ever care about the second clause.
With an 'unlikely' covering the entire statement gcc could easily add a
forwards conditional branch (that will be mis-predicted) for the 'rebooting' 
test.

(Yes, I spent a lot of time getting gcc to generate branches that were
correctly statically predicted for some code where every cycle mattered.)

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v3 0/3] sched: Always check the integrity of the canary

2014-09-12 Thread David Laight
From: Chuck Ebbert 
 David Laight david.lai...@aculab.com wrote:
 
  From: Aaron Tomlin
   Currently in the event of a stack overrun a call to schedule()
   does not check for this type of corruption. This corruption is
   often silent and can go unnoticed. However once the corrupted
   region is examined at a later stage, the outcome is undefined
   and often results in a sporadic page fault which cannot be
   handled.
  
   The first patch adds a canary to init_task's end of stack.
   While the second patch provides a helper to determine the
   integrity of the canary. The third checks for a stack
   overrun and takes appropriate action since the damage
   is already done, there is no point in continuing.
 
  Clearly you've just been 'bitten' by a kernel stack overflow.
  But a simple 'canary' isn't going to find most of the overflows
  and will give an incorrect 'sense of security'.
 
  The canary will only work if the stack is densely written.
  In practise the stack alignment rules create gaps, and the
  most likely reason for overflow is a large on-stack buffer
  that isn't actually written to.
 
  The only real way to detect kernel stack overflow is to arrange
  for an unmapped page beyond the stack.
  That costs KVA, but not much else.
 
 
 That doesn't work either, because the threadinfo sits between the end of the
 stack and the beginning of the next page, making it possible to corrupt 
 critical
 data without running off the page.

Then flip the order of the allocations so that the threadinfo is at the other 
end.
I'm not sure how many per-lwp structures the linux kernel has, but
I know that on netbsd the main thing the kernel stack hits is the fpu
save area - the end of that is the avx area which won't be needed when
the stack usage is large.
Everything else could be moved from the stack pages to the lwp struct itself.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy

2014-09-15 Thread David Laight
From: Rickard Strandqvist
...
 Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
...
 diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
 b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
 index 644e6ab..d4fe8ac 100644
 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
 +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
 @@ -830,7 +830,7 @@ static void peak_usb_disconnect(struct usb_interface 
 *intf)
   char name[IFNAMSIZ];
 
   dev-state = ~PCAN_USB_STATE_CONNECTED;
 - strncpy(name, netdev-name, IFNAMSIZ);
 + strlcpy(name, netdev-name, IFNAMSIZ);
 
   unregister_netdev(netdev);
   free_candev(netdev);

Or:
char name[sizeof netdev-name];
memcpy(name, netdev-name, sizeof netdev-name);

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy

2014-09-15 Thread David Laight
From: Marc Kleine-Budde [ 
 On 09/15/2014 10:28 AM, David Laight wrote:
  From: Rickard Strandqvist
  ...
  Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
  ...
  diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
  b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
  index 644e6ab..d4fe8ac 100644
  --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
  +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
  @@ -830,7 +830,7 @@ static void peak_usb_disconnect(struct usb_interface 
  *intf)
 char name[IFNAMSIZ];
 
 dev-state = ~PCAN_USB_STATE_CONNECTED;
  -  strncpy(name, netdev-name, IFNAMSIZ);
  +  strlcpy(name, netdev-name, IFNAMSIZ);
 
 unregister_netdev(netdev);
 free_candev(netdev);
 
  Or:
  char name[sizeof netdev-name];
  memcpy(name, netdev-name, sizeof netdev-name);
 
 I would be sizeof(foo) in kernel coding style,
But not in mine :-)
sizeof is an operator, not a function, it's argument can be (type).

 but let's have a look at the original code:
 
 struct net_device *netdev = dev-netdev;
 char name[IFNAMSIZ];
 
 dev-state = ~PCAN_USB_STATE_CONNECTED;
 strncpy(name, netdev-name, IFNAMSIZ);
 
 unregister_netdev(netdev);
 free_candev(netdev);
 
 kfree(dev-cmd_buf);
 dev-next_siblings = NULL;
 if (dev-adapter-dev_free)
 dev-adapter-dev_free(dev);
 
 dev_info(intf-dev, %s removed\n, name);
 
 I think it's save to use:
 
 dev_info(intf-dev, %s removed\n, netdev_name(dev-netdev));
 
 instead of doing the str?cpy() in the first place. But why not use:
 
 netdev_info(dev-netdev, removed\n);
 
 Is the USB device information lost when using netdev_info()?

My guess is it avoids a 'use after free' - but I'm not going to
dig that far.

Another issue with blindly replacing strncpy() with strlcpy()
(which doesn't affect the above) is when copying status to userspace.

David



RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-07 Thread David Laight
From: Lothar Waßmann
 commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
 performance)
 introduced a regression for i.MX28. The swap_buffer() function doing
 the endian conversion of the received data on i.MX28 may access memory
 beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
 does not copy those bytes, so that the last bytes of a packet may be
 filled with invalid data after swapping.
 This will likely lead to checksum errors on received packets.
 E.g. when trying to mount an NFS rootfs:
 UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
 
 Do the byte swapping and copying to the new skb in one go if
 necessary.

ISTM that if you need to do the 'swap' you should copy the data regardless
of the length.

David

 
 Signed-off-by: Lothar Wamann l...@karo-electronics.de
 ---
  drivers/net/ethernet/freescale/fec_main.c |   25 +
  1 file changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c 
 b/drivers/net/ethernet/freescale/fec_main.c
 index 87975b5..eaaebad 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
   return bufaddr;
  }
 
 +static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
 +{
 + int i;
 + unsigned int *src = src_buf;
 + unsigned int *dst = dst_buf;
 +
 + for (i = 0; i  DIV_ROUND_UP(len, 4); i++, src++, dst++)
 + *dst = cpu_to_be32(*src);
 +
 + return dst_buf;
 +}
 +
  static void fec_dump(struct net_device *ndev)
  {
   struct fec_enet_private *fep = netdev_priv(ndev);
 @@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct 
 bufdesc *bdp, struct sk_buff
  }
 
  static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
 -struct bufdesc *bdp, u32 length)
 +struct bufdesc *bdp, u32 length, int swap)
  {
   struct  fec_enet_private *fep = netdev_priv(ndev);
   struct sk_buff *new_skb;
 @@ -1363,7 +1375,10 @@ static bool fec_enet_copybreak(struct net_device 
 *ndev, struct sk_buff **skb,
   dma_sync_single_for_cpu(fep-pdev-dev, bdp-cbd_bufaddr,
   FEC_ENET_RX_FRSIZE - fep-rx_align,
   DMA_FROM_DEVICE);
 - memcpy(new_skb-data, (*skb)-data, length);
 + if (!swap)
 + memcpy(new_skb-data, (*skb)-data, length);
 + else
 + swap_buffer2(new_skb-data, (*skb)-data, length);
   *skb = new_skb;
 
   return true;
 @@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
   u16 vlan_tag;
   int index = 0;
   boolis_copybreak;
 + bool need_swap = id_entry-driver_data  FEC_QUIRK_SWAP_FRAME;
 
  #ifdef CONFIG_M532x
   flush_cache_all();
 @@ -1456,7 +1472,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
* include that when passing upstream as it messes up
* bridging applications.
*/
 - is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4);
 + is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4,
 +   need_swap);
   if (!is_copybreak) {
   skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
   if (unlikely(!skb_new)) {
 @@ -1471,7 +1488,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
   prefetch(skb-data - NET_IP_ALIGN);
   skb_put(skb, pkt_len - 4);
   data = skb-data;
 - if (id_entry-driver_data  FEC_QUIRK_SWAP_FRAME)
 + if (!is_copybreak  need_swap)
   swap_buffer(data, pkt_len);
 
   /* Extract the enhanced buffer descriptor */
 --
 1.7.10.4
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-07 Thread David Laight
From: Lothar
 David Laight wrote:
  From: Lothar Waßmann
   commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
   performance)
   introduced a regression for i.MX28. The swap_buffer() function doing
   the endian conversion of the received data on i.MX28 may access memory
   beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
   does not copy those bytes, so that the last bytes of a packet may be
   filled with invalid data after swapping.
   This will likely lead to checksum errors on received packets.
   E.g. when trying to mount an NFS rootfs:
   UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
  
   Do the byte swapping and copying to the new skb in one go if
   necessary.
 
  ISTM that if you need to do the 'swap' you should copy the data regardless
  of the length.
 
 The swap function has to look at at most 3 bytes beyond the actual
 packet length. That is what the original swap_buffer() function does and
 what the new function swap_buffer2(), that does the endian swapping
 while copying to the new buffer, also does.

I understood the bug.

The point I was making is that if you have to do a read-write of the received
data (to byteswap it) then you might as well always copy it into a new skb that
is just big enough for the actual receive frame.

David



RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-07 Thread David Laight
From: Eric Dumazet
 On Tue, 2014-10-07 at 15:19 +0200, Lothar Wamann wrote:
  commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
  performance)
  introduced a regression for i.MX28. The swap_buffer() function doing
  the endian conversion of the received data on i.MX28 may access memory
  beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
  does not copy those bytes, so that the last bytes of a packet may be
  filled with invalid data after swapping.
  This will likely lead to checksum errors on received packets.
  E.g. when trying to mount an NFS rootfs:
  UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
 
  Do the byte swapping and copying to the new skb in one go if
  necessary.
 
  Signed-off-by: Lothar Wamann l...@karo-electronics.de
  ---
   drivers/net/ethernet/freescale/fec_main.c |   25 +
   1 file changed, 21 insertions(+), 4 deletions(-)
 
  diff --git a/drivers/net/ethernet/freescale/fec_main.c 
  b/drivers/net/ethernet/freescale/fec_main.c
  index 87975b5..eaaebad 100644
  --- a/drivers/net/ethernet/freescale/fec_main.c
  +++ b/drivers/net/ethernet/freescale/fec_main.c
  @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
  return bufaddr;
   }
 
  +static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
  +{
  +   int i;
  +   unsigned int *src = src_buf;
  +   unsigned int *dst = dst_buf;
  +
  +   for (i = 0; i  DIV_ROUND_UP(len, 4); i++, src++, dst++)
  +   *dst = cpu_to_be32(*src);
 
 No need for the DIV :
 
   for (i = 0; i  len; i += sizeof(*dst), src++, dst++)
   *dst = cpu_to_be32(*src);
 
 Also are you sure both src/dst are aligned to word boundaries, or is
 this architecture OK with possible misalignment ?

I wondered about that as well.
I wouldn't have expected ppc to support misaligned transfers, and you'd also
want to make sure that cpu_to_be(*src) was using a byte-swapping instruction.
Hmmm... cpu_to_be() doesn't sound like the right 'swap' macro name.

It looks as though this is processing an entire ethernet frame, in which
case the data (destination address) would normally be on an 4n+2 boundary.
The two bytes before the data can be safely copied (as can the rounding bytes at
the end) - so provided the pointers allow for this it should be fine.
OTOH I've NFI what the actual data is.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-08 Thread David Laight
From: Lothar Waßmann
 David Laight wrote:
  From: Eric Dumazet
   On Tue, 2014-10-07 at 15:19 +0200, Lothar Wamann wrote:
commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
performance)
introduced a regression for i.MX28. The swap_buffer() function doing
the endian conversion of the received data on i.MX28 may access memory
beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
does not copy those bytes, so that the last bytes of a packet may be
filled with invalid data after swapping.
This will likely lead to checksum errors on received packets.
E.g. when trying to mount an NFS rootfs:
UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 
36
   
Do the byte swapping and copying to the new skb in one go if
necessary.
   
Signed-off-by: Lothar Wamann l...@karo-electronics.de
---
 drivers/net/ethernet/freescale/fec_main.c |   25 
+
 1 file changed, 21 insertions(+), 4 deletions(-)
   
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 87975b5..eaaebad 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
return bufaddr;
 }
   
+static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
+{
+   int i;
+   unsigned int *src = src_buf;
+   unsigned int *dst = dst_buf;
+
+   for (i = 0; i  DIV_ROUND_UP(len, 4); i++, src++, dst++)
+   *dst = cpu_to_be32(*src);
  
   No need for the DIV :
  
 for (i = 0; i  len; i += sizeof(*dst), src++, dst++)
 *dst = cpu_to_be32(*src);
  
   Also are you sure both src/dst are aligned to word boundaries, or is
   this architecture OK with possible misalignment ?
 
  I wondered about that as well.
  I wouldn't have expected ppc to support misaligned transfers, and you'd also
  want to make sure that cpu_to_be(*src) was using a byte-swapping 
  instruction.
  Hmmm... cpu_to_be() doesn't sound like the right 'swap' macro name.
 
 ??? So what is cpu_to_be32() then?
 The new swap function is an exact copy of the original one already in
 use except for the fact that it uses distinct source and destination
 buffers.

cpu_to_be32() is for converting a 'cpu' endianness value to 'big-endian'.
Here you are processing receive data - so you probably want be32_to_cpu().
(Yes, I know they are functionally identical)

Alternatively, since these aren't actually 32bit numbers and you know
whether you want to swap, something like the __swab32p() from swab.h
- but I'm not entirely sure that one is expected to be used.

Clearly you are well inside the ppc 'endianness' hell.

David



RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-08 Thread David Laight
From: Sergei Shtylyov
 On 10/07/2014 05:19 PM, Lothar Wamann wrote:
 
  commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
  performance)
  introduced a regression for i.MX28. The swap_buffer() function doing
  the endian conversion of the received data on i.MX28 may access memory
  beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
  does not copy those bytes, so that the last bytes of a packet may be
  filled with invalid data after swapping.
  This will likely lead to checksum errors on received packets.
  E.g. when trying to mount an NFS rootfs:
  UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
 
  Do the byte swapping and copying to the new skb in one go if
  necessary.
 
  Signed-off-by: Lothar Wamann l...@karo-electronics.de
  ---
drivers/net/ethernet/freescale/fec_main.c |   25 +
1 file changed, 21 insertions(+), 4 deletions(-)
 
  diff --git a/drivers/net/ethernet/freescale/fec_main.c 
  b/drivers/net/ethernet/freescale/fec_main.c
  index 87975b5..eaaebad 100644
  --- a/drivers/net/ethernet/freescale/fec_main.c
  +++ b/drivers/net/ethernet/freescale/fec_main.c
 [...]
  @@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct 
  bufdesc *bdp, struct
 sk_buff
}
 
static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff 
  **skb,
  -  struct bufdesc *bdp, u32 length)
  +  struct bufdesc *bdp, u32 length, int swap)
 
 'bool swap' perhaps?
 
  @@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int 
  budget, u16 queue_id)
  u16 vlan_tag;
  int index = 0;
  boolis_copybreak;
  +   bool need_swap = id_entry-driver_data  FEC_QUIRK_SWAP_FRAME;
 
 ... especially talking this into account...

Hmmm... in that case you may not want the compiler to convert the bit value
to a 'bool' at all.

Passing 'id_entry-driver_data' through (that doesn't look like a field name for
'quirk flags) would generate better code.

Even better would be to reference the flag directly from 'ndev'.
A pointer indirection for the test if probably faster then passing another 
argument.

David



RE: [PATCH 1/1 net-next] af_unix: remove NULL assignment on static

2014-10-08 Thread David Laight
From: Hannes Frederic Sowa
 I think David's concern was whether if 0 == false in all situations. It
 is pretty clear that static memory is initialized to 0.

I'm not 100% sure about that.
static pointers may be required to be initialised to NULL.
If NULL isn't the 'all 0 bit pattern' then the memory would need
to be initialised to a different pattern.
Not that anyone is likely to implement such a system because far too
much code will break.

The only system I knew where 'native' NULL pointers were 'all 1s'
used 0 in its C compiler.

David



RE: [PATCH] tun: make sure interface usage can not overflow

2014-09-30 Thread David Laight
From: Hannes Frederic
 On Mo, 2014-09-29 at 12:41 -0700, Kees Cook wrote:
  On Mon, Sep 29, 2014 at 4:04 AM, David Laight david.lai...@aculab.com 
  wrote:
   From: Kees Cook
   This makes the size argument a const, since it is always populated by
   the caller.
  
   There is almost no point making parameters 'const.
   ('const foo *' makes sense).
  
   Additionally double-checks to make sure the copy_from_user
   can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS happy:
  
  In function 'copy_from_user',
  inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7:
  ... copy_from_user() buffer size is not provably correct
  
   If 'ifreq_len' could be too big then you want to error the ioctl, not 
   panic.
   If it can't be too big you don't need the check.
 
  The ifreq_len comes from the callers, and is the output of sizeof
  which is const. Changing the function parameter to const means any
  changes made in the future where the incoming value isn't const, the
  compiler will throw a warning.
 
 Hmmm, I think you want something like BUILD_BUG_ON(!
 __builtin_constant_p(var)). const in function argument only ensures that
 the value cannot be modified in the function.

You'd have to do something in the header file - nothing in the function
body can do that check.

I've not got the source handy, but in this case is there any need to
actually pass the size at all?
Is it always the same fixed constant??

David



RE: [PATCH] tun: make sure interface usage can not overflow

2014-09-30 Thread David Laight
From: Hannes Frederic 
 On Di, 2014-09-30 at 08:20 +, David Laight wrote:
  From: Hannes Frederic
   On Mo, 2014-09-29 at 12:41 -0700, Kees Cook wrote:
On Mon, Sep 29, 2014 at 4:04 AM, David Laight david.lai...@aculab.com 
wrote:
 From: Kees Cook
 This makes the size argument a const, since it is always populated by
 the caller.

 There is almost no point making parameters 'const.
 ('const foo *' makes sense).

 Additionally double-checks to make sure the copy_from_user
 can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 
 happy:

In function 'copy_from_user',
inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7:
... copy_from_user() buffer size is not provably correct

 If 'ifreq_len' could be too big then you want to error the ioctl, not 
 panic.
 If it can't be too big you don't need the check.
   
The ifreq_len comes from the callers, and is the output of sizeof
which is const. Changing the function parameter to const means any
changes made in the future where the incoming value isn't const, the
compiler will throw a warning.
  
   Hmmm, I think you want something like BUILD_BUG_ON(!
   __builtin_constant_p(var)). const in function argument only ensures that
   the value cannot be modified in the function.
 
  You'd have to do something in the header file - nothing in the function
  body can do that check.
 
 Sure, it should work. You only need to make sure that gcc inlines the
 function, so the value is constant (it is not enough that gcc knows the
 value range, one specific constant is needed). So the simplest fix for
 this is to specify __tun_chr_ioctl as __always_inline. ;)

You are joking aren't you???

Look at the code.

I'd suggest fixing whatever implements CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
to be less picky.

David



RE: [PATCH v5 2/3] net: Add Keystone NetCP ethernet driver

2014-09-30 Thread David Laight
From: David Miller
 From: Santosh Shilimkar santosh.shilim...@ti.com
 Date: Thu, 25 Sep 2014 13:48:36 -0400
 
  +static inline int gbe_phy_link_status(struct gbe_slave *slave)
  +{
  +   if (!slave-phy)
  +   return 1;
  +
  +   if (slave-phy-link)
  +   return 1;
  +
  +   return 0;
  +}
 
 Please use 'bool' as the return type and return 'true' or 'false'.

That function body could also be just:
return !slave-phy  slave-phy-link;
which might be more readable if directly coded.

I also wonder if slavephy can actually be NULL?
If it can be under unusual circumstances it might be worth
assigning the address of a dummy 'phy' structure so that the
tests can all be removed.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-28 Thread David Laight
From: Lothar Waßmann
 commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
 performance)
 introduced a regression for i.MX28. The swap_buffer() function doing
 the endian conversion of the received data on i.MX28 may access memory
 beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
 does not copy those bytes, so that the last bytes of a packet may be
 filled with invalid data after swapping.
 This will likely lead to checksum errors on received packets.
 E.g. when trying to mount an NFS rootfs:
 UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
 
 Do the byte swapping and copying to the new skb in one go if
 necessary.
 
 Signed-off-by: Lothar Wamann l...@karo-electronics.de
 ---
  drivers/net/ethernet/freescale/fec_main.c |   25 +
  1 file changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c 
 b/drivers/net/ethernet/freescale/fec_main.c
 index 404fb9d..b92324c 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
   return bufaddr;
  }
 
 +static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
 +{
 + int i;
 + unsigned int *src = src_buf;
 + unsigned int *dst = dst_buf;
 +
 + for (i = 0; i  len; i += 4, src++, dst++)
 + swab32s(src);

This will probably benefit from being unrolled slightly.
Neither 'dst' nor the return value is used.

 +
 + return dst_buf;
 +}
 +
  static void fec_dump(struct net_device *ndev)
  {
   struct fec_enet_private *fep = netdev_priv(ndev);
 @@ -1334,7 +1346,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct 
 bufdesc *bdp, struct sk_buff
  }
 
  static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
 -struct bufdesc *bdp, u32 length)
 +struct bufdesc *bdp, u32 length, bool swap)
  {
   struct  fec_enet_private *fep = netdev_priv(ndev);
   struct sk_buff *new_skb;
 @@ -1349,7 +1361,10 @@ static bool fec_enet_copybreak(struct net_device 
 *ndev, struct sk_buff **skb,
   dma_sync_single_for_cpu(fep-pdev-dev, bdp-cbd_bufaddr,
   FEC_ENET_RX_FRSIZE - fep-rx_align,
   DMA_FROM_DEVICE);
 - memcpy(new_skb-data, (*skb)-data, length);
 + if (!swap)
 + memcpy(new_skb-data, (*skb)-data, length);
 + else
 + swap_buffer2(new_skb-data, (*skb)-data, length);
   *skb = new_skb;
 
   return true;
 @@ -1377,6 +1392,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
   u16 vlan_tag;
   int index = 0;
   boolis_copybreak;
 + boolneed_swap = fep-quirks  FEC_QUIRK_SWAP_FRAME;
 
  #ifdef CONFIG_M532x
   flush_cache_all();
 @@ -1440,7 +1456,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
* include that when passing upstream as it messes up
* bridging applications.
*/
 - is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4);
 + is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4,
 +   need_swap);
   if (!is_copybreak) {
   skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
   if (unlikely(!skb_new)) {
 @@ -1455,7 +1472,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, 
 u16 queue_id)
   prefetch(skb-data - NET_IP_ALIGN);
   skb_put(skb, pkt_len - 4);
   data = skb-data;
 - if (fep-quirks  FEC_QUIRK_SWAP_FRAME)
 + if (!is_copybreak  need_swap)
   swap_buffer(data, pkt_len);

It has to be better to set the 'copybreak' limit to be larger than the
maximum frame size and so always go through the 'copybreak' paths.

 
   /* Extract the enhanced buffer descriptor */
 --
 1.7.10.4
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-28 Thread David Laight
From:  Lothar Waßmann
 commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
 performance)
 introduced a regression for i.MX28. The swap_buffer() function doing
 the endian conversion of the received data on i.MX28 may access memory
 beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
 does not copy those bytes, so that the last bytes of a packet may be
 filled with invalid data after swapping.
 This will likely lead to checksum errors on received packets.
 E.g. when trying to mount an NFS rootfs:
 UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
 
 Do the byte swapping and copying to the new skb in one go if
 necessary.
 
 Signed-off-by: Lothar Wamann l...@karo-electronics.de
 ---
  drivers/net/ethernet/freescale/fec_main.c |   25 +
  1 file changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c 
 b/drivers/net/ethernet/freescale/fec_main.c
 index 404fb9d..b92324c 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len)
   return bufaddr;
  }
 
 +static void *swap_buffer2(void *dst_buf, void *src_buf, int len)
 +{
 + int i;
 + unsigned int *src = src_buf;
 + unsigned int *dst = dst_buf;
 +
 + for (i = 0; i  len; i += 4, src++, dst++)
 + swab32s(src);
 +
 + return dst_buf;
 +}
 +

Actually that is completely f*cked

David



RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support

2014-10-28 Thread David Laight
From: Lothar Waßmann
 David Laight wrote:
  From: Lothar Waßmann
   commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx 
   performance)
   introduced a regression for i.MX28. The swap_buffer() function doing
   the endian conversion of the received data on i.MX28 may access memory
   beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
   does not copy those bytes, so that the last bytes of a packet may be
   filled with invalid data after swapping.
   This will likely lead to checksum errors on received packets.
   E.g. when trying to mount an NFS rootfs:
   UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
  
   Do the byte swapping and copying to the new skb in one go if
   necessary.
  
   Signed-off-by: Lothar Wamann l...@karo-electronics.de
   ---
drivers/net/ethernet/freescale/fec_main.c |   25 
   +
1 file changed, 21 insertions(+), 4 deletions(-)
  
   @@ -1455,7 +1472,7 @@ fec_enet_rx_queue(struct net_device *ndev, int 
   budget, u16 queue_id)
 prefetch(skb-data - NET_IP_ALIGN);
 skb_put(skb, pkt_len - 4);
 data = skb-data;
   - if (fep-quirks  FEC_QUIRK_SWAP_FRAME)
   + if (!is_copybreak  need_swap)
 swap_buffer(data, pkt_len);
 
  It has to be better to set the 'copybreak' limit to be larger than the
  maximum frame size and so always go through the 'copybreak' paths.
 
 Since the copybreak support is all about performance optimistation, we
 should IMO buy the additional advantage for i.MX28 by not having to
 access the buffer twice (once for copying and once again for swapping).

You definitely want to do the byteswap at the same time as the copy.

The point I'm trying to make that if you need to do the byteswap you
probably might as well copy the data to an skb of the correct size at
the same time.
Certainly I'd expect the 'break even' length will be much higher.

David




RE: [PATCH 03/10] mm: Convert p[te|md]_numa users to p[te|md]_protnone_numa

2014-11-20 Thread David Laight
From:  Mel Gorman
 Convert existing users of pte_numa and friends to the new helper. Note
 that the kernel is broken after this patch is applied until the other
 page table modifiers are also altered. This patch layout is to make
 review easier.

Doesn't that break bisection?

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v3 0/3] sched: Always check the integrity of the canary

2014-09-11 Thread David Laight
From: Aaron Tomlin
 Currently in the event of a stack overrun a call to schedule()
 does not check for this type of corruption. This corruption is
 often silent and can go unnoticed. However once the corrupted
 region is examined at a later stage, the outcome is undefined
 and often results in a sporadic page fault which cannot be
 handled.
 
 The first patch adds a canary to init_task's end of stack.
 While the second patch provides a helper to determine the
 integrity of the canary. The third checks for a stack
 overrun and takes appropriate action since the damage
 is already done, there is no point in continuing.

Clearly you've just been 'bitten' by a kernel stack overflow.
But a simple 'canary' isn't going to find most of the overflows
and will give an incorrect 'sense of security'.

The canary will only work if the stack is densely written.
In practise the stack alignment rules create gaps, and the
most likely reason for overflow is a large on-stack buffer
that isn't actually written to.

The only real way to detect kernel stack overflow is to arrange
for an unmapped page beyond the stack.
That costs KVA, but not much else.

David



RE: bit fields data tearing

2014-09-04 Thread David Laight
From:  Benjamin Herrenschmidt
 On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:
 
  Apologies for hijacking this thread but I need to extend this discussion
  somewhat regarding what a compiler might do with adjacent fields in a 
  structure.
 
  The tty subsystem defines a large aggregate structure, struct tty_struct.
  Importantly, several different locks apply to different fields within that
  structure; ie., a specific spinlock will be claimed before updating or 
  accessing
  certain fields while a different spinlock will be claimed before updating or
  accessing certain _adjacent_ fields.
 
  What is necessary and sufficient to prevent accidental false-sharing?
  The patch below was flagged as insufficient on ia64, and possibly ARM.
 
 We expect native aligned scalar types to be accessed atomically (the
 read/modify/write of a larger quantity that gcc does on some bitfield
 cases has been flagged as a gcc bug, but shouldn't happen on normal
 scalar types).

That isn't true on all architectures for items smaller than a machine word.
At least one has to do rmw for byte accesses.

David

 I am not 100% certain of bool here, I assume it's treated as a normal
 scalar and thus atomic but if unsure, you can always use int.
 
 Another option is to use the atomic bitops and make these bits in a
 bitmask but that is probably unnecessary if you have locks already.
 
 Cheers,
 Ben.


RE: [RFC] situation with csum_and_copy_... API

2014-11-21 Thread David Laight
From: Al Viro
...
 We would be better off with iov_iter passed to __sock_{send,recv}msg() (as
 a part of struct msghdr, instead of -msg_iov/-msg_iovlen) and always
 advanced to match the amount of data actually picked from it.  With iovec
 behind it remaining constant.  That would work just as well as the current
 variant for sendmsg(2)/recvmsg(2)/etc., be a lot more convenient for
 kernel_{send,recv}msg() callers and would allow a lot of other fun stuff.

Callers of kernel_send/recvmsg() could easily be using a wrapper
function that creates the 'msghdr'.
When the want to send the remaining part of a buffer the old iterator
will no longer be available - just the original iov and the required offset.

So it would be useful if the iterator could be initialised to a byte
offset down the iov[].

Are there any current code paths where the iov[] is modified but
ends up being something other than 'the remaining data'?
If not then code can check whether iov[0].len has changed, and
skip the 'advance' is it has.

(I've an out-of-tree driver that assumes the iov[] isn't changed.)

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] situation with csum_and_copy_... API

2014-11-24 Thread David Laight
From: Al Viro
 On Fri, Nov 21, 2014 at 05:42:55PM +, David Laight wrote:
 
  Callers of kernel_send/recvmsg() could easily be using a wrapper
  function that creates the 'msghdr'.
  When the want to send the remaining part of a buffer the old iterator
  will no longer be available - just the original iov and the required offset.
 
 Er...  So why not copy a struct iov_iter to/from msg-msg_iter, then?
 It's not as it had been particulary large - 5 words isn't much...
 
 I'm not at all sure that _anything_ has valid reasons for draining iovecs.
 Maintaining a struct iov_iter and modifying it is easy and actually faster...
 
 Right now the main examples outside of net/* are due to unfortunate
 limitations of -sendmsg() - until now it had no way to be told that
 desired data starts at offset.  With -msg_iter it obviously becomes
 possible...

It may well be easier for code that only has to run in a new kernel.
But for code that has run in old kernels as well you still need
to modify the iov[].

This is also true of userspace - there is no way of completing
a partial transfer without modifying the iov[] to allow for the
partial transfer.

Note that I'm not suggesting that any of the 'write' functions
should ever modify an iov[].

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] situation with csum_and_copy_... API

2014-11-24 Thread David Laight
From: Al Viro
 On Fri, Nov 21, 2014 at 08:49:56AM +, Al Viro wrote:
 
  Overall, I think I have the whole series plotted in enough details to be
  reasonably certain we can pull it off.  Right now I'm dealing with
  mm/iov_iter.c stuff; the amount of boilerplate source is already high enough
  and with those extra primitives it'll get really unpleasant.
 
  What we need there is something templates-like, as much as I hate C++, and
  I'm still not happy with what I have at the moment...  Hopefully I'll get
  that in more or less tolerable form today.
 
 Folks, I would really like comments on the patch below.  It's an attempt
 to reduce the amount of boilerplate code in mm/iov_iter.c; no new primitives
 added, just trying to reduce the amount of duplication in there.  I'm not
 too fond of the way it currently looks, to put it mildly.  It seems to
 work, it's reasonably straightforward and it even generates slightly better
 code than before, but I would _very_ welcome any tricks that would allow to
 make it not so tasteless.  I like the effect on line count (+124-358), but...
 
 It defines two iterators (for iovec-backed and bvec-backed ones) and converts
 a bunch of primitives to those.  The last argument is an expression evaluated
 for a bunch of ranges; for bvec one it's void, for iovec - size_t; if it
 evaluates to non-0, we treat it as read/write/whatever short by that many
 bytes and do not proceed any further.
 
 Any suggestions are welcome.
 
 diff --git a/mm/iov_iter.c b/mm/iov_iter.c
 index eafcf60..611af2bd 100644
 --- a/mm/iov_iter.c
 +++ b/mm/iov_iter.c
 @@ -4,11 +4,75 @@
  #include linux/slab.h
  #include linux/vmalloc.h
 
 +#define iterate_iovec(i, n, buf, len, move, STEP) {  \
 + const struct iovec *iov = i-iov;   \
 + size_t skip = i-iov_offset;\
 + size_t left;\
 + size_t wanted = n;  \

All these variable names probably want (at least one) _ prefix
just in case they match the names of arguments.

 + buf = iov-iov_base + skip; \
 + len = min(n, iov-iov_len - skip);  \

You are assigning to parameters, this can get confusing.
Unless these are return values this probably doesn't make sense.
Might be better to 'pass by reference', the generated code is
likely to be the same - but it is clearer to the reader.

 + left = STEP;\
 + len -= left;\
 + skip += len;\
 + n -= len;   \

Again, a parameter is modified.

 + while (unlikely(!left  n)) {  \
 + iov++;  \
 + buf = iov-iov_base;\
 + len = min(n, iov-iov_len); \
 + left = STEP;\
 + len -= left;\
 + skip = len; \
 + n -= len;   \
 + }   \
 + n = wanted - n; \
 + if (move) { \
 + if (skip == iov-iov_len) { \
 + iov++;  \
 + skip = 0;   \
 + }   \
 + i-count -= n;  \
 + i-nr_segs -= iov - i-iov; \
 + i-iov = iov;   \
 + i-iov_offset = skip;   \
 + }   \
 +}
...
 + iterate_iovec(i, bytes, buf, len, true,
 + __copy_to_user(buf, (from += len) - len, len))
 + return bytes;

Using 'buf' and 'len' in __copy_to_user() is very non-obvious.
It might be better if they were fixed names, maybe _ioiter_buf and _ioiter_len.

If this code can use the gcc extension that allows #defines that contain {}
to return values, the it would be better as:

bytes_left = iterate_iovec(i, bytes, true,
__copy_to_user(_ioiter_buf, (from += _ioiter_len) - _ioiter_len,
_ioiter_len);
return bytes_left;

Possibly also two wrapper #defines that supply the true/false parameter.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy

2014-09-16 Thread David Laight
From: Rickard Strandqvist 
  2014-09-15 10:47 GMT+02:00 David Laight david.lai...@aculab.com:
...
  Or:
  char name[sizeof netdev-name];
  memcpy(name, netdev-name, sizeof netdev-name);
 
...
 I liked the variant:
 char name[sizeof(netdev-name)];
 
 But dislike and do not understand what the point would be with memcpy variant.

The memcpy() will be a lot faster than any of the string copy functions
because it doesn't need to worry about the terminator.
In many cases it will be inlined to a short series of load and stores.

David



RE: [PATCH] tun: make sure interface usage can not overflow

2014-09-29 Thread David Laight
From: Kees Cook
 This makes the size argument a const, since it is always populated by
 the caller.

There is almost no point making parameters 'const.
('const foo *' makes sense).

 Additionally double-checks to make sure the copy_from_user
 can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS happy:
 
In function 'copy_from_user',
inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7:
... copy_from_user() buffer size is not provably correct

If 'ifreq_len' could be too big then you want to error the ioctl, not panic.
If it can't be too big you don't need the check.

David

 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
  drivers/net/tun.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/tun.c b/drivers/net/tun.c
 index acaaf6784179..a1f317cba206 100644
 --- a/drivers/net/tun.c
 +++ b/drivers/net/tun.c
 @@ -1855,7 +1855,7 @@ unlock:
  }
 
  static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 - unsigned long arg, int ifreq_len)
 + unsigned long arg, const size_t ifreq_len)
  {
   struct tun_file *tfile = file-private_data;
   struct tun_struct *tun;
 @@ -1869,6 +1869,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
 int cmd,
   int ret;
 
   if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) {
 + BUG_ON(ifreq_len  sizeof(ifr));
   if (copy_from_user(ifr, argp, ifreq_len))
   return -EFAULT;
   } else {
 --
 1.9.1
 
 
 --
 Kees Cook
 Chrome OS Security
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/3] arch: Introduce load_acquire() and store_release()

2014-11-14 Thread David Laight
From: Alexander Duyck
 It is common for device drivers to make use of acquire/release semantics
 when dealing with descriptors stored in device memory.  On reviewing the
 documentation and code for smp_load_acquire() and smp_store_release() as
 well as reviewing an IBM website that goes over the use of PowerPC barriers
 at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
 occurred to me that the same code could likely be applied to device drivers.
 
 As a result this patch introduces load_acquire() and store_release().  The
 load_acquire() function can be used in the place of situations where a test
 for ownership must be followed by a memory barrier.  The below example is
 from ixgbe:
 
   if (!rx_desc-wb.upper.status_error)
   break;
 
   /* This memory barrier is needed to keep us from reading
* any other fields out of the rx_desc until we know the
* descriptor has been written back
*/
   rmb();
 
 With load_acquire() this can be changed to:
 
   if (!load_acquire(rx_desc-wb.upper.status_error))
   break;

If I'm quickly reading the 'new' code I need to look up yet another
function, with the 'old' code I can easily see the logic.

You've also added a memory barrier to the 'break' path - which isn't needed.

The driver might also have additional code that can be added before the barrier
so reducing the cost of the barrier.

The driver may also be able to perform multiple actions before a barrier is 
needed.

Hiding barriers isn't necessarily a good idea anyway.
If you are writing a driver you need to understand when and where they are 
needed.

Maybe you need a new (weaker) barrier to replace rmb() on some architectures.

...


David



RE: [PATCH 0/4] phy: samsung-usb2: Add support for Vbus regulator

2014-12-11 Thread David Laight
From: Vivek Gautam
 This has been on my to-do list for sometime.
 Until now the host controller (specifically ehci-exynos) is responsible
 for enabling VBUS supply. This opens up one more issue which is, when
 only ohci-exynos is enabled and ehci-exynosis disabled then VBUS was
 never enabled (since ohci did not have the code to enabled the VBUS supply).
 
 Rather it should be wise to move the VBUS related stuff to phy driver and
 let phy take care of enabling it.
 
 This patch series adds that VBUS regulator to phy-samsung-usb2 driver,
 adds necessary binding information as well as vbus-supply properties
 to phy nodes on exynos5250 systems.
...

Does this go anyway to allowing devices to be powered from a micro-usb
connector while acting as a USB host (USB OTG).
ie when you want VBUS disabled even though it would normally
need to be enabled.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 1/1] net/macb: add TX multiqueue support for gem

2014-12-12 Thread David Laight
From: Thomas Petazzoni
 On Thu, 11 Dec 2014 11:16:51 +0100, Cyrille Pitchen wrote:
 
  +#define GEM_ISR1   0x0400
  +#define GEM_ISR2   0x0404
  +#define GEM_ISR3   0x0408
  +#define GEM_ISR4   0x040c
  +#define GEM_ISR5   0x0410
  +#define GEM_ISR6   0x0414
  +#define GEM_ISR7   0x0418
 
 What about doing instead:
 
 #define GEM_ISR(q)((q) == 0 ? MACB_ISR : 0x400 + 
 (q)  2)
 
 And ditto for all other registers, which will save a lot of boring repeated 
 code.

It will probably add a lot of object code and, depending on how often
the registers are accesses, might have performance impact.

Having:
#define GEM_ISR(n) (0x400 + (n)  4)
will save source code.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 1/1] net/macb: add TX multiqueue support for gem

2014-12-12 Thread David Laight
From: Cyrille Pitchen [...
  It will probably add a lot of object code and, depending on how often
  the registers are accesses, might have performance impact.
 
  Having:
  #define GEM_ISR(n) (0x400 + (n)  4)
  will save source code.
 
  David
 
 
 
 So you suggest that we keep the unsigned int fields ISR, IMR, IER, IDR, TBQP 
 in
 the struct macb_queue and initialize them once for all in macb_probe() like
 patch v2 does but only replace the GEM_ISR1 .. GEM_ISR7 defines by GEM_ISR(n)
 in macb.h?
 
 This way there would be to test at run time and we can handle the special
 register mapping of queue0.
 
 Is it what you meant?

In one word, yes.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/4] Add lightweight memory barriers fast_rmb() and fast_wmb()

2014-11-18 Thread David Laight
From: Alexander Duyck
 These patches introduce two new primitives for synchronizing cache-enabled
 memory writes and reads.  These two new primitives are:
 
   fast_rmb()
   fast_wmb()

Not sure I like the names.
If the aim is to sync data into the local cache so that hardware
that is doing cache-snooping accesses sees the data then maybe
local_rmb() and local_wmb()

IIRC read_barrier_depends() is a nop on everything except alpha.
Maybe add the default if it isn't defined by the MD file?

David
 


RE: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

2014-12-02 Thread David Laight
From: Jason Wang
  On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
   Hello:
 
   We used to orphan packets before transmission for virtio-net. This
  breaks
   socket accounting and can lead serveral functions won't work, e.g:
 
   - Byte Queue Limit depends on tx completion nofication to work.
   - Packet Generator depends on tx completion nofication for the last
 transmitted packet to complete.
   - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
  work.
 
   This series tries to solve the issue by enabling tx interrupts. To
  minize
   the performance impacts of this, several optimizations were used:
 
   - In guest side, virtqueue_enable_cb_delayed() was used to delay the tx
 interrupt untile 3/4 pending packets were sent.

Doesn't that give problems for intermittent transmits?

...

David



RE: nf_reject_ipv4: module license 'unspecified' taints kernel

2014-10-10 Thread David Laight
From: Dave Young
 With today's linus tree, I got below kmsg:
 [   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
 [   23.551886] Disabling lock debugging due to kernel taint
...

Not 100% related, but why does loading a non-GPL module disable 
lock debugging?
(Is 'lock debugging' actually useful?)

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: wireless: brcm80211: brcmfmac: dhd_sdio.c: Cleaning up missing null-terminate in conjunction with strncpy

2014-10-13 Thread David Laight
From: Rickard Strandqvist
 Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
 And changed from using strncpy to strlcpy to simplify code.

I think you should return an error if the strings get truncated.
Silent truncation is going to lead to issues at some point in the future
(in some places).

 Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
 ---
  drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c |   25 
 ++--
  1 file changed, 12 insertions(+), 13 deletions(-)
 
 diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
 b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
 index f55f625..d20d4e6 100644
 --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
 +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
 @@ -670,7 +670,6 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
 struct brcmf_sdio_dev *sdiodev)
  {
   int i;
 - uint fw_len, nv_len;
   char end;
 
   for (i = 0; i  ARRAY_SIZE(brcmf_fwname_data); i++) {
 @@ -684,25 +683,25 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci,
   return -ENODEV;
   }
 
 - fw_len = sizeof(sdiodev-fw_name) - 1;
 - nv_len = sizeof(sdiodev-nvram_name) - 1;
   /* check if firmware path is provided by module parameter */
   if (brcmf_firmware_path[0] != '\0') {
 - strncpy(sdiodev-fw_name, brcmf_firmware_path, fw_len);
 - strncpy(sdiodev-nvram_name, brcmf_firmware_path, nv_len);
 - fw_len -= strlen(sdiodev-fw_name);
 - nv_len -= strlen(sdiodev-nvram_name);
 + strlcpy(sdiodev-fw_name, brcmf_firmware_path,
 + sizeof(sdiodev-fw_name));
 + strlcpy(sdiodev-nvram_name, brcmf_firmware_path,
 + sizeof(sdiodev-nvram_name));
 
   end = brcmf_firmware_path[strlen(brcmf_firmware_path) - 1];

If you are doing a strlen() here, you could use the length for the copy
and/or use it to avoid the strcat().

   if (end != '/') {
 - strncat(sdiodev-fw_name, /, fw_len);
 - strncat(sdiodev-nvram_name, /, nv_len);
 - fw_len--;
 - nv_len--;
 + strlcat(sdiodev-fw_name, /,
 + sizeof(sdiodev-fw_name));
 + strlcat(sdiodev-nvram_name, /,
 + sizeof(sdiodev-nvram_name));
   }
   }
 - strncat(sdiodev-fw_name, brcmf_fwname_data[i].bin, fw_len);
 - strncat(sdiodev-nvram_name, brcmf_fwname_data[i].nv, nv_len);
 + strlcat(sdiodev-fw_name, brcmf_fwname_data[i].bin,
 + sizeof(sdiodev-fw_name));
 + strlcat(sdiodev-nvram_name, brcmf_fwname_data[i].nv,
 + sizeof(sdiodev-nvram_name));

I assume something ensures that fw_name[0] == 0 here.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: RCU bug with v3.17-rc3 ?

2014-10-13 Thread David Laight
From: Nathan Lynch
 On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
 
  Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
  it seems that this has been known about for some time.)
 
 Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x  3
 are affected, as well as 4.9.0.
 
  We can blacklist these GCC versions quite easily.  We already have GCC
  3.3 blacklisted, and it's trivial to add others.  I would want to include
  some proper details about the bug, just like the other existing entries
  we already have in asm-offsets.c, where we name the functions that the
  compiler is known to break where appropriate.
 
 Before blacklisting anything, it's worth considering that simple version
 checks would break existing pre-4.8.3 compilers that have been patched
 for PR58854.  It looks like Yocto and Buildroot issued releases with
 patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
 the most we can reasonably do without breaking some correctly-behaving
 toolchains is to emit a warning.

Is it possible to compile a small code fragment and check the generated
code for the bug?
Possibly predicated on the broken version number to avoid false positives.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: eth: realtek: atp: checkpatch errors and warnings corrected

2014-11-04 Thread David Laight
From: Roberto Medina
 On 11/03/2014 07:51 PM, Joe Perches wrote:
 
  Some ancient drivers could be regarded as neolithic
  curiosities that never need updating.  This may be one.
 
  But if you really want to change it, could you please
  make sure that objdiff shows no changes?
 
 
 I see some changes with objdiff, maybe this is caused by the include
 file that I changed to linux/io.h instead of asm/io.h
 
 ---
 /home/vov/Git/linux-next/.tmp_objdiff/a641d0e/drivers/net/ethernet/realtek/atp.dis
 2014-11-03 19:59:18.723954900 +0100
 +++
 /home/vov/Git/linux-next/.tmp_objdiff/5f19b70/drivers/net/ethernet/realtek/atp.dis
 2014-11-03 20:00:34.133954217 +0100
 @@ -1753,9 +1753,8 @@
   ...
   :   e3 0a   jrcxz  4c4 __gcov_.atp_probe1+0x14
   :   00 00   add%al,(%rax)
 -:e2 e2   loop   4a0 __gcov_.atp_init+0x20
 -:9c  pushfq
 -:9a  (bad)
 +:4b b4 98rex.WXB mov $0x98,%r12b
 +:ac  lods   %ds:(%rsi),%al
   :   67 2f   addr32 (bad)
   :   e4 e3   in $0xe3,%al
   :   00 00   add%al,(%rax)

That code (and all the ones below) are gibberish, neither the old or new
sequences make any sense.

Almost as though you used the wrong instruction set!

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return

2014-11-27 Thread David Laight
From: Joe Perches
 On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote:
  On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches j...@perches.com wrote:
   On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote:
   On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches j...@perches.com wrote:
  
Is there any value in reordering these tests for frequency
or maybe using | instead of || to avoid multiple jumps?
  
   probably not. It's not a critical path.
   compiler may fuse conditions depending on values anyway.
   If it was a critical path, we could have used
   (1  reg)  mask trick.
   I picked explicit 'return true' else 'return false' here,
   because it felt easier to read. Just a matter of taste.
  
   There is a size difference though: (allyesconfig)
  
   $ size arch/x86/net/built-in.o*
  textdata bss dec hex filename
 1299910124336   1834747ab arch/x86/net/built-in.o.new
 1317710764592   18845499d arch/x86/net/built-in.o.old
 
  interesting. Compiler obviously thinks that 178 byte increase
  with -O2 is the right trade off. Which I agree with :)
 
  If I think dropping 'inline' and using -Os will give bigger savings...
 
 This was allyesconfig which already uses -Os
 
 Using -O2, there is no difference using inline
 or not, but the size delta with the bitmask is
 much larger
 
 $ size arch/x86/net/built-in.o* (allyesconfig, but not -Os)
text  data bss dec hex filename
   13410   8203624   1785445be arch/x86/net/built-in.o.new
   16130   8844200   2121452de arch/x86/net/built-in.o.old
   16130   8844200   2121452de arch/x86/net/built-in.o.static

That is quite a big % change in the code size.
Why the change in data?

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Laight
From: David Hildenbrand
...
 Although it might not be optimal, but keeping a separate counter for
 pagefault_disable() as part of the preemption counter seems to be the only
 doable thing right now. I am not sure if a completely separated counter is 
 even
 possible, increasing the size of thread_info.

What about adding (say) 0x1 for the more restrictive test?

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Laight
From: David Hildenbrand [mailto:d...@linux.vnet.ibm.com]
  From: David Hildenbrand
  ...
   Although it might not be optimal, but keeping a separate counter for
   pagefault_disable() as part of the preemption counter seems to be the only
   doable thing right now. I am not sure if a completely separated counter 
   is even
   possible, increasing the size of thread_info.
 
  What about adding (say) 0x1 for the more restrictive test?
 
  David
 
 
 You mean as part of the preempt counter?
 
 The current layout (on my branch) is
 
  * PREEMPT_MASK:0x00ff
  * SOFTIRQ_MASK:0xff00
  * HARDIRQ_MASK:0x000f
  * NMI_MASK:0x0010
  * PREEMPT_ACTIVE:  0x0020
 
 I would have added
  * PAGEFAULT_MASK:  0x03C0

I'm not sure where you'd need to add the bits.

I think the above works because disabling 'HARDIRQ' implicitly
disables 'SOFTIRQ' and 'PREEMPT' (etc), so if 256+ threads
disable PREEMPT everything still works.

So if disabling pagefaults implies that pre-emption is disabled
(but SOFTIRQ is still allowed) then you need to insert your bit(s)
between 0xff00 and 0x00ff.
OTOH if disabling pre-emption implies that pagefaults are disabled
then you'd need to use the lsb and change all the above values.

Which makes me think that 'PREEMPT_ACTIVE' isn't right at all.
Two threads disabling NMIs (or 32  disabling HARDIRQ) won't DTRT.

OTOH I'm only guessing at how this is used.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Where exactly will arch_fast_hash be used

2014-12-04 Thread David Laight
From: Herbert Xu
 On Thu, Dec 04, 2014 at 04:43:31PM +0100, Daniel Borkmann wrote:
 
  Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h
  would give enough of a hint ...
 
 I think something more explicit like do not use this in a hash
 table unless you know what you're doing might be needed.

That isn't really helpful. Maybe something more like:

This CRC algorithm used by this hash is 'linear', ie hash(a xor b) ==
hash(a) xor hash(b). This means that it is relatively easy for a remote
attacker to generate multiple items with the same hash.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v1 3/5] mm, uaccess: trigger might_sleep() in might_fault() when pagefaults are disabled

2014-12-05 Thread David Laight
From: David Hildenbrand [...
  This should be likely() instead of unlikely(), no?
  I'd rather write this
 
  if (pagefault_disabled())
  return;
  __might_sleep(file, line, 0);
 
  and leave the likely stuff completely away.
 
 Makes perfect sense!

From my experience of getting (an older version of) gcc to emit
'correctly' statically predicted branches I found that code that
looks like (I don't think return/goto make any difference):

If (unlikely(condition)) {
code;
}
more_code;

is compile with a forwards conditional branch (ie ignoring the unlikely()).
Similarly 'if () continue' is likely to generate a 'predicted taken'
backwards conditional branch.

To get the desired effect you need a non-empty 'else' part, an assembler
comment will suffice, eg: asm volatile(# comment).

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] gianfar: handle map error in gfar_start_xmit()

2014-12-10 Thread David Laight
From: David Miller
 From: Arseny Solokha asolo...@kb.kras.ru
 Date: Fri,  5 Dec 2014 17:37:54 +0700
 
  @@ -2296,6 +2296,12 @@ static int gfar_start_xmit(struct sk_buff *skb, 
  struct net_device *dev)
 0,
 frag_len,
 DMA_TO_DEVICE);
  +   if (unlikely(dma_mapping_error(priv-dev, bufaddr))) {
  +   /* As DMA mapping failed, pretend the TX path
  +* is busy to retry later
  +*/
  +   return NETDEV_TX_BUSY;
  +   }
 
 You are not busy, you are dropping the packet due to insufficient system
 resources.
 
 Therefore the appropriate thing to do is to free the SKB, increment
 the drop statistical counter, and return NETDEV_TX_OK.

Plausibly the error action could depend on the number of messages
in the transmit ring.

If the ring is empty you definitely want to drop the packet.

If mapping a ring full of packets takes more dma map space than
the system has available you may want to be busy - otherwise you
get systemic packet loss when transmitting large burst of data.

This could be a problem if all the available dma mapping resources
have been allocated to receive buffers.

Do any common systems actually have limited dma space (apart from
limited bounce buffers)?
If people are only testing on systems with unlimited dma space (eg x86)
then these paths will never be exercised unless an artificial limit
is applied.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 0/3] Faster than SLAB caching of SKBs with qmempool (backed by alf_queue)

2014-12-10 Thread David Laight
From: Jesper Dangaard Brouer
 The network stack have some use-cases that puts some extreme demands
 on the memory allocator.  One use-case, 10Gbit/s wirespeed at smallest
 packet size[1], requires handling a packet every 67.2 ns (nanosec).
 
 Micro benchmarking[2] the SLUB allocator (with skb size 256bytes
 elements), show fast-path instant reuse only costs 19 ns, but a
 closer to network usage pattern show the cost rise to 45 ns.
 
 This patchset introduce a quick mempool (qmempool), which when used
 in-front of the SKB (sk_buff) kmem_cache, saves 12 ns on fast-path
 drop in iptables raw table, but more importantly saves 40 ns with
 IP-forwarding, which were hitting the slower SLUB use-case.
 
 
 One of the building blocks for achieving this speedup is a cmpxchg
 based Lock-Free queue that supports bulking, named alf_queue for
 Array-based Lock-Free queue.  By bulking elements (pointers) from the
 queue, the cost of the cmpxchg (approx 8 ns) is amortized over several
 elements.

It seems to me that these improvements could be added to the
underlying allocator itself.
Nesting allocators doesn't really seem right to me.

David



RE: [PATCH 0/5] powerpc: Get rid of redundant arch specific swab functions

2015-02-04 Thread David Laight
From: David Gibson
 arch/powerpc/include/asm/swab.h includes some powerpc specific
 byteswapping functions, which are implemented in terms of powerpc's
 built in byte reversed load/store instructions.  There are two problems with 
 this:
 
 1) They're not necessary - gcc is perfectly capable of generating the
byte-reversed load and store instructions when using the normal,
generic byteswapping functions (tested with gcc (GCC) 4.8.3
20140911 (Red Hat 4.8.3-9))

Should you be worrying about older versions of gcc?
IIRC the internal byteswap 'stuff' is relatively recent (like
the last couple of years) so people building current kernels
on older distributions might have issues.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2] net: bluetooth: hci_sock: Use 'const void *' instead of 'void *' for 2nd parameter of hci_test_bit()

2015-02-04 Thread David Laight
From: Gang S
 On 2/3/15 10:32, Chen Gang S wrote:
  On 2/3/15 05:20, Joe Perches wrote:
  On Tue, 2015-02-03 at 05:14 +0800, Chen Gang S wrote:
  hci_test_bit() does not modify 2nd parameter, so it is better to let it
  be constant, or may cause build warning. The related warning (with
  allmodconfig under xtensa):
  []
  diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
  []
  @@ -46,7 +46,7 @@ struct hci_pinfo {
unsigned shortchannel;
   };
 
  -static inline int hci_test_bit(int nr, void *addr)
  +static inline int hci_test_bit(int nr, const void *addr)
   {
return *((__u32 *) addr + (nr  5))  ((__u32) 1  (nr  31));
   }
 
  It's probably better to use const __u32 * here too, but the
  real thing I wonder is whether or not there's an issue with
  one of the 2 uses of this function.
 
  One of them passes a unsigned long *, the other a u32 *.
 
  $ git grep -w hci_test_bit
  net/bluetooth/hci_sock.c:static inline int hci_test_bit(int nr, void *addr)
  net/bluetooth/hci_sock.c:   if (!hci_test_bit(flt_event, 
  flt-event_mask))
  net/bluetooth/hci_sock.c:!hci_test_bit(ocf  
  HCI_FLT_OCF_BITS,
  net/bluetooth/hci_sock.c-  
  hci_sec_filter.ocf_mask[ogf])) 
 
  hci_sec_filter.ocf_mask is __u32
  but flt-event_mask is unsigned long.
 
  Any possible issue here on 64-bit systems?
 
 
  For me, it can not cause issue on 64-bit systems. hci_test_bit() treats
  'addr' as __u32 *, and has to use the pointer to do something.
 
 
 'event_mask' is intended to type cast to __u32 * within 'hci_sock.c'.
 So for me, const __u32 * is better than const void * for 2nd
 parameter of hci_test_bit().
 
 If what I said above is correct, and also if necessary, I shall patch v3
 for it.

How are the bits set in the first place?
If the array is ever indexed as long [] then the code above is unlikely
to be testing the correct bits.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] net: bluetooth: hci_sock: Use 'const void *' instead of 'void *' for 2nd parameter of hci_test_bit()

2015-02-04 Thread David Laight
From: Chen Gang S
 -static inline int hci_test_bit(int nr, void *addr)
 +static inline int hci_test_bit(int nr, const void *addr)
  {
   return *((__u32 *) addr + (nr  5))  ((__u32) 1  (nr  31));
  }

Is there a 'standard' function lurking that will do the above.
On x86 the cpus 'bit test' instruction will handle bit numbers
greater than the word size - so it can be a single instruction.

David



<    1   2   3   4   5   6   7   8   9   10   >