RE: A reduced Linux network stack for small systems
From: Andi Kleen There has been a lot of interest recently to run Linux on very small systems, like Quark systems. These may have only 2-4MB memory. They are also limited by flash space. I'm intrigued about the 2-4MB memory. That is more that would typically be available on-chip in a DSP or FPGA. It sounds like an expensive SRAM chip. OTOH a single SDRAM gives 16MB and DDR a lot more - and are a lot cheaper and lower power. Most modern silicon can easily have SDRAM/DDR interfaces. You may want some size reduction to run in 16MB, but it is not as problematic as running in 2MB. With that little memory I wouldn't want to run anything that relied on dynamic memory allocation (after startup) - except for fixed size data buffers. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH/RFC] usb: gadget: CONFIG_USB_S3C_HSOTG should depend on HAS_DMA
From: Geert Uytterhoeven If NO_DMA=y: drivers/built-in.o: In function `s3c_hsotg_map_dma': s3c-hsotg.c:(.text+0x375b2c): undefined reference to `usb_gadget_map_request' drivers/built-in.o: In function `s3c_hsotg_unmap_dma': s3c-hsotg.c:(.text+0x376a32): undefined reference to `usb_gadget_unmap_request' make[3]: *** [vmlinux] Error 1 Note that all callers of s3c_hsotg_map_dma()/s3c_hsotg_unmap_dma() are protected by a call into the using_dma() checking function: static inline bool using_dma(struct s3c_hsotg *hsotg) { return false; /* support is not complete */ } but not all versions of gcc manage to optimize the check away, causing a link error. A have seen gcc optimise away a call, but leave the undefined symbol in the symbol table. That might be what is happening here. I don't know a generic solution, I used 'objcopy -N__udivdi3' to remove the explicit unwanted reference from my driver. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 00/13] Refactor pci_is_brdige() to simplify code
From: Yijing Wang This patchset rename the current pci_is_bridge() to pci_has_subordinate(), and introduce a new pci_is_bridge() which determine pci bridge by check dev-hdr_type. The new one is more accurate. PCIe Spec define the pci device is a bridge by the dev-hdr_type = 0x01 || 0x02. That is a dangerous rename and is likely to cause difficult to identify bugs in any code you've missed. David
RE: [PATCH v3 2/8] DMA: Freescale: unify register access methods
From: hongbo.zh...@freescale.com Methods of accessing DMA contorller registers are inconsistent, some registers ^^ are accessed by DMA_IN/OUT directly, while others are accessed by functions get/set_* which are wrappers of DMA_IN/OUT, and even for the BCR register, it is read by get_bcr but written by DMA_OUT. This patch unifies the inconsistent methods, all registers are accessed by get/set_* now. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH] tcp: fix compiler array bounds warning on selective_acks[]
Bjorn Helgaas With -Werror=array-bounds, gcc v4.8.x warns that in tcp_sack_remove(), a selective_acks[] array subscript is above array bounds. I don't understand how gcc figures this out, or why we don't see similar problems many other places, but this is the only fix I can figure out. ... diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 65cf90e063d5..65133b108236 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4047,7 +4047,8 @@ static void tcp_sack_remove(struct tcp_sock *tp) /* Zap this SACK, by moving forward any other SACKS. */ for (i = this_sack+1; i num_sacks; i++) - tp-selective_acks[i-1] = tp-selective_acks[i]; + if (i ARRAY_SIZE(tp-selective_acks)) + tp-selective_acks[i-1] = tp-selective_acks[i]; num_sacks--; continue; } You really shouldn't add that test every time around the loop. Try changing the loop so the assignment is: tp-selective_acks[i] = tp-selective_acks[i + 1]; or the loop test to: i = num_sacks - 1; Or beat up the gcc developers :-) David
RE: [PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline
It would also increase the latency of CPU-hotunplug operations. Is that a big deal? I thought that was the whole deal with this patchset - making cpu hotunplugs lighter and faster mostly for powersaving. That said, just removing stop_machine call would be a pretty good deal and I don't know how meaningful reducing CPU hotunplug latency is. Srivatsa? Keeping the hotunplug latency is important for suspend/resume, where we take all non-boot CPUs in a loop. That's an interesting use-case where intrusiveness doesn't matter much, but latency does. So yes, making CPU hotplug faster is also one of the goals of this patchset. If you are removing all but one of the cpu, the you only need one rcu cycle (remove everything from the list first). I'd also guess that you can't suspend a cpu until you can sleep the process that is running on it - so if a process has pre-emption disabled you aren't going to complete suspend until the process sleeps (this wouldn't be true if you suspended the cpu with its current stack - but if suspend is removing the non-boot cpus first it must be doing so from the scheduler idle loop). If you are doing suspend for aggressive power saving, then all the processes (and processors) will already be idle. However you probably wouldn't want the memory accesses to determine this on a large NUMA system with 1024+ processors. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data
From: Olivier Sobrie ... The function put_rxbuf_data() is called from the urb completion handler. It puts the data of the urb transfer in the tty buffer with tty_insert_flip_string_flags() and schedules a work queue in order to push the data to the ldisc. Problem is that we are in a urb completion handler so we can't wait until there is room in the tty buffer. Surely you can just keep the urb? Resubmit it later when all the data has been transferred. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data
From: One Thousand Gnomes On Thu, 10 Jul 2014 14:37:37 + David Laight david.lai...@aculab.com wrote: From: Olivier Sobrie ... The function put_rxbuf_data() is called from the urb completion handler. It puts the data of the urb transfer in the tty buffer with tty_insert_flip_string_flags() and schedules a work queue in order to push the data to the ldisc. Problem is that we are in a urb completion handler so we can't wait until there is room in the tty buffer. The tty provides the input queue, if the queue is full then just chuck the data in the bitbucket. hso is trying to be far too clever. If hso is fast enough that the buffering isn't sufficient on the tty side then we need to fix the tty buffer size. You really want to apply flow control back over the 'serial' link. That may just cause data discards earlier on the local system. But it is possible that not resubmitting the receive urb will cause the modem to flow control back to the sender. In which case there is some chance that data won't be lost. David Arguably what we need are tty fastpaths for non N_TTY but thats rather more work. There's no fundamental reason that hso can't throw the buffer at buffer straight at the PPP ldisc and straight into the network stack - just that 1. The tty mid layer glue is standing in the way 2. The change of line discipline code has to lock against it Alan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data
From: Olivier Sobrie Olivier Sobrie Hi Alan and Davids, On Thu, Jul 10, 2014 at 04:50:03PM +0100, One Thousand Gnomes wrote: On Thu, 10 Jul 2014 14:37:37 + David Laight david.lai...@aculab.com wrote: From: Olivier Sobrie ... The function put_rxbuf_data() is called from the urb completion handler. It puts the data of the urb transfer in the tty buffer with tty_insert_flip_string_flags() and schedules a work queue in order to push the data to the ldisc. Problem is that we are in a urb completion handler so we can't wait until there is room in the tty buffer. The tty provides the input queue, if the queue is full then just chuck the data in the bitbucket. hso is trying to be far too clever. If hso is fast enough that the buffering isn't sufficient on the tty side then we need to fix the tty buffer size. Ok I'll adapt the patch to drop the data that can't be put in the tty buffer. I test this and resend a new patch. If you are going to drop data, then ideally you want to discard entire ppp packets. Depending on exactly how the interface works it might be that urb are likely to contain complete packets. So discarding entire urb might work better than discarding a few bytes. (But don't even think of scanning the data stream in the usb driver - except for experiments.) David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/5] xhci: Use correct SLOT ID when handling a reset device command
From: Of Mathias Nyman Command completion events normally include command completion status, SLOT_ID, and a pointer to the original command. Reset device command completion SLOT_ID may be zero according to xhci specs 4.6.11. VIA controllers set the SLOT_ID to zero, triggering a WARN_ON in the command completion handler. Use the SLOT ID found from the original command instead. This patch should be applied to stable kernels since 3.13 that contain the commit 20e7acb13ff48fbc884d5918c3697c27de63922a xhci: use completion event's slot id rather than dig it out of command Cc: sta...@vger.kernel.org # 3.13 Reported-by: Saran Neti saran...@gmail.com Tested-by: Saran Neti saran...@gmail.com Signed-off-by: Mathias Nyman mathias.ny...@linux.intel.com --- drivers/usb/host/xhci-ring.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index d67ff71..71657d3 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1433,8 +1433,11 @@ static void handle_cmd_completion(struct xhci_hcd *xhci, xhci_handle_cmd_reset_ep(xhci, slot_id, cmd_trb, cmd_comp_code); break; case TRB_RESET_DEV: - WARN_ON(slot_id != TRB_TO_SLOT_ID( - le32_to_cpu(cmd_trb-generic.field[3]))); + /* SLOT_ID field in reset device cmd completion event TRB is 0. Minor nit... Surely is would be better to say 'is undefined' or 'may be zero'. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] vfio: Fix endianness handling for emulated BARs
From: Alexey Kardashevskiy ... So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls. I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense I do not understand why @val is considered LE here and need to be converted to CPU. Really. I truly believe it should be cpu_to_le32(). Think about it carefully... Apparently iowrite32() is writing a 'cpu' value out 'le'. So if you have a 'le' value you need to convert it to 'cpu' first. I really won't ask how 'be' ppc managed to get 'le' on-chip peripherals. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH] USB: ehci-pci: USB host controller support for Intel Quark X1000
From: Chen, Alvin On Tue, Jun 24, 2014 at 08:51:43AM -0700, Chen, Alvin wrote: diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c index 00661d3..1ea8803 100644 --- a/drivers/usb/host/pci-quirks.c +++ b/drivers/usb/host/pci-quirks.c @@ -823,6 +823,48 @@ static int handshake(void __iomem *ptr, u32 mask, u32 done, return -ETIMEDOUT; } +#define PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC 0x0939 +bool usb_is_intel_qrk(struct pci_dev *pdev) { + return pdev-vendor == PCI_VENDOR_ID_INTEL + pdev-device == PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC; + +} +EXPORT_SYMBOL_GPL(usb_is_intel_qrk); There is no lack of vowels available to you, please use them. OK, I will change ' usb_is_intel_qrk ' to ' usb_is_intel_quark'. Or even usb_is_intel_quark_x1000() ? David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] sched: Fix compiler warnings
From: Guenter Roeck On 06/25/2014 07:49 AM, Uwe Kleine-Knig wrote: Hello Guenter, On Wed, Jun 25, 2014 at 07:27:47AM -0700, Guenter Roeck wrote: Maybe the author's intention was: static inline int cpu_corepower_flags(void) __attribute__((const)); ? This specifies that the function has no side effects and the return value only depends on the (here non-existing) function arguments. Possibly, but either I am missing something or this doesn't compile. You need to do a separate declaration: static inline int cpu_corepower_flags(void) __attribute__((const)); static inline int cpu_corepower_flags(void) { ... Actually turns out one can use __attribute_const__, and it is static inline int __attribute_const__ cpu_corepower_flags(void) which turns out to be widely used. I'll change that and resubmit after testing. You don't need to tell the compiler that for an inline function. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v2] sched: Fix compiler warnings
From: Uwe Kleine-König Hello, On Wed, Jun 25, 2014 at 03:40:28PM +, David Laight wrote: From: Guenter Roeck Actually turns out one can use __attribute_const__, and it is static inline int __attribute_const__ cpu_corepower_flags(void) which turns out to be widely used. I'll change that and resubmit after testing. You don't need to tell the compiler that for an inline function. I didn't check for the functions in question here, but in general your statement is wrong. For example: static inline unsigned int __attribute_const__ read_cpuid_id(void) { return readl(BASEADDR_V7M_SCB + V7M_SCB_CPUID); } from arch/arm/include/asm/cputype.h. The V7M_SCB_CPUID register never changes, but there is no way gcc can deduce that. Hmm... it all rather depends on the order of the optimisations and inlining. I've tried to use 'restrict' on the parameters to an inline function in an attempt to get 'noalias' - but the reverse inference never seems to be applied. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2 net-next 0/2] split BPF out of core networking
From: Chema Gonzalez ... 4.5. BPF_ST|BPF_MEM Operation: *(size *) (dst_reg + off16) = imm32 This insn encodes 2 immediate values (the offset and the imm32 value) in the insn, and actually forces the sock_filter_int 64-bit struct to have both a 16-bit offset field and a 32-bit immediate field). In fact, it's the only instructions that uses .off and .imm at the same time (for all other instructions, at least one of the fields is always 0). This did not exist in classic BPF (where BPF_ST|BPF_MEM actually did mem[pc-k] = A;). In fact, it's rare to find an ISA that allows encoding 2 immediate values in a single insn. My impression (after checking the x86 JIT implementation, which works on the eBPF code) is that this was added as an x86 optimization, because x86 allows encoding 2 values (offset and immediate) by using the displacement and immediate suffixes. I wonder whether the ISA would be more readable if we did this in 2 insn, one to put dst_reg+off16 in a temporary register, and the second a simpler BPF_STX|BPF_MEM. Then we could use the same space for the immediate and offset fields. One option is to add code to the x86 JIT to detect the two instruction sequence and generate a single instruction. Thinks further, the JIT might be easier to write if there is a temporary register that is defined to be only valid for the next instruction (or two). Then the JIT can completely optimise away any assignments to it without requiring a full analysis of the entire program. David
RE: [PATCH net-next 1/2] net: filter: split filter.c into two files
From: Varka Bhadram On 07/23/2014 11:31 AM, Alexei Starovoitov wrote: BPF is used in several kernel components. This split creates logical boundary between generic eBPF core and the rest kernel/bpf/core.c: eBPF interpreter net/core/filter.c: classic-eBPF converter, classic verifiers, socket filters This patch only moves functions. If we are moving the code also its good to do cleanup. Except you need to do it as a separate patch. David Run checkpatch.pl on this... -- Regards, Varka Bhadram. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] USB: ehci-pci: USB host controller support for Intel Quark X1000
From: Jingoo Han ... /* The maximal threshold value is 0x80, which means 512 bytes */ #define EHCI_THRESHOLD_512BYTES 0x80 #define EHCI_THRESHOLD_508BYTES 0x79 It would be better to define these using expressions. So: #define EHCI_THRESHOLD_512BYTES (512u / 8u) #define EHCI_THRESHOLD_508BYTES (508u / 8u) Then you might decide to use: #define EHCI_THRESHOLD(size) ((size) / 8u) Then realise that the names are not generic EHCI, so need some driver-specific prefix (for namespace reasons). And that the defines are probably limit values, and should be named as such. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCHv2] usb: gadget: serial: replace hardcoded ttyGS with PREFIX
From: Of Richard Leitner Replaces all hardcoded ttyGS strings with the PREFIX macro. Due to the fact the strings are spread over different source files the PREFIX definition is moved to u_serial.h Lots of changes like: - DBG(cdev, acm ttyGS%d completion, err %d\n, - acm-port_num, req-status); + DBG(cdev, acm %s%d completion, err %d\n, + PREFIX, acm-port_num, req-status); I'm not sure this improves the code. Do you see a need to ever change PREFIX? Even if it is a reasonable change, the name PREFIX is hopeless. It would have to be something like ACM_TTYNAME_PREFIX. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCHv2] usb: gadget: serial: replace hardcoded ttyGS with PREFIX
From: Richard Leitner On Mon, 30 Jun 2014 08:41:18 + David Laight david.lai...@aculab.com wrote: From: Of Richard Leitner Replaces all hardcoded ttyGS strings with the PREFIX macro. Due to the fact the strings are spread over different source files the PREFIX definition is moved to u_serial.h Lots of changes like: - DBG(cdev, acm ttyGS%d completion, err %d\n, - acm-port_num, req-status); + DBG(cdev, acm %s%d completion, err %d\n, + PREFIX, acm-port_num, req-status); I'm not sure this improves the code. Maybe you're right, the readability of the code wouldn't be better afterwards, ... Indeed it will make most attempts to grep for the error message fail. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC net-next 07/14] bpf: expand BPF syscall with program load/unload
From: Alexei Starovoitov On Fri, Jun 27, 2014 at 5:19 PM, Andy Lutomirski l...@amacapital.net wrote: On Fri, Jun 27, 2014 at 5:05 PM, Alexei Starovoitov a...@plumgrid.com wrote: eBPF programs are safe run-to-completion functions with load/unload methods from userspace similar to kernel modules. User space API: - load eBPF program prog_id = bpf_prog_load(int prog_id, bpf_prog_type, struct nlattr *prog, int len) where 'prog' is a sequence of sections (currently TEXT and LICENSE) TEXT - array of eBPF instructions LICENSE - GPL compatible + + err = -EINVAL; + /* look for mandatory license string */ + if (!tb[BPF_PROG_LICENSE]) + goto free_attr; + + /* eBPF programs must be GPL compatible */ + if (!license_is_gpl_compatible(nla_data(tb[BPF_PROG_LICENSE]))) + goto free_attr; Seriously? My mind boggles. Yes. Quite a bit of logic can fit into one eBPF program. I don't think it's wise to leave this door open for abuse. This check makes it clear that if you write a program in C, the source code must be available. That seems utterly extreme. Loadable kernel modules don't have to be GPL. I can imagine that some people might not want to load code for which they don't have the source - but in that case they probably want to compile it themselves anyway. I don't want to have to put a gpl licence on random pieces of test code I might happen to write for my own use. David
RE: [PATCH] vsprintf: Remove SPECIAL from pointer types
From: Joe Perches Because gcc issues a complaint about any pointer format with %#p, remove the use of SPECIAL to prefix 0x to various pointer types. There are no uses in the kernel tree of %#p. I know you guys don't really care about them, but there might be uses in out of tree drivers. With the change what is output for %#p ? I know I've used %#p in some code (possibly userspace) that need to run under multiple OS because 0x%p generates 0x0x on one of the OS. (We might have removed them because of the gcc warning though.) David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data
From: Olivier Sobrie When the module sends bursts of data, sometimes a deadlock happens in the hso driver when the tty buffer doesn't get the chance to be flushed quickly enough. To avoid this, first, we remove the endless while loop in put_rx_bufdata() which is the root cause of the deadlock. Secondly, when there is no room anymore in the tty buffer, we set up a timer of 100 msecs to give a chance to the upper layer to flush the tty buffer and make room for new data. What is the timer for? You need to get the sending code woken up by the urb completion. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] hso: fix deadlock when receiving bursts of data
From: Olivier Sobrie Hi David, On Mon, Jul 07, 2014 at 09:13:53AM +, David Laight wrote: From: Olivier Sobrie When the module sends bursts of data, sometimes a deadlock happens in the hso driver when the tty buffer doesn't get the chance to be flushed quickly enough. To avoid this, first, we remove the endless while loop in put_rx_bufdata() which is the root cause of the deadlock. Secondly, when there is no room anymore in the tty buffer, we set up a timer of 100 msecs to give a chance to the upper layer to flush the tty buffer and make room for new data. What is the timer for? You need to get the sending code woken up by the urb completion. In put_rxbuf_data() (which can be called under irq disabled), tty_flip_buffer_push() is called and schedules a push of the tty buffer. When the buffer is full, I give some time to the above layer in order to flush it. The timer is used to recall put_rxbuf_data_and_resubmit_bulk_urb() later in order to read the remaining data stored in urb-transfer_buffer and then to resubmit the urb to receive more data from the gsm module. I don't understand what you mean by getting the sending code woken up. Calling tty_port_tty_wakeup()?? We are in the receive path... Sorry, it isn't at all clear from the general description whether you are referring to the transmit or receive path. 'flush' can mean all sorts of things ... In either case a 100ms delay to data doesn't seem right at all. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
From: Bjorn Helgaas ... Even if you do that, you ought to write valid interrupt information into the 4th slot (maybe replicating one of the earlier interrupts). Then, if the device does raise the 'unexpected' interrupt you don't get a write to a random kernel location. I might be missing something, but we are talking of MSI address space here, aren't we? I am not getting how we could end up with a 'write' to a random kernel location when a unclaimed MSI vector sent. We could only expect a spurious interrupt at worst, which is handled and reported. Yes, that's how I understand it. With MSI, the OS specifies the a single Message Address, e.g., a LAPIC address, and a single Message Data value, e.g., a vector number that will be written to the LAPIC. The device is permitted to modify some low-order bits of the Message Data to send one of several vector numbers (the MME value tells the device how many bits it can modify). Bottom line, I think a spurious interrupt is the failure we'd expect if a device used more vectors than the OS expects it to. So you need to tell the device where to write in order to raise the 'spurious interrupt'. David
RE: [PATCH 1/2] net: cadence: macb: add support for the WOL
From: Varka Bhadram On 07/14/2014 02:32 PM, Jongsung Kim wrote: This patch enables the ethtool utility to control the WOL function of the PHY connected to the GEM/MACB. (if supported) ... +static int macb_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) +{ + struct macb *bp = netdev_priv(netdev); + struct phy_device *phydev = bp-phy_dev; + int err = -ENODEV; + + if (phydev) + err = phy_ethtool_set_wol(phydev, wol); + + return err; +} + I think we can do in this way: if (phydev) return phy_ethtool_set_wol(phydev, wol); else return -ENODEV; we can save err. What do you say ...? I would do: if (!phydev) return -ENODEV; return phy_ethtool_set_wol(phydev, wol); Although it might even be worth moving the NULL test into the function. (sort of depends on style and the number of callers who need to do the test.) David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC net-next 08/14] bpf: add eBPF verifier
From: Alexei Starovoitov ... +#define _(OP) ({ int ret = OP; if (ret 0) return ret; }) ... + _(get_map_info(env, map_id, map)); Nit: such macros should be removed, please. It may surely look unconventional, but alternative is to replace every usage of _ macro with: err = if (err) return err; and since this macro is used 38 times, it will add ~120 unnecessary lines that will only make code much harder to follow. I tried not using macro and results were not pleasing. The problem is that they are hidden control flow. As such they make flow analysis harder for the casual reader. The extra lines really shouldn't matter. David
RE: [PATCH][RFC] USB: zerocopy support for usbfs
From: Stefan Klug ... Is there any way to check if the host controller supports arbitrary alignment? If I read the xhci spec correctly arbitrary alignment is explicitly permitted. Not entirely. The xhci spec has a few limits on the alignment of transfer buffer. They seem to be designed to make life difficult for the kernel! 1) Transfer buffers cannot be longer than 64k. 2) Transfer buffers cannot cross 64k address boundaries. 3) The end of a ring segment must occur on a USB packet boundary. The current xhci driver doesn't implement check 3 - which causes certain devices to fail (notable the ax88179_178a usb3 ethernet). David
RE: [PATCH RFC net-next 08/14] bpf: add eBPF verifier
From: Alexei Starovoitov +#define _(OP) ({ int ret = OP; if (ret 0) return ret; }) +1 to removing the _ macro. If you want to avoid the 3 lines (is there anything in the style guide against if ((err=OP) 0) ... ?), at assignment and function call inside 'if' ? I don't like such style. least use some meaningful macro name (DO_AND_CHECK, or something like that). It would have to be RETURN_IF_NEGATIVE(). But even then it is skipped by searches for 'return'. Try replacing _ with any other name and see how bad it will look. I tried with MACRO_NAME and with 'if (err) goto' and with 'if (err) return', before I converged on _ macro. I think it's a hidden gem of this patch. No, it is one of those things that 'seems like a good idea at the time', but causes grief much later on. Have you considered saving the error code into 'env' and making most of the functions return if an error is set? Then the calling code need not check the result of every function call. David
RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
From: Bjorn Helgaas On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote: There are PCI devices that require a particular value written to the Multiple Message Enable (MME) register while aligned on power of 2 boundary value of actually used MSI vectors 'nvec' is a lesser of that MME value: roundup_pow_of_two(nvec) 'Multiple Message Enable' However the existing pci_enable_msi_block() interface is not able to configure such devices, since the value written to the MME register is calculated from the number of requested MSIs 'nvec': 'Multiple Message Enable' = roundup_pow_of_two(nvec) For MSI, software learns how many vectors a device requests by reading the Multiple Message Capable (MMC) field. This field is encoded, so a device can only request 1, 2, 4, 8, etc., vectors. It's impossible for a device to request 3 vectors; it would have to round up that up to a power of two and request 4 vectors. Software writes similarly encoded values to MME to tell the device how many vectors have been allocated for its use. For example, it's impossible to tell the device that it can use 3 vectors; the OS has to round that up and tell the device it can use 4 vectors. So if I understand correctly, the point of this series is to take advantage of device-specific knowledge, e.g., the device requests 4 vectors via MMC, but we know the device is only capable of using 3. Moreover, we tell the device via MME that 4 vectors are available, but we've only actually set up 3 of them. ... Even if you do that, you ought to write valid interrupt information into the 4th slot (maybe replicating one of the earlier interrupts). Then, if the device does raise the 'unexpected' interrupt you don't get a write to a random kernel location. Plausibly something similar should be done when a smaller number of interrupts is assigned. David
RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
From: Alexander Gordeev ... Even if you do that, you ought to write valid interrupt information into the 4th slot (maybe replicating one of the earlier interrupts). Then, if the device does raise the 'unexpected' interrupt you don't get a write to a random kernel location. I might be missing something, but we are talking of MSI address space here, aren't we? I am not getting how we could end up with a 'write' to a random kernel location when a unclaimed MSI vector sent. We could only expect a spurious interrupt at worst, which is handled and reported. Anyway, as I described in my reply to Bjorn, this is not a concern IMO. I'm thinking of the following - which might be MSI-X ? 1) Hardware requests some interrupts and tells the host the BAR (and offset) where the 'vectors' should be written. 2) To raise an interrupt the hardware uses the 'vector' as the address of a normal PCIe write cycle. So if the hardware requests 4 interrupts, but the driver (believing it will only use 3) only write 3 vectors, and then the hardware uses the 4th vector it can write to a random location. Debugging that would be hard! David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 3/7] drivers: usb: Include appropriate header file in hcd.h
From: Rashika Kheria Include header file include/linux/usb.h in include/linux/usb/hcd.h because structures usb_device, usb_host_config and usb_interface have their definitions in include/linux/usb.h. This eliminates the following warning in include/linux/usb/hcd.h: include/linux/usb/hcd.h:311:44: warning: ‘struct usb_device’ declared inside parameter list [enabled by default] include/linux/usb/hcd.h:412:10: warning: ‘struct usb_host_config’ declared inside parameter list [enabled by default] include/linux/usb/hcd.h:614:9: warning: ‘struct usb_interface’ declared inside parameter list [enabled by default] All it is necessary to do is add a declaration of the struct before the function definition. There is no need to include the definition of the structure. It is a shame that gcc doesn't defer this warning to any call site (where an incorrect type would get passed). David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH 1/7] drivers: usb: Include appropriate header file in hcd.c
These warning are non-default GCC warnings. These can be seen either by adding W=1 while running make (i.e. make W=1) or adding -Wmissing-prototypes in KBUILD_CFLAGS in the toplevel Makefile. By default, we don't care about 'W=1' warnings, as no one sees them, and they don't matter. -Wmissing-prototypes really ought to be defined. Some of the other warning are a little more pedantic, the most annoying is -Wsign-compare. OTOH a lot of code out there is clean enough to build without any warnings and with a moderate amount of pedantry from the compiler. OTOH just including extra headers isn't ideal - it can considerably slow down the compilation time. There are many subsystems that don't really separate their internal headers from their external ones. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH 1/7] drivers: usb: Include appropriate header file in hcd.c
From: Josh Triplett On Thu, Dec 19, 2013 at 05:33:09PM -, David Laight wrote: OTOH just including extra headers isn't ideal - it can considerably slow down the compilation time. There are many subsystems that don't really separate their internal headers from their external ones. There's a benefit to doing so, though: it ensures that the prototypes in the header stay in sync with the definition. I think you misunderstood what I was saying. The header with the function prototypes has to be included when the driver itself is built - there is a gcc warning for that as well. The 'problem' is that if I only have: void foo(struct foo *); then the C language (uselessly) scopes the declaration of 'struct foo' to the inside of the functions - making it almost impossible to call. All you need is an outer declaration: struct foo; void foo(struct foo *); what you don't need is the actual definition of 'struct foo'. Source files that need to look at the members of the structure should be directly including the header that defines the structure. Adding nested includes just makes it more likely that code will fail to directly include the headers that define the structures it uses. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: asix: fix bad header length bug
From: Emil Goode The AX88772B occasionally send rx packets that cross urb boundaries and the remaining partial packet is sent with no header. When the buffer with a partial packet is of less number of octets than the value of hard_header_len the buffer is discarded by the usbnet module. This is causing dropped packages and error messages in dmesg. This can be reproduced by using ping with a packet size between 1965-1976. I think this can affect other USB ethernet drivers. Probably most of the ones that explicitly set rx_urb_len. The ax88179_178a driver sets massive 20k receive urb. I've seen over 10k of data in a single urb, dunno if it can actually generate more than 20k - possibly if the usb3 link is loaded with other traffic. It would be much more efficient for it to use an aligned 4k urb and then merge the fragment into skbs. Once you've set: + dev-net-hard_header_len = 0; /* Partial packets have no header */ try setting the mtu to a multiple of 1k. There is a very odd check in usbnet_change_mtu() that tries to stop the receive urb_length being a multiple of the usb packet size. This code looks as though it is hoping that the usb controller will discard any full length bulk messages after finding a short buffer. I suspect that might be just wishful thinking! David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: asix: fix bad header length bug
From: Igor Gnatenko On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote: The AX88772B occasionally send rx packets that cross urb boundaries and the remaining partial packet is sent with no header. When the buffer with a partial packet is of less number of octets than the value of hard_header_len the buffer is discarded by the usbnet module. This is causing dropped packages and error messages in dmesg. This can be reproduced by using ping with a packet size between 1965-1976. The bug has been reported here: https://bugzilla.kernel.org/show_bug.cgi?id=29082 Signed-off-by: Emil Goode emilgo...@gmail.com Reported-and-tested-by: Igor Gnatenko i.gnatenko.br...@gmail.com --- drivers/net/usb/asix_devices.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 9765a7d..120bb29 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -455,6 +455,7 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf) dev-net-ethtool_ops = ax88772_ethtool_ops; dev-net-needed_headroom = 4; /* cf asix_tx_fixup() */ dev-net-needed_tailroom = 4; /* cf asix_tx_fixup() */ + dev-net-hard_header_len = 0; /* Partial packets have no header */ That is the wrong place for the fix. It should only be done when rx_urb_size is set to a multiple of the usb packet size. That is only done for some of the supported devices. In fact, if the rx_urb_size is a multiple of the usb frame size (or 1k) then maybe the usbnet code should assume that the driver is capable of processing ethernet frames that cross usb packet boundaries and not delete short packets at all - regardless of the hard_header_len. David
RE: [PATCH 19/26] drivers: isdn: Move prototype declaration to header file platform.h from diva_didd.c
From: Rashika Kheria Move prototype declarations of function to header file hardware/eicon/platform.h because they are used by more than one file. This eliminates the following warnings in hardware/eicon/diddfunc.c: drivers/isdn/hardware/eicon/diddfunc.c:95:12: warning: no previous prototype for diddfunc_init [- Wmissing-prototypes] drivers/isdn/hardware/eicon/diddfunc.c:110:13: warning: no previous prototype for diddfunc_finit [- Wmissing-prototypes] ... diff --git a/drivers/isdn/hardware/eicon/diva_didd.c b/drivers/isdn/hardware/eicon/diva_didd.c index fab6ccf..56d32a7 100644 --- a/drivers/isdn/hardware/eicon/diva_didd.c +++ b/drivers/isdn/hardware/eicon/diva_didd.c @@ -39,9 +39,6 @@ MODULE_LICENSE(GPL); #define DBG_MINIMUM (DL_LOG + DL_FTL + DL_ERR) #define DBG_DEFAULT (DBG_MINIMUM + DL_XLOG + DL_REG) -extern int diddfunc_init(void); -extern void diddfunc_finit(void); - extern void DIVA_DIDD_Read(void *, int); You should move that one as well. There really shouldn't be 'extern' definitions for any function in any C files since you want the compiler to check they are correct when the function itself is compiled. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH] net: asix: fix bad header length bug
From: Emil Goode On Fri, Feb 07, 2014 at 10:38:04AM +0100, Bjørn Mork wrote: Emil Goode emilgo...@gmail.com writes: On Thu, Feb 06, 2014 at 03:28:13PM +, David Laight wrote: From: Igor Gnatenko On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote: The AX88772B occasionally send rx packets that cross urb boundaries and the remaining partial packet is sent with no header. When the buffer with a partial packet is of less number of octets than the value of hard_header_len the buffer is discarded by the usbnet module. This is causing dropped packages and error messages in dmesg. I will do some more digging in the code, but the test of skb-len against hard_header_len is done already in the completion callback function passed to usb_fill_bulk_urb so it seems that buffers of less than hard_header_len number of octets will be dropped regardless. I am pretty sure you are right about this bug. And the exact same solution is already used by the cx82310_eth minidriver, so I don't see the problem. Your fix is fine IMHO. But you should apply it to all the devices using asix_rx_fixup_common(), not just the ax88772 ones. You could maybe make this a usbnet flag instead and create a generic solution in usbnet, but frankly I believe the number of flags and their meaning have exceeded drivers authors capabilities a long time ago. At least mine, which are quite limited ;-) An example of that problem is another bloody obvious bug I noticed while looking at this driver: The 'struct driver_info ax88178_info' points to asix_rx_fixup_common without setting the FLAG_MULTI_PACKET. This will result in usbnet rx_process() calling usbnet_skb_return() on skbs which are already consumed by the minidriver. Not a big problem, but will give some odd results. But if you allow skbs shorter than ETH_HLEN to slip through then it might go boom, so you should probably fix that as well. Bjørn Yes I believe the patch is necessary, but maybe it would be nice with a prettier solution rather than setting hard_header_len to 0 for all devices with this behaviour. Perhaps it would be better to let each driver that uses the usbnet module decide what skbs to discard? What David describes seems to be another bug, but I don't think it is related to this patch as I'm able to reproduce the bug without the patch beeing applied by setting the mtu to pretty much any value other than 1500 and using ping with a larger packet size than that mtu value. Yes - plenty of bugs if you just look for them! I did a quick scan through the sub-drivers and although the usbnet code seems to treat the 'hard_header_len' as a constant to add to the mtu when allocating rx urb (when the driver doesn't set rx_urb_len), some of the sub-drivers seem to have three length, the rx header, tx header and hard_header, and set them separately (I've not just rechecked) - which may not exactly match what the usbnet code does is the lengths are different. The ax88772b driver seems to support several different bits of silicon. Only some put multiple ethernet frames in a single urb, and only for these does the driver set the rx_urb_length to 2048. For the other silicon it relies on usbnet setting the rx urb size - so the hard_header_len better not be set to zero. Someone with some time to spare needs to modify usbnet to support page aligned rx buffers (probably 4k urb) and then build correctly formatted skb from them. At the moment the ax179_178a driver allocates 20kB urb which end up with an 0x40 byte offset into the page (so are probably 24k) and then cause alignment issues in the xhci driver which currently doesn't correctly handle non-aligned 64k address boundaries when the cross the ring end. David
RE: [PATCH 19/26] drivers: isdn: Move prototype declaration to header file platform.h from diva_didd.c
From: Josh Triplett On Fri, Feb 07, 2014 at 01:33:46PM +, David Laight wrote: From: Rashika Kheria Move prototype declarations of function to header file hardware/eicon/platform.h because they are used by more than one file. This eliminates the following warnings in hardware/eicon/diddfunc.c: drivers/isdn/hardware/eicon/diddfunc.c:95:12: warning: no previous prototype for diddfunc_init [- Wmissing-prototypes] drivers/isdn/hardware/eicon/diddfunc.c:110:13: warning: no previous prototype for diddfunc_finit [- Wmissing-prototypes] ... diff --git a/drivers/isdn/hardware/eicon/diva_didd.c b/drivers/isdn/hardware/eicon/diva_didd.c index fab6ccf..56d32a7 100644 --- a/drivers/isdn/hardware/eicon/diva_didd.c +++ b/drivers/isdn/hardware/eicon/diva_didd.c @@ -39,9 +39,6 @@ MODULE_LICENSE(GPL); #define DBG_MINIMUM (DL_LOG + DL_FTL + DL_ERR) #define DBG_DEFAULT (DBG_MINIMUM + DL_XLOG + DL_REG) -extern int diddfunc_init(void); -extern void diddfunc_finit(void); - extern void DIVA_DIDD_Read(void *, int); You should move that one as well. There really shouldn't be 'extern' definitions for any function in any C files since you want the compiler to check they are correct when the function itself is compiled. Absolutely, but as far as I can tell Rashika is doing this incrementally, organized more by header than by source file, so I'd expect a few externs in a source file to disappear at a time rather than all in one patch. Unless any actual bugs are found, I'd have thought a single patch for each driver would be enough, maybe even one for the whole lot - depending on how they are maintained. The 26 patches already posted are a little excessive. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFCv3 08/14] arm64: introduce aarch64_insn_gen_movewide()
From: Will Deacon ... + BUG_ON(imm 0 || imm 65535); Do this check with masking instead? The compiler will convert that to a single unsigned comparison. ... + BUG_ON(shift != 0 shift != 16 shift != 32 + shift != 48); OTOH I don't think it will convert that to: BUG_ON(shift ~48); David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFCv3 08/14] arm64: introduce aarch64_insn_gen_movewide()
From: Will Deacon ... + BUG_ON(imm 0 || imm 65535); Do this check with masking instead? Ok, if you prefer, I can change it to: BUG_ON(imm ~GENMASK(15, 0)); Gah - then anyone reading the code has to look up another define. There isn't a prize for the most complicated method of defining a constant that can never change. Sure, that or use a named constant for the upper-bound (SZ_64K - 1). There is nothing wrong with the original code. Maybe use 0x for those people (are there any) that don't know their powers of two. These are strong constants, they aren't going to be changed to any other value, and it is unlikely that anyone will want to search for their uses. I presume that SZ_64K is defined somewhere to 0x1u. But IMHO using it (instead of the literal) doesn't make code any more readable. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [net-next PATCH v2 0/3] Broadcast/Multicast rate limit via Ethtool Coalesce
From: Mugunthan V N On Thursday 10 July 2014 05:14 AM, David Miller wrote: From: Mugunthan V N mugunthan...@ti.com Date: Wed, 9 Jul 2014 12:44:07 +0530 A system/cpu can be loaded by a hacker with flooding of broadcast or multicast packets, to prevent this some Ethernet controllers like CPSW provide a mechanism to limit the broadcast/multicast packet rate via hardware limiters. This patch series enables this feature via Ethtool Coalesce. This is pretty bogus if you ask me. What is the difference from accepting a high rate of unicast packets? I say it is no different. Therefore, this feature makes no sense to me at all. Any packet storm can cause an endpoint some issues. Typically packet storms will cause the system CPU to thrash resulting is very low system performance. Unicast storms only target a single destination end station, it can be easily mitigated by the host adding a blocking entry in the LUT for each aggressor. Broadcast and multicast target multiple end stations, or an entire network, not only can it cause CPU thrashing, it can result in loss of other broadcast and multicast services. The rate limiting feature allow the broadcast and or multicast traffic to be dropped if the rates are too high. This eliminates the CPU thrashing issue. It also allows the system to analyze the aggressors and block them for future transgressions. Rate limiting multicast traffic will definitely cause the loss of multicast services. My experience of broadcast storms is that many of the ethernet switches (probably especially the cheap ones) end up using a much slower software? path for broadcasts. In a broadcast storm they start discarding normal traffic - to point where a single ssh session becomes unusable. This is true even when isolated from the storm by a 10M hub. Broadcast storms are probably mostly caused by network topology issues. Especially is switches are sending traffic to 'all ports' while resetting the spanning tree tables. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH net-next v9 3/9] net: nl802154 - make add_iface take name assign type
From: David Miller From: Tom Gundersen t...@jklm.no Date: Thu, 17 Jul 2014 10:06:04 +0200 @@ -192,8 +193,10 @@ int ieee802154_add_iface(struct sk_buff *skb, struct genl_info *info) if (devname[nla_len(info-attrs[IEEE802154_ATTR_DEV_NAME]) - 1] != '\0') return -EINVAL; /* phy name should be null-terminated */ + name_assign_type = NET_NAME_USER; } else { devname = wpan%d; + name_assign_type = NET_NAME_ENUM; } if (strlen(devname) = IFNAMSIZ) Just wondering what should happen if %d appears in a user provided name. That would seem to be both USER and ENUM. Is a training %d detected by special code? Or is the string used as a printf format? If the latter is true what happens if the user provides foo%s David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2 2/9] mailbox: Add NVIDIA Tegra XUSB mailbox driver
From: Thierry Reding ... Is _nocache required? I don't see other drivers using it. I assume there's nothing special about the mbox registers. Most drivers should be using devm_ioremap_resource() which will use the _nocache variant of devm_ioremap() when appropriate. Usually the region will not be marked cacheable (IORESOURCE_CACHEABLE) and therefore be remapped uncached. A related question: Is there any way for a driver to force that part of a PCIe BAR be mapped through the data cache even when the BAR isn't actually marked cacheable? Some hardware has address regions (which might not be an entire BAR) that are actually memory and mapping through the data cache will generate longer PCIe transfers [1]. Clearly the driver will have to be very careful about cache flushes and invalidates to make this work. [1] PCIe is high throughput and high latency, single word reads can be much slower that PCI, much nearer x86 ISA bus speeds. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v6 2/4] net: moxa: replace build_skb() with netdev_alloc_skb_ip_align() / memcpy()
From: Arnd Bergmann On Monday 25 August 2014 16:22:22 Jonas Jensen wrote: @@ -226,13 +226,15 @@ static int moxart_rx_poll(struct napi_struct *napi, int budget) if (len RX_BUF_SIZE) len = RX_BUF_SIZE; - skb = build_skb(priv-rx_buf[rx_head], priv-rx_buf_size); + skb = netdev_alloc_skb_ip_align(ndev, len); + if (unlikely(!skb)) { - net_dbg_ratelimited(build_skb failed\n); + net_dbg_ratelimited(netdev_alloc_skb_ip_align failed\n); priv-stats.rx_dropped++; priv-stats.rx_errors++; } + memcpy(skb-data, priv-rx_buf[rx_head], len); Is this memcpy() aligned? If the hardware can receive to a 4n+2 offset it is probably better to copy the two bytes before the frame data and to round the length of to a whole number of words. skb_put(skb, len); skb-protocol = eth_type_trans(skb, ndev); napi_gro_receive(priv-napi, skb); While this seems correct, I wonder why you don't do the normal approach of dequeuing the skb from the chain and adding a newly allocated skb to it to save the memcpy. Because the receive buffer area isn't made of skbs. Post-allocating the skb also reduces the 'true size' of the skb. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 4/8] staging: et131x: Use for loop to initialise contiguous macstat registers to zero
From: Mark Einon Replace a long list of contiguous writel() calls with a for loop iterating over the same address values. Also remove redundant comments on the macstat registers, the variable names are good enough. ... - writel(0, macstat-txrx_0_64_byte_frames); ... - writel(0, macstat-carry_reg2); + /* initialize all the macstat registers to zero on the device */ + for (reg = macstat-txrx_0_64_byte_frames; + reg = macstat-carry_reg2; reg++) + writel(0, reg); ... struct macstat_regs {/* Location: */ u32 pad[32];/* 0x6000 - 607C */ - /* Tx/Rx 0-64 Byte Frame Counter */ + /* counters */ u32 txrx_0_64_byte_frames; /* 0x6080 */ - - /* Tx/Rx 65-127 Byte Frame Counter */ u32 txrx_65_127_byte_frames;/* 0x6084 */ I think it would be best to also convert the stats counters to an array. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] vhost: Add polling mode
From: Razya Ladelsky Michael S. Tsirkin m...@redhat.com wrote on 20/08/2014 01:57:10 PM: Results: Netperf, 1 vm: The polling patch improved throughput by ~33% (1516 MB/sec - 2046 MB/sec). Number of exits/sec decreased 6x. The same improvement was shown when I tested with 3 vms running netperf (4086 MB/sec - 5545 MB/sec). filebench, 1 vm: ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%. The same experiment with 3 vms running filebench showed similar numbers. Signed-off-by: Razya Ladelsky ra...@il.ibm.com This really needs more thourough benchmarking report, including system data. One good example for a related patch: http://lwn.net/Articles/551179/ though for virtualization, we need data about host as well, and if you want to look at streaming benchmarks, you need to test different message sizes and measure packet size. Hi Michael, I have already tried running netperf with several message sizes: 64,128,256,512,600,800... But the results are inconsistent even in the baseline/unpatched configuration. For smaller msg sizes, I get consistent numbers. However, at some point, when I increase the msg size I get unstable results. For example, for a 512B msg, I get two scenarios: vm utilization 100%, vhost utilization 75%, throughput ~6300 vm utilization 80%, vhost utilization 13%, throughput ~9400 (line rate) I don't know why vhost is behaving that way for certain message sizes. Do you have any insight to why this is happening? Have you tried looking at the actual ethernet packet sizes. It may well jump between using small packets (the size of the writes) and full sized ones. If you are trying to measure ethernet packet 'cost' you need to use UDP. However that probably uses different code paths. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC v4 net-next 00/26] BPF syscall, maps, verifier, samples, llvm
From: Of Alexei Starovoitov one more RFC... Major difference vs previous set is a new 'load 64-bit immediate' eBPF insn. Which is first 16-byte instruction. It shows how eBPF ISA can be extended while maintaining backward compatibility, but mainly it cleans up eBPF program access to maps and improves run-time performance. Wouldn't it be more sensible to follow the scheme used by a lot of cpus and add a 'load high' instruction (follow with 'add' or 'or'). It still takes 16 bytes to load a 64bit immediate value, but the instruction size remains constant. There is nothing to stop any JIT software detecting the instruction pair. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] usb: xhci: Fix Set TR Dequeue Pointer cycle state for quirky xHCs
It's starting to get a bit too complicated. xhci find_new_dequeue_state() now has four places where it can toggle the cycle bit in addition to the cycle toggle in find_trb_seg(). In the end we really just want to do it max once. I'll see if I can simplify the whole cycle bit code somehow. Have a look at the patches I submitted last year. I remember simplifying a lot of that code. David
RE: [PATCH RFC 1/4] virtio_net: pass vi around
From: Michael S. Tsirkin Too many places poke at [rs]q-vq-vdev-priv just to get the the vi structure. Let's just pass the pointer around: seems cleaner, and might even be faster. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 57cbc7d..36f3dfc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c ... static struct sk_buff *receive_big(struct net_device *dev, +struct virtnet_info *vi, Do you need to pass 'dev' here? Looks like it is obtainable from vi-dev (as below). David struct receive_queue *rq, void *buf, unsigned int len) { struct page *page = buf; - struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE); + struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE); ... -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, + void *buf, unsigned int len) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct net_device *dev = vi-dev; ... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Of Michael S. Tsirkin On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote: Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com Not sure how much it's worth but since Eric suggested it ... Probably depends on the actual cost of u64_stats_update_begin/end against the likely extra saving of the tx_bytes and tx_packets values onto the stack across the call to dev_kfree_skb_any(). (Which depends on the number of caller saved registers.) Acked-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3d0ce44..a4d56b8 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) unsigned int len; struct virtnet_info *vi = sq-vq-vdev-priv; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; tx_packets need only be 'unsigned int'. The same is almost certainly true of tx_bytes. David while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { pr_debug(Sent skb %p\n, skb); - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += skb-len; - stats-tx_packets++; - u64_stats_update_end(stats-tx_syncp); + tx_bytes += skb-len; + tx_packets++; dev_kfree_skb_any(skb); } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; + u64_stats_update_end(stats-tx_syncp); } static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Michael S. Tsirkin On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote: From: Of Michael S. Tsirkin On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote: Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com Not sure how much it's worth but since Eric suggested it ... Probably depends on the actual cost of u64_stats_update_begin/end against the likely extra saving of the tx_bytes and tx_packets values onto the stack across the call to dev_kfree_skb_any(). (Which depends on the number of caller saved registers.) Yea, some benchmark results would be nice to see. I there are likely to be multiple skb on the queue the fastest code would probably do one 'virtqueue_get_all()' that returned a linked list of buffers, then follow the list to get the stats, and follow it again to free the skb. David Acked-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3d0ce44..a4d56b8 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) unsigned int len; struct virtnet_info *vi = sq-vq-vdev-priv; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; tx_packets need only be 'unsigned int'. The same is almost certainly true of tx_bytes. David while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { pr_debug(Sent skb %p\n, skb); - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += skb-len; - stats-tx_packets++; - u64_stats_update_end(stats-tx_syncp); + tx_bytes += skb-len; + tx_packets++; dev_kfree_skb_any(skb); } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; + u64_stats_update_end(stats-tx_syncp); } static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 net: Remove ndo_xmit_flush netdev operation, use signalling instead. added code that looks at skb-xmit_more after the skb has been put in TX VQ. Since some paths process the ring and free the skb immediately, this can cause use after free. Fix by storing xmit_more in a local variable. Cc: David S. Miller da...@davemloft.net Signed-off-by: Michael S. Tsirkin m...@redhat.com --- David, am I using the API correctly? Seems to work for me. You used __netif_subqueue_stopped but that seems to use a slightly more expensive test_bit internally. The reason I added a variable for the txq here is because it's handy for BQL patch later on. drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3d0ce44..13d0a8b 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) int qnum = skb_get_queue_mapping(skb); struct send_queue *sq = vi-sq[qnum]; int err; + struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); Do you need to cache 'txq' on stack for the entire call? Looks like it is only needed when 'kick' is true. I've not looked to see if saves both 'dev' and 'qnum' being kept. In any case it isn't mentioned in the commit message. David + bool kick = !skb-xmit_more; /* Free up any pending old buffers before queueing new ones. */ free_old_xmit_skbs(sq); @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) } } - if (__netif_subqueue_stopped(dev, qnum) || !skb-xmit_more) + if (kick || netif_xmit_stopped(txq)) virtqueue_kick(sq-vq); return NETDEV_TX_OK; -- MST -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3 2/2] usb: dwc2: gadget: modify return statement
From: Of Sudip Mukherjee modified the function to have a single return statement at the end instead of multiple return statement in the middle of the function to improve the readability of the code. Many of us would disagree with you there. Early returns actually make the code easier to read, certainly better than a goto 'end of function'. David This patch depends on the previous patch of the series. Signed-off-by: Sudip Mukherjee su...@vectorindia.org --- drivers/usb/dwc2/gadget.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c index 7f25527..e8a8fc7 100644 --- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -2471,7 +2471,8 @@ static int s3c_hsotg_ep_enable(struct usb_ep *ep, dir_in = (desc-bEndpointAddress USB_ENDPOINT_DIR_MASK) ? 1 : 0; if (dir_in != hs_ep-dir_in) { dev_err(hsotg-dev, %s: direction mismatch!\n, __func__); - return -EINVAL; + ret = -EINVAL; + goto error1; } mps = usb_endpoint_maxp(desc); @@ -2583,6 +2584,7 @@ static int s3c_hsotg_ep_enable(struct usb_ep *ep, error: spin_unlock_irqrestore(hsotg-lock, flags); +error1: return ret; } -- 1.8.1.2 -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] netlink: don't copy over empty attribute data
From: Sasha Levin netlink uses empty data to seperate different levels. However, we still try to copy that data from a NULL ptr using memcpy, which is an undefined behaviour. Signed-off-by: Sasha Levin sasha.le...@oracle.com --- lib/nlattr.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/nlattr.c b/lib/nlattr.c index 9c3e85f..6512b43 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -430,7 +430,8 @@ void __nla_put(struct sk_buff *skb, int attrtype, int attrlen, struct nlattr *nla; nla = __nla_reserve(skb, attrtype, attrlen); - memcpy(nla_data(nla), data, attrlen); + if (data) + memcpy(nla_data(nla), data, attrlen); Were it even appropriate to add a conditional here, the correct check would be against 'attrlen', not 'data'. David } EXPORT_SYMBOL(__nla_put); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: bit fields data tearing
From: Paul E. McKenney On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote: Hi James, On 09/04/2014 10:11 PM, James Bottomley wrote: On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote: +And there are anti-guarantees: + + (*) These guarantees do not apply to bitfields, because compilers often + generate code to modify these using non-atomic read-modify-write + sequences. Do not attempt to use bitfields to synchronize parallel + algorithms. + + (*) Even in cases where bitfields are protected by locks, all fields + in a given bitfield must be protected by one lock. If two fields + in a given bitfield are protected by different locks, the compiler's + non-atomic read-modify-write sequences can cause an update to one + field to corrupt the value of an adjacent field. + + (*) These guarantees apply only to properly aligned and sized scalar + variables. Properly sized currently means int and long, + because some CPU families do not support loads and stores of + other sizes. (Some CPU families is currently believed to + be only Alpha 21064. If this is actually the case, a different + non-guarantee is likely to be formulated.) This is a bit unclear. Presumably you're talking about definiteness of the outcome (as in what's seen after multiple stores to the same variable). No, the last conditions refers to adjacent byte stores from different cpu contexts (either interrupt or SMP). The guarantees are only for natural width on Parisc as well, so you would get a mess if you did byte stores to adjacent memory locations. For a simple test like: struct x { long a; char b; char c; char d; char e; }; void store_bc(struct x *p) { p-b = 1; p-c = 2; } on parisc, gcc generates separate byte stores void store_bc(struct x *p) { 0: 34 1c 00 02 ldi 1,ret0 4: 0f 5c 12 08 stb ret0,4(r26) 8: 34 1c 00 04 ldi 2,ret0 c: e8 40 c0 00 bv r0(rp) 10: 0f 5c 12 0a stb ret0,5(r26) which appears to confirm that on parisc adjacent byte data is safe from corruption by concurrent cpu updates; that is, CPU 0| CPU 1 | p-b = 1 | p-c = 2 | will result in p-b == 1 p-c == 2 (assume both values were 0 before the call to store_bc()). What Peter said. I would ask for suggestions for better wording, but I would much rather be able to say that single-byte reads and writes are atomic and that aligned-short reads and writes are also atomic. Thus far, it looks like we lose only very old Alpha systems, so unless I hear otherwise, I update my patch to outlaw these very old systems. People with old Alphas can run NetBSD instead, along with those who have real VAXen :-) I've seen gcc generate 32bit accesses for 16bit structure members on arm. It does this because of the more limited range of the offsets for the 16bit access. OTOH I don't know if it ever did this for writes - so it may be moot. David
RE: bit fields data tearing
From: Peter Hurley [ +cc linux-arm ] Hi David, On 09/05/2014 04:30 AM, David Laight wrote: I've seen gcc generate 32bit accesses for 16bit structure members on arm. It does this because of the more limited range of the offsets for the 16bit access. OTOH I don't know if it ever did this for writes - so it may be moot. Can you recall the particulars, like what ARM config or what code? I tried an overly-simple test to see if gcc would bump up to the word load for the 12-bit offset mode, but it stuck with register offset rather than immediate offset. [I used the compiler options for allmodconfig and a 4.8 cross-compiler.] Maybe the test doesn't generate enough register pressure on the compiler? Dunno, I would have been using a much older version of the compiler. It is possible that it doesn't do it any more. It might only have done it for loads. The compiler used to use misaligned 32bit loads for structure members on large 4n+2 byte boundaries as well. I'm pretty sure it doesn't do that either. There have been a lot of compiler versions since I was compiling anything for arm. David Regards, Peter Hurley #define ARRAY_SIZE(x) (sizeof(x)/sizeof((x)[0])) struct x { long unused[64]; short b[12]; int unused2[10]; short c; }; void store_c(struct x *p, short a[]) { int i; for (i = 0; i ARRAY_SIZE(p-b); i++) p-b[i] = a[i]; p-c = 2; } void store_c(struct x *p, short a[]) { 0: e1a0c00dmov ip, sp 4: e3a03000mov r3, #0 8: e92dd800push{fp, ip, lr, pc} c: e24cb004sub fp, ip, #4 int i; for (i = 0; i ARRAY_SIZE(p-b); i++) p-b[i] = a[i]; 10: e191c0b3ldrhip, [r1, r3] 14: e0802003add r2, r0, r3 18: e2822c01add r2, r2, #256; 0x100 1c: e2833002add r3, r3, #2 20: e3530018cmp r3, #24 24: e1c2c0b0strhip, [r2] 28: 1af8bne 10 store_c+0x10 p-c = 2; 2c: e3a03d05mov r3, #320; 0x140 30: e3a02002mov r2, #2 34: e18020b3strhr2, [r0, r3] 38: e89da800ldm sp, {fp, sp, pc} N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v2] cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec
From: Shilpa Bhat Hi Viresh, On Fri, 2014-08-29 at 05:33 +0530, Viresh Kumar wrote: On 28 August 2014 19:36, Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com wrote: Changes v1-v2: Invoke .target() driver callback to set the cpus to nominal frequency in reboot notifier, instead of calling cpufreq_suspend() as suggested by Viresh Kumar. Modified the commit message. This changelog will get commited, is this what you want? + if (unlikely(rebooting) new_index != get_nominal_index()) + return -EBUSY; Have you placed the unlikely only around 'rebooting' intentionally or should it cover whole if statement? Yes unlikely() should cover the whole if statement... Actually it probably shouldn't. You need to look at the generated code with each different set of 'unlikely()' to see how gcc processes them. In this case, if 'rebooting' is false you want to 'fall through' on a statically predicted 'not taken' branch. You don't ever care about the second clause. With an 'unlikely' covering the entire statement gcc could easily add a forwards conditional branch (that will be mis-predicted) for the 'rebooting' test. (Yes, I spent a lot of time getting gcc to generate branches that were correctly statically predicted for some code where every cycle mattered.) David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v3 0/3] sched: Always check the integrity of the canary
From: Chuck Ebbert David Laight david.lai...@aculab.com wrote: From: Aaron Tomlin Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. The first patch adds a canary to init_task's end of stack. While the second patch provides a helper to determine the integrity of the canary. The third checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Clearly you've just been 'bitten' by a kernel stack overflow. But a simple 'canary' isn't going to find most of the overflows and will give an incorrect 'sense of security'. The canary will only work if the stack is densely written. In practise the stack alignment rules create gaps, and the most likely reason for overflow is a large on-stack buffer that isn't actually written to. The only real way to detect kernel stack overflow is to arrange for an unmapped page beyond the stack. That costs KVA, but not much else. That doesn't work either, because the threadinfo sits between the end of the stack and the beginning of the next page, making it possible to corrupt critical data without running off the page. Then flip the order of the allocations so that the threadinfo is at the other end. I'm not sure how many per-lwp structures the linux kernel has, but I know that on netbsd the main thing the kernel stack hits is the fpu save area - the end of that is the avx area which won't be needed when the stack usage is large. Everything else could be moved from the stack pages to the lwp struct itself. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy
From: Rickard Strandqvist ... Replacing strncpy with strlcpy to avoid strings that lacks null terminate. ... diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index 644e6ab..d4fe8ac 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -830,7 +830,7 @@ static void peak_usb_disconnect(struct usb_interface *intf) char name[IFNAMSIZ]; dev-state = ~PCAN_USB_STATE_CONNECTED; - strncpy(name, netdev-name, IFNAMSIZ); + strlcpy(name, netdev-name, IFNAMSIZ); unregister_netdev(netdev); free_candev(netdev); Or: char name[sizeof netdev-name]; memcpy(name, netdev-name, sizeof netdev-name); David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy
From: Marc Kleine-Budde [ On 09/15/2014 10:28 AM, David Laight wrote: From: Rickard Strandqvist ... Replacing strncpy with strlcpy to avoid strings that lacks null terminate. ... diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index 644e6ab..d4fe8ac 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -830,7 +830,7 @@ static void peak_usb_disconnect(struct usb_interface *intf) char name[IFNAMSIZ]; dev-state = ~PCAN_USB_STATE_CONNECTED; - strncpy(name, netdev-name, IFNAMSIZ); + strlcpy(name, netdev-name, IFNAMSIZ); unregister_netdev(netdev); free_candev(netdev); Or: char name[sizeof netdev-name]; memcpy(name, netdev-name, sizeof netdev-name); I would be sizeof(foo) in kernel coding style, But not in mine :-) sizeof is an operator, not a function, it's argument can be (type). but let's have a look at the original code: struct net_device *netdev = dev-netdev; char name[IFNAMSIZ]; dev-state = ~PCAN_USB_STATE_CONNECTED; strncpy(name, netdev-name, IFNAMSIZ); unregister_netdev(netdev); free_candev(netdev); kfree(dev-cmd_buf); dev-next_siblings = NULL; if (dev-adapter-dev_free) dev-adapter-dev_free(dev); dev_info(intf-dev, %s removed\n, name); I think it's save to use: dev_info(intf-dev, %s removed\n, netdev_name(dev-netdev)); instead of doing the str?cpy() in the first place. But why not use: netdev_info(dev-netdev, removed\n); Is the USB device information lost when using netdev_info()? My guess is it avoids a 'use after free' - but I'm not going to dig that far. Another issue with blindly replacing strncpy() with strlcpy() (which doesn't affect the above) is when copying status to userspace. David
RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. ISTM that if you need to do the 'swap' you should copy the data regardless of the length. David Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 87975b5..eaaebad 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len) return bufaddr; } +static void *swap_buffer2(void *dst_buf, void *src_buf, int len) +{ + int i; + unsigned int *src = src_buf; + unsigned int *dst = dst_buf; + + for (i = 0; i DIV_ROUND_UP(len, 4); i++, src++, dst++) + *dst = cpu_to_be32(*src); + + return dst_buf; +} + static void fec_dump(struct net_device *ndev) { struct fec_enet_private *fep = netdev_priv(ndev); @@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff } static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, -struct bufdesc *bdp, u32 length) +struct bufdesc *bdp, u32 length, int swap) { struct fec_enet_private *fep = netdev_priv(ndev); struct sk_buff *new_skb; @@ -1363,7 +1375,10 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, dma_sync_single_for_cpu(fep-pdev-dev, bdp-cbd_bufaddr, FEC_ENET_RX_FRSIZE - fep-rx_align, DMA_FROM_DEVICE); - memcpy(new_skb-data, (*skb)-data, length); + if (!swap) + memcpy(new_skb-data, (*skb)-data, length); + else + swap_buffer2(new_skb-data, (*skb)-data, length); *skb = new_skb; return true; @@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) u16 vlan_tag; int index = 0; boolis_copybreak; + bool need_swap = id_entry-driver_data FEC_QUIRK_SWAP_FRAME; #ifdef CONFIG_M532x flush_cache_all(); @@ -1456,7 +1472,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) * include that when passing upstream as it messes up * bridging applications. */ - is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4); + is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4, + need_swap); if (!is_copybreak) { skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE); if (unlikely(!skb_new)) { @@ -1471,7 +1488,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) prefetch(skb-data - NET_IP_ALIGN); skb_put(skb, pkt_len - 4); data = skb-data; - if (id_entry-driver_data FEC_QUIRK_SWAP_FRAME) + if (!is_copybreak need_swap) swap_buffer(data, pkt_len); /* Extract the enhanced buffer descriptor */ -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar David Laight wrote: From: Lothar Waßmann commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. ISTM that if you need to do the 'swap' you should copy the data regardless of the length. The swap function has to look at at most 3 bytes beyond the actual packet length. That is what the original swap_buffer() function does and what the new function swap_buffer2(), that does the endian swapping while copying to the new buffer, also does. I understood the bug. The point I was making is that if you have to do a read-write of the received data (to byteswap it) then you might as well always copy it into a new skb that is just big enough for the actual receive frame. David
RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Eric Dumazet On Tue, 2014-10-07 at 15:19 +0200, Lothar Wamann wrote: commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 87975b5..eaaebad 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len) return bufaddr; } +static void *swap_buffer2(void *dst_buf, void *src_buf, int len) +{ + int i; + unsigned int *src = src_buf; + unsigned int *dst = dst_buf; + + for (i = 0; i DIV_ROUND_UP(len, 4); i++, src++, dst++) + *dst = cpu_to_be32(*src); No need for the DIV : for (i = 0; i len; i += sizeof(*dst), src++, dst++) *dst = cpu_to_be32(*src); Also are you sure both src/dst are aligned to word boundaries, or is this architecture OK with possible misalignment ? I wondered about that as well. I wouldn't have expected ppc to support misaligned transfers, and you'd also want to make sure that cpu_to_be(*src) was using a byte-swapping instruction. Hmmm... cpu_to_be() doesn't sound like the right 'swap' macro name. It looks as though this is processing an entire ethernet frame, in which case the data (destination address) would normally be on an 4n+2 boundary. The two bytes before the data can be safely copied (as can the rounding bytes at the end) - so provided the pointers allow for this it should be fine. OTOH I've NFI what the actual data is. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann David Laight wrote: From: Eric Dumazet On Tue, 2014-10-07 at 15:19 +0200, Lothar Wamann wrote: commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 87975b5..eaaebad 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len) return bufaddr; } +static void *swap_buffer2(void *dst_buf, void *src_buf, int len) +{ + int i; + unsigned int *src = src_buf; + unsigned int *dst = dst_buf; + + for (i = 0; i DIV_ROUND_UP(len, 4); i++, src++, dst++) + *dst = cpu_to_be32(*src); No need for the DIV : for (i = 0; i len; i += sizeof(*dst), src++, dst++) *dst = cpu_to_be32(*src); Also are you sure both src/dst are aligned to word boundaries, or is this architecture OK with possible misalignment ? I wondered about that as well. I wouldn't have expected ppc to support misaligned transfers, and you'd also want to make sure that cpu_to_be(*src) was using a byte-swapping instruction. Hmmm... cpu_to_be() doesn't sound like the right 'swap' macro name. ??? So what is cpu_to_be32() then? The new swap function is an exact copy of the original one already in use except for the fact that it uses distinct source and destination buffers. cpu_to_be32() is for converting a 'cpu' endianness value to 'big-endian'. Here you are processing receive data - so you probably want be32_to_cpu(). (Yes, I know they are functionally identical) Alternatively, since these aren't actually 32bit numbers and you know whether you want to swap, something like the __swab32p() from swab.h - but I'm not entirely sure that one is expected to be used. Clearly you are well inside the ppc 'endianness' hell. David
RE: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Sergei Shtylyov On 10/07/2014 05:19 PM, Lothar Wamann wrote: commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 87975b5..eaaebad 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c [...] @@ -1348,7 +1360,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff } static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, - struct bufdesc *bdp, u32 length) + struct bufdesc *bdp, u32 length, int swap) 'bool swap' perhaps? @@ -1393,6 +1408,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) u16 vlan_tag; int index = 0; boolis_copybreak; + bool need_swap = id_entry-driver_data FEC_QUIRK_SWAP_FRAME; ... especially talking this into account... Hmmm... in that case you may not want the compiler to convert the bit value to a 'bool' at all. Passing 'id_entry-driver_data' through (that doesn't look like a field name for 'quirk flags) would generate better code. Even better would be to reference the flag directly from 'ndev'. A pointer indirection for the test if probably faster then passing another argument. David
RE: [PATCH 1/1 net-next] af_unix: remove NULL assignment on static
From: Hannes Frederic Sowa I think David's concern was whether if 0 == false in all situations. It is pretty clear that static memory is initialized to 0. I'm not 100% sure about that. static pointers may be required to be initialised to NULL. If NULL isn't the 'all 0 bit pattern' then the memory would need to be initialised to a different pattern. Not that anyone is likely to implement such a system because far too much code will break. The only system I knew where 'native' NULL pointers were 'all 1s' used 0 in its C compiler. David
RE: [PATCH] tun: make sure interface usage can not overflow
From: Hannes Frederic On Mo, 2014-09-29 at 12:41 -0700, Kees Cook wrote: On Mon, Sep 29, 2014 at 4:04 AM, David Laight david.lai...@aculab.com wrote: From: Kees Cook This makes the size argument a const, since it is always populated by the caller. There is almost no point making parameters 'const. ('const foo *' makes sense). Additionally double-checks to make sure the copy_from_user can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS happy: In function 'copy_from_user', inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7: ... copy_from_user() buffer size is not provably correct If 'ifreq_len' could be too big then you want to error the ioctl, not panic. If it can't be too big you don't need the check. The ifreq_len comes from the callers, and is the output of sizeof which is const. Changing the function parameter to const means any changes made in the future where the incoming value isn't const, the compiler will throw a warning. Hmmm, I think you want something like BUILD_BUG_ON(! __builtin_constant_p(var)). const in function argument only ensures that the value cannot be modified in the function. You'd have to do something in the header file - nothing in the function body can do that check. I've not got the source handy, but in this case is there any need to actually pass the size at all? Is it always the same fixed constant?? David
RE: [PATCH] tun: make sure interface usage can not overflow
From: Hannes Frederic On Di, 2014-09-30 at 08:20 +, David Laight wrote: From: Hannes Frederic On Mo, 2014-09-29 at 12:41 -0700, Kees Cook wrote: On Mon, Sep 29, 2014 at 4:04 AM, David Laight david.lai...@aculab.com wrote: From: Kees Cook This makes the size argument a const, since it is always populated by the caller. There is almost no point making parameters 'const. ('const foo *' makes sense). Additionally double-checks to make sure the copy_from_user can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS happy: In function 'copy_from_user', inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7: ... copy_from_user() buffer size is not provably correct If 'ifreq_len' could be too big then you want to error the ioctl, not panic. If it can't be too big you don't need the check. The ifreq_len comes from the callers, and is the output of sizeof which is const. Changing the function parameter to const means any changes made in the future where the incoming value isn't const, the compiler will throw a warning. Hmmm, I think you want something like BUILD_BUG_ON(! __builtin_constant_p(var)). const in function argument only ensures that the value cannot be modified in the function. You'd have to do something in the header file - nothing in the function body can do that check. Sure, it should work. You only need to make sure that gcc inlines the function, so the value is constant (it is not enough that gcc knows the value range, one specific constant is needed). So the simplest fix for this is to specify __tun_chr_ioctl as __always_inline. ;) You are joking aren't you??? Look at the code. I'd suggest fixing whatever implements CONFIG_DEBUG_STRICT_USER_COPY_CHECKS to be less picky. David
RE: [PATCH v5 2/3] net: Add Keystone NetCP ethernet driver
From: David Miller From: Santosh Shilimkar santosh.shilim...@ti.com Date: Thu, 25 Sep 2014 13:48:36 -0400 +static inline int gbe_phy_link_status(struct gbe_slave *slave) +{ + if (!slave-phy) + return 1; + + if (slave-phy-link) + return 1; + + return 0; +} Please use 'bool' as the return type and return 'true' or 'false'. That function body could also be just: return !slave-phy slave-phy-link; which might be more readable if directly coded. I also wonder if slavephy can actually be NULL? If it can be under unusual circumstances it might be worth assigning the address of a dummy 'phy' structure so that the tests can all be removed. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 404fb9d..b92324c 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len) return bufaddr; } +static void *swap_buffer2(void *dst_buf, void *src_buf, int len) +{ + int i; + unsigned int *src = src_buf; + unsigned int *dst = dst_buf; + + for (i = 0; i len; i += 4, src++, dst++) + swab32s(src); This will probably benefit from being unrolled slightly. Neither 'dst' nor the return value is used. + + return dst_buf; +} + static void fec_dump(struct net_device *ndev) { struct fec_enet_private *fep = netdev_priv(ndev); @@ -1334,7 +1346,7 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff } static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, -struct bufdesc *bdp, u32 length) +struct bufdesc *bdp, u32 length, bool swap) { struct fec_enet_private *fep = netdev_priv(ndev); struct sk_buff *new_skb; @@ -1349,7 +1361,10 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, dma_sync_single_for_cpu(fep-pdev-dev, bdp-cbd_bufaddr, FEC_ENET_RX_FRSIZE - fep-rx_align, DMA_FROM_DEVICE); - memcpy(new_skb-data, (*skb)-data, length); + if (!swap) + memcpy(new_skb-data, (*skb)-data, length); + else + swap_buffer2(new_skb-data, (*skb)-data, length); *skb = new_skb; return true; @@ -1377,6 +1392,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) u16 vlan_tag; int index = 0; boolis_copybreak; + boolneed_swap = fep-quirks FEC_QUIRK_SWAP_FRAME; #ifdef CONFIG_M532x flush_cache_all(); @@ -1440,7 +1456,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) * include that when passing upstream as it messes up * bridging applications. */ - is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4); + is_copybreak = fec_enet_copybreak(ndev, skb, bdp, pkt_len - 4, + need_swap); if (!is_copybreak) { skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE); if (unlikely(!skb_new)) { @@ -1455,7 +1472,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) prefetch(skb-data - NET_IP_ALIGN); skb_put(skb, pkt_len - 4); data = skb-data; - if (fep-quirks FEC_QUIRK_SWAP_FRAME) + if (!is_copybreak need_swap) swap_buffer(data, pkt_len); It has to be better to set the 'copybreak' limit to be larger than the maximum frame size and so always go through the 'copybreak' paths. /* Extract the enhanced buffer descriptor */ -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 404fb9d..b92324c 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -339,6 +339,18 @@ static void *swap_buffer(void *bufaddr, int len) return bufaddr; } +static void *swap_buffer2(void *dst_buf, void *src_buf, int len) +{ + int i; + unsigned int *src = src_buf; + unsigned int *dst = dst_buf; + + for (i = 0; i len; i += 4, src++, dst++) + swab32s(src); + + return dst_buf; +} + Actually that is completely f*cked David
RE: [PATCHv2 6/6] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: Lothar Waßmann David Laight wrote: From: Lothar Waßmann commit 1b7bde6d659d (net: fec: implement rx_copybreak to improve rx performance) introduced a regression for i.MX28. The swap_buffer() function doing the endian conversion of the received data on i.MX28 may access memory beyond the actual packet size in the DMA buffer. fec_enet_copybreak() does not copy those bytes, so that the last bytes of a packet may be filled with invalid data after swapping. This will likely lead to checksum errors on received packets. E.g. when trying to mount an NFS rootfs: UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36 Do the byte swapping and copying to the new skb in one go if necessary. Signed-off-by: Lothar Wamann l...@karo-electronics.de --- drivers/net/ethernet/freescale/fec_main.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) @@ -1455,7 +1472,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) prefetch(skb-data - NET_IP_ALIGN); skb_put(skb, pkt_len - 4); data = skb-data; - if (fep-quirks FEC_QUIRK_SWAP_FRAME) + if (!is_copybreak need_swap) swap_buffer(data, pkt_len); It has to be better to set the 'copybreak' limit to be larger than the maximum frame size and so always go through the 'copybreak' paths. Since the copybreak support is all about performance optimistation, we should IMO buy the additional advantage for i.MX28 by not having to access the buffer twice (once for copying and once again for swapping). You definitely want to do the byteswap at the same time as the copy. The point I'm trying to make that if you need to do the byteswap you probably might as well copy the data to an skb of the correct size at the same time. Certainly I'd expect the 'break even' length will be much higher. David
RE: [PATCH 03/10] mm: Convert p[te|md]_numa users to p[te|md]_protnone_numa
From: Mel Gorman Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Doesn't that break bisection? David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v3 0/3] sched: Always check the integrity of the canary
From: Aaron Tomlin Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. The first patch adds a canary to init_task's end of stack. While the second patch provides a helper to determine the integrity of the canary. The third checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Clearly you've just been 'bitten' by a kernel stack overflow. But a simple 'canary' isn't going to find most of the overflows and will give an incorrect 'sense of security'. The canary will only work if the stack is densely written. In practise the stack alignment rules create gaps, and the most likely reason for overflow is a large on-stack buffer that isn't actually written to. The only real way to detect kernel stack overflow is to arrange for an unmapped page beyond the stack. That costs KVA, but not much else. David
RE: bit fields data tearing
From: Benjamin Herrenschmidt On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote: Apologies for hijacking this thread but I need to extend this discussion somewhat regarding what a compiler might do with adjacent fields in a structure. The tty subsystem defines a large aggregate structure, struct tty_struct. Importantly, several different locks apply to different fields within that structure; ie., a specific spinlock will be claimed before updating or accessing certain fields while a different spinlock will be claimed before updating or accessing certain _adjacent_ fields. What is necessary and sufficient to prevent accidental false-sharing? The patch below was flagged as insufficient on ia64, and possibly ARM. We expect native aligned scalar types to be accessed atomically (the read/modify/write of a larger quantity that gcc does on some bitfield cases has been flagged as a gcc bug, but shouldn't happen on normal scalar types). That isn't true on all architectures for items smaller than a machine word. At least one has to do rmw for byte accesses. David I am not 100% certain of bool here, I assume it's treated as a normal scalar and thus atomic but if unsure, you can always use int. Another option is to use the atomic bitops and make these bits in a bitmask but that is probably unnecessary if you have locks already. Cheers, Ben.
RE: [RFC] situation with csum_and_copy_... API
From: Al Viro ... We would be better off with iov_iter passed to __sock_{send,recv}msg() (as a part of struct msghdr, instead of -msg_iov/-msg_iovlen) and always advanced to match the amount of data actually picked from it. With iovec behind it remaining constant. That would work just as well as the current variant for sendmsg(2)/recvmsg(2)/etc., be a lot more convenient for kernel_{send,recv}msg() callers and would allow a lot of other fun stuff. Callers of kernel_send/recvmsg() could easily be using a wrapper function that creates the 'msghdr'. When the want to send the remaining part of a buffer the old iterator will no longer be available - just the original iov and the required offset. So it would be useful if the iterator could be initialised to a byte offset down the iov[]. Are there any current code paths where the iov[] is modified but ends up being something other than 'the remaining data'? If not then code can check whether iov[0].len has changed, and skip the 'advance' is it has. (I've an out-of-tree driver that assumes the iov[] isn't changed.) David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC] situation with csum_and_copy_... API
From: Al Viro On Fri, Nov 21, 2014 at 05:42:55PM +, David Laight wrote: Callers of kernel_send/recvmsg() could easily be using a wrapper function that creates the 'msghdr'. When the want to send the remaining part of a buffer the old iterator will no longer be available - just the original iov and the required offset. Er... So why not copy a struct iov_iter to/from msg-msg_iter, then? It's not as it had been particulary large - 5 words isn't much... I'm not at all sure that _anything_ has valid reasons for draining iovecs. Maintaining a struct iov_iter and modifying it is easy and actually faster... Right now the main examples outside of net/* are due to unfortunate limitations of -sendmsg() - until now it had no way to be told that desired data starts at offset. With -msg_iter it obviously becomes possible... It may well be easier for code that only has to run in a new kernel. But for code that has run in old kernels as well you still need to modify the iov[]. This is also true of userspace - there is no way of completing a partial transfer without modifying the iov[] to allow for the partial transfer. Note that I'm not suggesting that any of the 'write' functions should ever modify an iov[]. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC] situation with csum_and_copy_... API
From: Al Viro On Fri, Nov 21, 2014 at 08:49:56AM +, Al Viro wrote: Overall, I think I have the whole series plotted in enough details to be reasonably certain we can pull it off. Right now I'm dealing with mm/iov_iter.c stuff; the amount of boilerplate source is already high enough and with those extra primitives it'll get really unpleasant. What we need there is something templates-like, as much as I hate C++, and I'm still not happy with what I have at the moment... Hopefully I'll get that in more or less tolerable form today. Folks, I would really like comments on the patch below. It's an attempt to reduce the amount of boilerplate code in mm/iov_iter.c; no new primitives added, just trying to reduce the amount of duplication in there. I'm not too fond of the way it currently looks, to put it mildly. It seems to work, it's reasonably straightforward and it even generates slightly better code than before, but I would _very_ welcome any tricks that would allow to make it not so tasteless. I like the effect on line count (+124-358), but... It defines two iterators (for iovec-backed and bvec-backed ones) and converts a bunch of primitives to those. The last argument is an expression evaluated for a bunch of ranges; for bvec one it's void, for iovec - size_t; if it evaluates to non-0, we treat it as read/write/whatever short by that many bytes and do not proceed any further. Any suggestions are welcome. diff --git a/mm/iov_iter.c b/mm/iov_iter.c index eafcf60..611af2bd 100644 --- a/mm/iov_iter.c +++ b/mm/iov_iter.c @@ -4,11 +4,75 @@ #include linux/slab.h #include linux/vmalloc.h +#define iterate_iovec(i, n, buf, len, move, STEP) { \ + const struct iovec *iov = i-iov; \ + size_t skip = i-iov_offset;\ + size_t left;\ + size_t wanted = n; \ All these variable names probably want (at least one) _ prefix just in case they match the names of arguments. + buf = iov-iov_base + skip; \ + len = min(n, iov-iov_len - skip); \ You are assigning to parameters, this can get confusing. Unless these are return values this probably doesn't make sense. Might be better to 'pass by reference', the generated code is likely to be the same - but it is clearer to the reader. + left = STEP;\ + len -= left;\ + skip += len;\ + n -= len; \ Again, a parameter is modified. + while (unlikely(!left n)) { \ + iov++; \ + buf = iov-iov_base;\ + len = min(n, iov-iov_len); \ + left = STEP;\ + len -= left;\ + skip = len; \ + n -= len; \ + } \ + n = wanted - n; \ + if (move) { \ + if (skip == iov-iov_len) { \ + iov++; \ + skip = 0; \ + } \ + i-count -= n; \ + i-nr_segs -= iov - i-iov; \ + i-iov = iov; \ + i-iov_offset = skip; \ + } \ +} ... + iterate_iovec(i, bytes, buf, len, true, + __copy_to_user(buf, (from += len) - len, len)) + return bytes; Using 'buf' and 'len' in __copy_to_user() is very non-obvious. It might be better if they were fixed names, maybe _ioiter_buf and _ioiter_len. If this code can use the gcc extension that allows #defines that contain {} to return values, the it would be better as: bytes_left = iterate_iovec(i, bytes, true, __copy_to_user(_ioiter_buf, (from += _ioiter_len) - _ioiter_len, _ioiter_len); return bytes_left; Possibly also two wrapper #defines that supply the true/false parameter. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: can: usb: peak_usb: pcan_usb_core.c: Cleaning up missing null-terminate in conjunction with strncpy
From: Rickard Strandqvist 2014-09-15 10:47 GMT+02:00 David Laight david.lai...@aculab.com: ... Or: char name[sizeof netdev-name]; memcpy(name, netdev-name, sizeof netdev-name); ... I liked the variant: char name[sizeof(netdev-name)]; But dislike and do not understand what the point would be with memcpy variant. The memcpy() will be a lot faster than any of the string copy functions because it doesn't need to worry about the terminator. In many cases it will be inlined to a short series of load and stores. David
RE: [PATCH] tun: make sure interface usage can not overflow
From: Kees Cook This makes the size argument a const, since it is always populated by the caller. There is almost no point making parameters 'const. ('const foo *' makes sense). Additionally double-checks to make sure the copy_from_user can never overflow, keeping CONFIG_DEBUG_STRICT_USER_COPY_CHECKS happy: In function 'copy_from_user', inlined from '__tun_chr_ioctl' at drivers/net/tun.c:1871:7: ... copy_from_user() buffer size is not provably correct If 'ifreq_len' could be too big then you want to error the ioctl, not panic. If it can't be too big you don't need the check. David Signed-off-by: Kees Cook keesc...@chromium.org --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index acaaf6784179..a1f317cba206 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1855,7 +1855,7 @@ unlock: } static long __tun_chr_ioctl(struct file *file, unsigned int cmd, - unsigned long arg, int ifreq_len) + unsigned long arg, const size_t ifreq_len) { struct tun_file *tfile = file-private_data; struct tun_struct *tun; @@ -1869,6 +1869,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, int ret; if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) { + BUG_ON(ifreq_len sizeof(ifr)); if (copy_from_user(ifr, argp, ifreq_len)) return -EFAULT; } else { -- 1.9.1 -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
From: Alexander Duyck It is common for device drivers to make use of acquire/release semantics when dealing with descriptors stored in device memory. On reviewing the documentation and code for smp_load_acquire() and smp_store_release() as well as reviewing an IBM website that goes over the use of PowerPC barriers at http://www.ibm.com/developerworks/systems/articles/powerpc.html it occurred to me that the same code could likely be applied to device drivers. As a result this patch introduces load_acquire() and store_release(). The load_acquire() function can be used in the place of situations where a test for ownership must be followed by a memory barrier. The below example is from ixgbe: if (!rx_desc-wb.upper.status_error) break; /* This memory barrier is needed to keep us from reading * any other fields out of the rx_desc until we know the * descriptor has been written back */ rmb(); With load_acquire() this can be changed to: if (!load_acquire(rx_desc-wb.upper.status_error)) break; If I'm quickly reading the 'new' code I need to look up yet another function, with the 'old' code I can easily see the logic. You've also added a memory barrier to the 'break' path - which isn't needed. The driver might also have additional code that can be added before the barrier so reducing the cost of the barrier. The driver may also be able to perform multiple actions before a barrier is needed. Hiding barriers isn't necessarily a good idea anyway. If you are writing a driver you need to understand when and where they are needed. Maybe you need a new (weaker) barrier to replace rmb() on some architectures. ... David
RE: [PATCH 0/4] phy: samsung-usb2: Add support for Vbus regulator
From: Vivek Gautam This has been on my to-do list for sometime. Until now the host controller (specifically ehci-exynos) is responsible for enabling VBUS supply. This opens up one more issue which is, when only ohci-exynos is enabled and ehci-exynosis disabled then VBUS was never enabled (since ohci did not have the code to enabled the VBUS supply). Rather it should be wise to move the VBUS related stuff to phy driver and let phy take care of enabling it. This patch series adds that VBUS regulator to phy-samsung-usb2 driver, adds necessary binding information as well as vbus-supply properties to phy nodes on exynos5250 systems. ... Does this go anyway to allowing devices to be powered from a micro-usb connector while acting as a USB host (USB OTG). ie when you want VBUS disabled even though it would normally need to be enabled. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2 1/1] net/macb: add TX multiqueue support for gem
From: Thomas Petazzoni On Thu, 11 Dec 2014 11:16:51 +0100, Cyrille Pitchen wrote: +#define GEM_ISR1 0x0400 +#define GEM_ISR2 0x0404 +#define GEM_ISR3 0x0408 +#define GEM_ISR4 0x040c +#define GEM_ISR5 0x0410 +#define GEM_ISR6 0x0414 +#define GEM_ISR7 0x0418 What about doing instead: #define GEM_ISR(q)((q) == 0 ? MACB_ISR : 0x400 + (q) 2) And ditto for all other registers, which will save a lot of boring repeated code. It will probably add a lot of object code and, depending on how often the registers are accesses, might have performance impact. Having: #define GEM_ISR(n) (0x400 + (n) 4) will save source code. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2 1/1] net/macb: add TX multiqueue support for gem
From: Cyrille Pitchen [... It will probably add a lot of object code and, depending on how often the registers are accesses, might have performance impact. Having: #define GEM_ISR(n) (0x400 + (n) 4) will save source code. David So you suggest that we keep the unsigned int fields ISR, IMR, IER, IDR, TBQP in the struct macb_queue and initialize them once for all in macb_probe() like patch v2 does but only replace the GEM_ISR1 .. GEM_ISR7 defines by GEM_ISR(n) in macb.h? This way there would be to test at run time and we can handle the special register mapping of queue0. Is it what you meant? In one word, yes. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/4] Add lightweight memory barriers fast_rmb() and fast_wmb()
From: Alexander Duyck These patches introduce two new primitives for synchronizing cache-enabled memory writes and reads. These two new primitives are: fast_rmb() fast_wmb() Not sure I like the names. If the aim is to sync data into the local cache so that hardware that is doing cache-snooping accesses sees the data then maybe local_rmb() and local_wmb() IIRC read_barrier_depends() is a nop on everything except alpha. Maybe add the default if it isn't defined by the MD file? David
RE: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts
From: Jason Wang On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote: Hello: We used to orphan packets before transmission for virtio-net. This breaks socket accounting and can lead serveral functions won't work, e.g: - Byte Queue Limit depends on tx completion nofication to work. - Packet Generator depends on tx completion nofication for the last transmitted packet to complete. - TCP Small Queue depends on proper accounting of sk_wmem_alloc to work. This series tries to solve the issue by enabling tx interrupts. To minize the performance impacts of this, several optimizations were used: - In guest side, virtqueue_enable_cb_delayed() was used to delay the tx interrupt untile 3/4 pending packets were sent. Doesn't that give problems for intermittent transmits? ... David
RE: nf_reject_ipv4: module license 'unspecified' taints kernel
From: Dave Young With today's linus tree, I got below kmsg: [ 23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel. [ 23.551886] Disabling lock debugging due to kernel taint ... Not 100% related, but why does loading a non-GPL module disable lock debugging? (Is 'lock debugging' actually useful?) David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: wireless: brcm80211: brcmfmac: dhd_sdio.c: Cleaning up missing null-terminate in conjunction with strncpy
From: Rickard Strandqvist Replacing strncpy with strlcpy to avoid strings that lacks null terminate. And changed from using strncpy to strlcpy to simplify code. I think you should return an error if the strings get truncated. Silent truncation is going to lead to issues at some point in the future (in some places). Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se --- drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c | 25 ++-- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c index f55f625..d20d4e6 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c @@ -670,7 +670,6 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci, struct brcmf_sdio_dev *sdiodev) { int i; - uint fw_len, nv_len; char end; for (i = 0; i ARRAY_SIZE(brcmf_fwname_data); i++) { @@ -684,25 +683,25 @@ static int brcmf_sdio_get_fwnames(struct brcmf_chip *ci, return -ENODEV; } - fw_len = sizeof(sdiodev-fw_name) - 1; - nv_len = sizeof(sdiodev-nvram_name) - 1; /* check if firmware path is provided by module parameter */ if (brcmf_firmware_path[0] != '\0') { - strncpy(sdiodev-fw_name, brcmf_firmware_path, fw_len); - strncpy(sdiodev-nvram_name, brcmf_firmware_path, nv_len); - fw_len -= strlen(sdiodev-fw_name); - nv_len -= strlen(sdiodev-nvram_name); + strlcpy(sdiodev-fw_name, brcmf_firmware_path, + sizeof(sdiodev-fw_name)); + strlcpy(sdiodev-nvram_name, brcmf_firmware_path, + sizeof(sdiodev-nvram_name)); end = brcmf_firmware_path[strlen(brcmf_firmware_path) - 1]; If you are doing a strlen() here, you could use the length for the copy and/or use it to avoid the strcat(). if (end != '/') { - strncat(sdiodev-fw_name, /, fw_len); - strncat(sdiodev-nvram_name, /, nv_len); - fw_len--; - nv_len--; + strlcat(sdiodev-fw_name, /, + sizeof(sdiodev-fw_name)); + strlcat(sdiodev-nvram_name, /, + sizeof(sdiodev-nvram_name)); } } - strncat(sdiodev-fw_name, brcmf_fwname_data[i].bin, fw_len); - strncat(sdiodev-nvram_name, brcmf_fwname_data[i].nv, nv_len); + strlcat(sdiodev-fw_name, brcmf_fwname_data[i].bin, + sizeof(sdiodev-fw_name)); + strlcat(sdiodev-nvram_name, brcmf_fwname_data[i].nv, + sizeof(sdiodev-nvram_name)); I assume something ensures that fw_name[0] == 0 here. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: RCU bug with v3.17-rc3 ?
From: Nathan Lynch On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote: Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and it seems that this has been known about for some time.) Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x 3 are affected, as well as 4.9.0. We can blacklist these GCC versions quite easily. We already have GCC 3.3 blacklisted, and it's trivial to add others. I would want to include some proper details about the bug, just like the other existing entries we already have in asm-offsets.c, where we name the functions that the compiler is known to break where appropriate. Before blacklisting anything, it's worth considering that simple version checks would break existing pre-4.8.3 compilers that have been patched for PR58854. It looks like Yocto and Buildroot issued releases with patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think the most we can reasonably do without breaking some correctly-behaving toolchains is to emit a warning. Is it possible to compile a small code fragment and check the generated code for the bug? Possibly predicated on the broken version number to avoid false positives. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] net: eth: realtek: atp: checkpatch errors and warnings corrected
From: Roberto Medina On 11/03/2014 07:51 PM, Joe Perches wrote: Some ancient drivers could be regarded as neolithic curiosities that never need updating. This may be one. But if you really want to change it, could you please make sure that objdiff shows no changes? I see some changes with objdiff, maybe this is caused by the include file that I changed to linux/io.h instead of asm/io.h --- /home/vov/Git/linux-next/.tmp_objdiff/a641d0e/drivers/net/ethernet/realtek/atp.dis 2014-11-03 19:59:18.723954900 +0100 +++ /home/vov/Git/linux-next/.tmp_objdiff/5f19b70/drivers/net/ethernet/realtek/atp.dis 2014-11-03 20:00:34.133954217 +0100 @@ -1753,9 +1753,8 @@ ... : e3 0a jrcxz 4c4 __gcov_.atp_probe1+0x14 : 00 00 add%al,(%rax) -:e2 e2 loop 4a0 __gcov_.atp_init+0x20 -:9c pushfq -:9a (bad) +:4b b4 98rex.WXB mov $0x98,%r12b +:ac lods %ds:(%rsi),%al : 67 2f addr32 (bad) : e4 e3 in $0xe3,%al : 00 00 add%al,(%rax) That code (and all the ones below) are gibberish, neither the old or new sequences make any sense. Almost as though you used the wrong instruction set! David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
From: Joe Perches On Wed, 2014-11-26 at 10:34 -0800, Alexei Starovoitov wrote: On Wed, Nov 26, 2014 at 10:02 AM, Joe Perches j...@perches.com wrote: On Wed, 2014-11-26 at 09:23 -0800, Alexei Starovoitov wrote: On Wed, Nov 26, 2014 at 8:58 AM, Joe Perches j...@perches.com wrote: Is there any value in reordering these tests for frequency or maybe using | instead of || to avoid multiple jumps? probably not. It's not a critical path. compiler may fuse conditions depending on values anyway. If it was a critical path, we could have used (1 reg) mask trick. I picked explicit 'return true' else 'return false' here, because it felt easier to read. Just a matter of taste. There is a size difference though: (allyesconfig) $ size arch/x86/net/built-in.o* textdata bss dec hex filename 1299910124336 1834747ab arch/x86/net/built-in.o.new 1317710764592 18845499d arch/x86/net/built-in.o.old interesting. Compiler obviously thinks that 178 byte increase with -O2 is the right trade off. Which I agree with :) If I think dropping 'inline' and using -Os will give bigger savings... This was allyesconfig which already uses -Os Using -O2, there is no difference using inline or not, but the size delta with the bitmask is much larger $ size arch/x86/net/built-in.o* (allyesconfig, but not -Os) text data bss dec hex filename 13410 8203624 1785445be arch/x86/net/built-in.o.new 16130 8844200 2121452de arch/x86/net/built-in.o.old 16130 8844200 2121452de arch/x86/net/built-in.o.static That is quite a big % change in the code size. Why the change in data? David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic
From: David Hildenbrand ... Although it might not be optimal, but keeping a separate counter for pagefault_disable() as part of the preemption counter seems to be the only doable thing right now. I am not sure if a completely separated counter is even possible, increasing the size of thread_info. What about adding (say) 0x1 for the more restrictive test? David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic
From: David Hildenbrand [mailto:d...@linux.vnet.ibm.com] From: David Hildenbrand ... Although it might not be optimal, but keeping a separate counter for pagefault_disable() as part of the preemption counter seems to be the only doable thing right now. I am not sure if a completely separated counter is even possible, increasing the size of thread_info. What about adding (say) 0x1 for the more restrictive test? David You mean as part of the preempt counter? The current layout (on my branch) is * PREEMPT_MASK:0x00ff * SOFTIRQ_MASK:0xff00 * HARDIRQ_MASK:0x000f * NMI_MASK:0x0010 * PREEMPT_ACTIVE: 0x0020 I would have added * PAGEFAULT_MASK: 0x03C0 I'm not sure where you'd need to add the bits. I think the above works because disabling 'HARDIRQ' implicitly disables 'SOFTIRQ' and 'PREEMPT' (etc), so if 256+ threads disable PREEMPT everything still works. So if disabling pagefaults implies that pre-emption is disabled (but SOFTIRQ is still allowed) then you need to insert your bit(s) between 0xff00 and 0x00ff. OTOH if disabling pre-emption implies that pagefaults are disabled then you'd need to use the lsb and change all the above values. Which makes me think that 'PREEMPT_ACTIVE' isn't right at all. Two threads disabling NMIs (or 32 disabling HARDIRQ) won't DTRT. OTOH I'm only guessing at how this is used. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Where exactly will arch_fast_hash be used
From: Herbert Xu On Thu, Dec 04, 2014 at 04:43:31PM +0100, Daniel Borkmann wrote: Hm, I thought the kernel doc on arch_fast_hash() in include/linux/hash.h would give enough of a hint ... I think something more explicit like do not use this in a hash table unless you know what you're doing might be needed. That isn't really helpful. Maybe something more like: This CRC algorithm used by this hash is 'linear', ie hash(a xor b) == hash(a) xor hash(b). This means that it is relatively easy for a remote attacker to generate multiple items with the same hash. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v1 3/5] mm, uaccess: trigger might_sleep() in might_fault() when pagefaults are disabled
From: David Hildenbrand [... This should be likely() instead of unlikely(), no? I'd rather write this if (pagefault_disabled()) return; __might_sleep(file, line, 0); and leave the likely stuff completely away. Makes perfect sense! From my experience of getting (an older version of) gcc to emit 'correctly' statically predicted branches I found that code that looks like (I don't think return/goto make any difference): If (unlikely(condition)) { code; } more_code; is compile with a forwards conditional branch (ie ignoring the unlikely()). Similarly 'if () continue' is likely to generate a 'predicted taken' backwards conditional branch. To get the desired effect you need a non-empty 'else' part, an assembler comment will suffice, eg: asm volatile(# comment). David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] gianfar: handle map error in gfar_start_xmit()
From: David Miller From: Arseny Solokha asolo...@kb.kras.ru Date: Fri, 5 Dec 2014 17:37:54 +0700 @@ -2296,6 +2296,12 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) 0, frag_len, DMA_TO_DEVICE); + if (unlikely(dma_mapping_error(priv-dev, bufaddr))) { + /* As DMA mapping failed, pretend the TX path +* is busy to retry later +*/ + return NETDEV_TX_BUSY; + } You are not busy, you are dropping the packet due to insufficient system resources. Therefore the appropriate thing to do is to free the SKB, increment the drop statistical counter, and return NETDEV_TX_OK. Plausibly the error action could depend on the number of messages in the transmit ring. If the ring is empty you definitely want to drop the packet. If mapping a ring full of packets takes more dma map space than the system has available you may want to be busy - otherwise you get systemic packet loss when transmitting large burst of data. This could be a problem if all the available dma mapping resources have been allocated to receive buffers. Do any common systems actually have limited dma space (apart from limited bounce buffers)? If people are only testing on systems with unlimited dma space (eg x86) then these paths will never be exercised unless an artificial limit is applied. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH 0/3] Faster than SLAB caching of SKBs with qmempool (backed by alf_queue)
From: Jesper Dangaard Brouer The network stack have some use-cases that puts some extreme demands on the memory allocator. One use-case, 10Gbit/s wirespeed at smallest packet size[1], requires handling a packet every 67.2 ns (nanosec). Micro benchmarking[2] the SLUB allocator (with skb size 256bytes elements), show fast-path instant reuse only costs 19 ns, but a closer to network usage pattern show the cost rise to 45 ns. This patchset introduce a quick mempool (qmempool), which when used in-front of the SKB (sk_buff) kmem_cache, saves 12 ns on fast-path drop in iptables raw table, but more importantly saves 40 ns with IP-forwarding, which were hitting the slower SLUB use-case. One of the building blocks for achieving this speedup is a cmpxchg based Lock-Free queue that supports bulking, named alf_queue for Array-based Lock-Free queue. By bulking elements (pointers) from the queue, the cost of the cmpxchg (approx 8 ns) is amortized over several elements. It seems to me that these improvements could be added to the underlying allocator itself. Nesting allocators doesn't really seem right to me. David
RE: [PATCH 0/5] powerpc: Get rid of redundant arch specific swab functions
From: David Gibson arch/powerpc/include/asm/swab.h includes some powerpc specific byteswapping functions, which are implemented in terms of powerpc's built in byte reversed load/store instructions. There are two problems with this: 1) They're not necessary - gcc is perfectly capable of generating the byte-reversed load and store instructions when using the normal, generic byteswapping functions (tested with gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)) Should you be worrying about older versions of gcc? IIRC the internal byteswap 'stuff' is relatively recent (like the last couple of years) so people building current kernels on older distributions might have issues. David N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v2] net: bluetooth: hci_sock: Use 'const void *' instead of 'void *' for 2nd parameter of hci_test_bit()
From: Gang S On 2/3/15 10:32, Chen Gang S wrote: On 2/3/15 05:20, Joe Perches wrote: On Tue, 2015-02-03 at 05:14 +0800, Chen Gang S wrote: hci_test_bit() does not modify 2nd parameter, so it is better to let it be constant, or may cause build warning. The related warning (with allmodconfig under xtensa): [] diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c [] @@ -46,7 +46,7 @@ struct hci_pinfo { unsigned shortchannel; }; -static inline int hci_test_bit(int nr, void *addr) +static inline int hci_test_bit(int nr, const void *addr) { return *((__u32 *) addr + (nr 5)) ((__u32) 1 (nr 31)); } It's probably better to use const __u32 * here too, but the real thing I wonder is whether or not there's an issue with one of the 2 uses of this function. One of them passes a unsigned long *, the other a u32 *. $ git grep -w hci_test_bit net/bluetooth/hci_sock.c:static inline int hci_test_bit(int nr, void *addr) net/bluetooth/hci_sock.c: if (!hci_test_bit(flt_event, flt-event_mask)) net/bluetooth/hci_sock.c:!hci_test_bit(ocf HCI_FLT_OCF_BITS, net/bluetooth/hci_sock.c- hci_sec_filter.ocf_mask[ogf])) hci_sec_filter.ocf_mask is __u32 but flt-event_mask is unsigned long. Any possible issue here on 64-bit systems? For me, it can not cause issue on 64-bit systems. hci_test_bit() treats 'addr' as __u32 *, and has to use the pointer to do something. 'event_mask' is intended to type cast to __u32 * within 'hci_sock.c'. So for me, const __u32 * is better than const void * for 2nd parameter of hci_test_bit(). If what I said above is correct, and also if necessary, I shall patch v3 for it. How are the bits set in the first place? If the array is ever indexed as long [] then the code above is unlikely to be testing the correct bits. David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] net: bluetooth: hci_sock: Use 'const void *' instead of 'void *' for 2nd parameter of hci_test_bit()
From: Chen Gang S -static inline int hci_test_bit(int nr, void *addr) +static inline int hci_test_bit(int nr, const void *addr) { return *((__u32 *) addr + (nr 5)) ((__u32) 1 (nr 31)); } Is there a 'standard' function lurking that will do the above. On x86 the cpus 'bit test' instruction will handle bit numbers greater than the word size - so it can be a single instruction. David