RE: [PATCH v2] net: bluetooth: hci_sock: Use 'const void *' instead of 'void *' for 2nd parameter of hci_test_bit()

2015-02-05 Thread David Laight
From: Chen Gang S [mailto:gang.c...@sunrus.com.cn]
 On 2/5/15 05:09, Marcel Holtmann wrote:
  Hi Sergei,
 
  -static inline int hci_test_bit(int nr, void *addr)
  +static inline int hci_test_bit(int nr, const void *addr)
   {
   return *((__u32 *) addr + (nr  5))  ((__u32) 1  (nr  31));
   }
 
  Is there a 'standard' function lurking that will do the above.
  On x86 the cpus 'bit test' instruction will handle bit numbers
  greater than the word size - so it can be a single instruction.
 
Of course, there's test_bit().
 
  we did leave hci_test_bit in the code since there are some userspace facing
  API that we can not change. Remember that the origin of this code is
  from 2.4.6 kernel.
 
  So we can only change this if you can ensure not to break the userspace API.
  So might want to write unit tests to ensure working HCI filter before even
  considering touching this.
 
 
 For me, we have to remain hci_test_bit(), it is for __u32 * (which we
 can not change). The common test_bit() is for unsigned long *, in this
 case, I guess it may cause issue under 64-bit environments.

Except that half the time you are passing a 'long *' and you haven't
explained why this isn't broken on 64bit architectures.

Note that on LE systems the size of the accesses used to access a dense
bit array don't matter. This is not true of BE systems.

David



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-20 Thread David Laight
From: Hiroshi Shimamoto
 From: Hiroshi Shimamoto h-shimam...@ct.jp.nec.com
 
 Add netlink directives and ndo entry to control VF multicast promiscuous mode.
 
 Intel ixgbe and ixgbevf driver can handle only 30 multicast MAC addresses
 per VF. It means that we cannot assign over 30 IPv6 addresses to a single
 VF interface on VM. We want thousands IPv6 addresses in VM.
 
 There is capability of multicast promiscuous mode in Intel 82599 chip.
 It enables all multicast packets are delivered to the target VF.

A lot of ethernet chips (certainly older ones) used a hash for the
multicast filter. So the hardware filter was never perfect.

I even know of one that needed to be told the addresses and used
those to set the hash bits.
For that device I generated a canonical set of MAC addresses (one
for each multicast hash) and did a software count of how many times
each one was needed for each of the enabled multicasts.

My guess is that 'multicast promiscuous mode' is always valid
and should be enabled by the relevant low level driver when
any table setup fails because there are too many entries.

I actually suspect that most systems want all the multicasts
that occur at any moderate rate on the local network segment.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 07/11] powerpc/8xx: macro for handling CPU15 errata

2015-01-20 Thread David Laight
From Christophe Leroy
 Having a macro will help keep clear code.

It might remove an #if but it doesn't really help.
All it means is that anyone reading the code has to hunt for
the definition before proceeding.

Some comment about what (and why) the extra code is needed
might help.

...
 +
 +#ifdef CONFIG_8xx_CPU15
 +#define DO_8xx_CPU15(tmp, addr)  \
 + additmp, addr, PAGE_SIZE;   \
 + tlbie   tmp;\
 + additmp, addr, PAGE_SIZE;   \

You've even transcribed this incorrectly.

Clearly not tested :-)

David

 + tlbie   tmp
 +#else
 +#define DO_8xx_CPU15(tmp, addr)
 +#endif
 +
  InstructionTLBMiss:
  #ifdef CONFIG_8xx_CPU6
   mtspr   SPRN_DAR, r3
 @@ -304,12 +315,7 @@ InstructionTLBMiss:
   EXCEPTION_PROLOG_0
   mtspr   SPRN_SPRG_SCRATCH2, r10
   mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
 -#ifdef CONFIG_8xx_CPU15
 - addir11, r10, PAGE_SIZE
 - tlbie   r11
 - addir11, r10, -PAGE_SIZE
 - tlbie   r11
 -#endif
 + DO_8xx_CPU15(r11, r10)
 
   /* If we are faulting a kernel address, we have to use the
* kernel page tables.
 --
 2.1.0
 
 ___
 Linuxppc-dev mailing list
 linuxppc-...@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v2 07/11] powerpc/8xx: macro for handling CPU15 errata

2015-01-20 Thread David Laight
From: leroy 
 Le 20/01/2015 12:09, David Laight a écrit :
   From Christophe Leroy
  Having a macro will help keep clear code.
  It might remove an #if but it doesn't really help.
  All it means is that anyone reading the code has to hunt for
  the definition before proceeding.
 
  Some comment about what (and why) the extra code is needed
  might help.
 The main reason is because of patch 09/11 where we have to duplicate
 this code. I prefer to just duplicate one line rather than duplicate the
 whole code (especially because in v1 of the PATCHset, it was duplicated
 twice):
 
 -DO_8xx_CPU15(r11, r10)
 [...]
 #ifdef CONFIG_MODULES
 [...]
 +DO_8xx_CPU15(r10, r11)
 [...]
 +#else
 +mfsprr10, SPRN_SRR0/* Get effective address of fault */
 +DO_8xx_CPU15(r11, r10)
 
 Is this approach wrong ?

I'd call it something that infers 'invalidate adjacent pages'
and then mention that this is needed due to a cpu errata.

David



RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-22 Thread David Laight
From: Skidmore, Donald C 
   From: Hiroshi Shimamoto
My concern is what is the real issue that VF multicast promiscuous mode
  can cause.
I think there is the 4k entries to filter multicast address, and the
current ixgbe/ixgbevf can turn all bits on from VM. That is almost same 
as
  enabling multicast promiscuous mode.
I mean that we can receive all multicast addresses by an onerous
  operation in untrusted VM.
I think we should clarify what is real security issue in this context.
  
   If you are worried about passing un-enabled multicasts to users then
   what about doing a software hash of received multicasts and checking
   against an actual list of multicasts enabled for that hash entry.
   Under normal conditions there is likely to be only a single address to 
   check.
  
   It may (or may not) be best to use the same hash as any hashing
   hardware filter uses.
 
  thanks for the comment. But I don't think that is the point.
 
  I guess, introducing VF multicast promiscuous mode seems to add new
  privilege to peek every multicast packet in VM and that doesn't look good.
  On the other hand, I think that there has been the same privilege in the
  current ixgbe/ixgbevf implementation already. Or I'm reading the code
  wrongly.
  I'd like to clarify what is the issue of allowing to receive all multicast 
  packets.
 
 Allowing a VM to give itself the privilege of seeing every multicast packet
 could be seen as a hole in VM isolation.
 Now if the host system allows this policy I don't see this as an issue as
 someone specifically allowed this to happen and then must not be concerned.
 We could even log that it has occurred, which I believe your patch did do.
 The issue is also further muddied, as you mentioned above, since some of
 these multicast packets are leaking anyway (the HW currently uses a 12 bit 
 mask).
 It's just that this change would greatly enlarge that hole from a fraction to
 all multicast packets.

Why does it have anything to do with VM isolation?
Isn't is just the same as if the VM were connected directly to the
ethernet cable?

David



RE: [PATCH v2 1/1] atm: remove deprecated use of pci api

2015-01-14 Thread David Laight
From: David Miller
 From: Quentin Lambert lambert.quen...@gmail.com
 Date: Mon, 12 Jan 2015 17:10:42 +0100
 
  @@ -2246,7 +2246,8 @@ static int eni_init_one(struct pci_dev *pci_dev,
  goto err_disable;
 
  zero = eni_dev-zero;
  -   zero-addr = pci_alloc_consistent(pci_dev, ENI_ZEROES_SIZE, zero-dma);
  +   zero-addr = dma_alloc_coherent(pci_dev-dev, ENI_ZEROES_SIZE,
  +   zero-dma, GFP_ATOMIC);
  if (!zero-addr)
  goto err_kfree;
 
 
 I really would like you to look at these locations and see if
 GFP_KERNEL can be used instead of GFP_ATOMIC.  I bet that nearly
 all of these can, and it is preferred.

And there isn't much point inlining the wrapper until that has been done.
Not only that, the corresponding pci_free_consistent() calls need changing
at (much) the same time.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 1/2] fixup! net/macb: Adding comments to various #defs to make interpretation easier

2015-01-15 Thread David Laight
From: Xander Huff
 Put #define comments into a single line.

Breaks the 80 char limit.
I suspect the definitions could be made to fit by judicial editing.
But that probably requires knowing exactly what the comments mean.

David

 Signed-off-by: Xander Huff xander.h...@ni.com
 ---
  drivers/net/ethernet/cadence/macb.h | 107 
 +---
  1 file changed, 26 insertions(+), 81 deletions(-)
 
 diff --git a/drivers/net/ethernet/cadence/macb.h 
 b/drivers/net/ethernet/cadence/macb.h
 index 378b218..d7b93d0 100644
 --- a/drivers/net/ethernet/cadence/macb.h
 +++ b/drivers/net/ethernet/cadence/macb.h
 @@ -275,9 +275,7 @@
  #define MACB_THALT_SIZE  1
  #define MACB_NCR_TPF_OFFSET  11 /* Transmit pause frame */
  #define MACB_NCR_TPF_SIZE1
 -#define MACB_TZQ_OFFSET  12 /* Transmit zero 
 quantum
 - * pause frame
 - */
 +#define MACB_TZQ_OFFSET  12 /* Transmit zero 
 quantum pause frame */
  #define MACB_TZQ_SIZE1
 
  /* Bitfields in NCFGR */
 @@ -299,9 +297,7 @@
  #define MACB_UNI_SIZE1
  #define MACB_BIG_OFFSET  8 /* Receive 1536 byte 
 frames */
  #define MACB_BIG_SIZE1
 -#define MACB_EAE_OFFSET  9 /* External address 
 match
 -* enable
 -*/
 +#define MACB_EAE_OFFSET  9 /* External address 
 match enable */
  #define MACB_EAE_SIZE1
  #define MACB_CLK_OFFSET  10
  #define MACB_CLK_SIZE2
 @@ -313,9 +309,7 @@
  #define MACB_RM9200_RMII_SIZE1  /* AT91RM9200 only */
  #define MACB_RBOF_OFFSET 14 /* Receive buffer offset */
  #define MACB_RBOF_SIZE   2
 -#define MACB_RLCE_OFFSET 16 /* Length field error frame
 - * discard
 - */
 +#define MACB_RLCE_OFFSET 16 /* Length field error frame 
 discard */
  #define MACB_RLCE_SIZE   1
  #define MACB_DRFCS_OFFSET17 /* FCS remove */
  #define MACB_DRFCS_SIZE  1
 @@ -335,41 +329,22 @@
  #define GEM_RXCOEN_SIZE  1
 
  /* Constants for data bus width. */
 -#define GEM_DBW320 /* 32 bit AMBA AHB data bus
 -* width
 -*/
 -#define GEM_DBW641 /* 64 bit AMBA AHB data bus
 -* width
 -*/
 -#define GEM_DBW128   2 /* 128 bit AMBA AHB data bus
 -* width
 -*/
 +#define GEM_DBW320 /* 32 bit AMBA AHB data bus 
 width */
 +#define GEM_DBW641 /* 64 bit AMBA AHB data bus 
 width */
 +#define GEM_DBW128   2 /* 128 bit AMBA AHB data bus 
 width */
 
  /* Bitfields in DMACFG. */
 -#define GEM_FBLDO_OFFSET 0 /* AHB fixed burst length for
 -* DMA data operations
 -*/
 +#define GEM_FBLDO_OFFSET 0 /* AHB fixed burst length for 
 DMA data operations */
  #define GEM_FBLDO_SIZE   5
 -#define GEM_ENDIA_OFFSET 7 /* AHB endian swap mode enable
 -* for packet data accesses
 -*/
 +#define GEM_ENDIA_OFFSET 7 /* AHB endian swap mode 
 enable for packet data accesses */
  #define GEM_ENDIA_SIZE   1
 -#define GEM_RXBMS_OFFSET 8 /* Receiver packet buffer
 -* memory size select
 -*/
 +#define GEM_RXBMS_OFFSET 8 /* Receiver packet buffer 
 memory size select */
  #define GEM_RXBMS_SIZE   2
 -#define GEM_TXPBMS_OFFSET10 /* Transmitter packet buffer
 - * memory size select
 - */
 +#define GEM_TXPBMS_OFFSET10 /* Transmitter packet buffer 
 memory size 

RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-21 Thread David Laight
From: Hiroshi Shimamoto
 My concern is what is the real issue that VF multicast promiscuous mode can 
 cause.
 I think there is the 4k entries to filter multicast address, and the current 
 ixgbe/ixgbevf
 can turn all bits on from VM. That is almost same as enabling multicast 
 promiscuous mode.
 I mean that we can receive all multicast addresses by an onerous operation in 
 untrusted VM.
 I think we should clarify what is real security issue in this context.

If you are worried about passing un-enabled multicasts to users then
what about doing a software hash of received multicasts and checking
against an actual list of multicasts enabled for that hash entry.
Under normal conditions there is likely to be only a single address to check.

It may (or may not) be best to use the same hash as any hashing hardware
filter uses.

David



RE: [PATCH 3.12 065/122] lib/checksum.c: fix carry in csum_tcpudp_nofold

2015-02-18 Thread David Laight
From: Karl Beldan 
 On Tue, Feb 17, 2015 at 12:04:22PM +, David Laight wrote:
   +static inline u32 from64to32(u64 x)
   +{
   + /* add up 32-bit and 32-bit for 32+c bit */
   + x = (x  0x) + (x  32);
   + /* add up carry.. */
   + x = (x  0x) + (x  32);
   + return (u32)x;
   +}
 
  As a matter of interest, does the compiler optimise away the
  second (x  0x) ?
  The code could just be:
  x = (x  0x) + (x  32);
  return x + (x  32);
 

 On my side, from what I've seen so far, your version results in better
 assembly, esp. with clang, but my first version
 http://article.gmane.org/gmane.linux.kernel/1875407:
 x += (x  32) + (x  32);
   return (__force __wsum)(x  32);
 resulted in even better assembly, I just verified with gcc/clang,
 x86_64/ARM and -O1,2,3.

The latter looks to have a shorter dependency chain as well.
Although I'd definitely include a comment saying that it is equivalent
to the two lines in the current patch.

Does either compiler manage to use a rotate for the two shifts?
Using '(x  32) | (x  32)' might convince it to do so.
That would reduce it to three 'real' instructions and a register rename.

David



RE: [PATCH] iwl4965: Enable checking of format strings

2015-02-13 Thread David Laight
From: Rasmus Villemoes
 Well, probably the linker is allowed to overlap anonymous objects
 (string literals) with whatever const char[] (or indeed any const)
 object it finds containing the appropriate byte sequence. But I think
 language lawyers would insist that for
 
 const char foo[] = a string;
 const char bar[] = a string;

A quick test shows those are separate strings.
But 'const char *foo = xxx;' will share.
You also need -O1 to get the strings into .rodata.str.n so that the linker
can merge them.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 3.12 065/122] lib/checksum.c: fix carry in csum_tcpudp_nofold

2015-02-17 Thread David Laight
 +static inline u32 from64to32(u64 x)
 +{
 + /* add up 32-bit and 32-bit for 32+c bit */
 + x = (x  0x) + (x  32);
 + /* add up carry.. */
 + x = (x  0x) + (x  32);
 + return (u32)x;
 +}

As a matter of interest, does the compiler optimise away the
second (x  0x) ?
The code could just be:
x = (x  0x) + (x  32);
return x + (x  32);

David


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V6 03/10] USB: f81232: implement RX bulk-in ep

2015-02-17 Thread David Laight
From: Greg KH
  +   for (i = 0 ; i  urb-actual_length ; i += 2) {
  +   tty_flag = TTY_NORMAL;
  +
  +   if (unlikely(data[i+0]  UART_LSR_BRK_ERROR_BITS)) {
 
 Never use unlikely() unless you can prove that it actually matters if
 you use it.  Hint, it's almost impossible to prove, so don't use it, the
 compiler and processor look-ahead is almost smarter than we are.

That just isn't true.

The compiler cannot know the actual control flow - so cannot correctly
arrange the code so that the branches are statically predicted
correctly for the required path (usually the most common path).

There are a lot of places where a few extra clocks for a mispredicted
branch don't really matter, and even in very hot paths where it does
matter it can be quite difficult to get the compiler to optimise the
branches 'correctly' - you can need to add asm comments in order to
generate non-empty code blocks.

In addition unlikely() is also a note to the human reader.

I did a lot of work adding likely/unlikely to some code in order
to minimise the 'worst case' code path. I got there, but some
parts were initially non-intuitive.

david

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next 1/7] timecounter: provide a macro to initialize the cyclecounter mask field.

2015-01-05 Thread David Laight
 On Mon, Jan 05, 2015 at 01:20:57PM +, David Laight wrote:
   +/* simplify initialization of mask field */
   +#define CYCLECOUNTER_MASK(bits) (cycle_t)((bits)  64 ? 
   ((1ULL(bits))-1) : -1)
 
  That has me chasing through the C integer promotion rules.
  Better might be:
  ((bits)  64 ? (1ULL  (bits)) - 1 : (((1ULL  63) - 1)  1) + 1)
  I actually suspect there is a standard definition somewhere?
 
 This is an exact copy of CLOCKSOURCE_MASK, and if wrong, then both are
 wrong.  In any case, I can't see any issue here. Is not
 
   (some_int_type) -1
 
 always equal to
 
   0xf...(width of type)
 
 for all integer types, when using 2s compliment?

As I said, it leaves me chasing through the promotion rules (which I
probably know if I actually think hard enough).

Thinking... ~0ULL would be nice and simple and correct.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next 1/7] timecounter: provide a macro to initialize the cyclecounter mask field.

2015-01-05 Thread David Laight
 There is no need for users of the timecounter/cyclecounter code to include
 clocksource.h just for a single macro.
 
 Signed-off-by: Richard Cochran richardcoch...@gmail.com
 ---
  include/linux/timecounter.h |5 -
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/timecounter.h b/include/linux/timecounter.h
 index 74f4549..4382035 100644
 --- a/include/linux/timecounter.h
 +++ b/include/linux/timecounter.h
 @@ -19,6 +19,9 @@
 
  #include linux/types.h
 
 +/* simplify initialization of mask field */
 +#define CYCLECOUNTER_MASK(bits) (cycle_t)((bits)  64 ? ((1ULL(bits))-1) : 
 -1)

That has me chasing through the C integer promotion rules.
Better might be:
((bits)  64 ? (1ULL  (bits)) - 1 : (((1ULL  63) - 1)  1) + 1)
I actually suspect there is a standard definition somewhere?

David 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 05/11] powerpc/8xx: Optimise access to swapper_pg_dir

2015-01-06 Thread David Laight
 On Tue, 2014-12-16 at 16:03 +0100, Christophe Leroy wrote:
  All accessed to PGD entries are done via 0(r11).
  By using lower part of swapper_pg_dir as load index to r11, we can remove 
  the
  ori instruction.
 
  Signed-off-by: Christophe Leroy christophe.le...@c-s.fr
 
 Nice :)
 Acked-by: Joakim Tjernlund joakim.tjernl...@transmode.se
 
 
  ---
   arch/powerpc/kernel/head_8xx.S | 22 ++
   1 file changed, 10 insertions(+), 12 deletions(-)
 
  diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
  index ae05f28..aa45225 100644
  --- a/arch/powerpc/kernel/head_8xx.S
  +++ b/arch/powerpc/kernel/head_8xx.S
  @@ -322,13 +322,12 @@ InstructionTLBMiss:
  mfspr   r11, SPRN_M_TW/* Get level 1 table base address */
   #ifdef CONFIG_MODULES
  beq 3f
  -   lis r11, (swapper_pg_dir-PAGE_OFFSET)@h
  -   ori r11, r11, (swapper_pg_dir-PAGE_OFFSET)@l
  +   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
   3:
   #endif
  /* Insert level 1 index */
  rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2)  1), (PAGE_SHIFT - 2)  
  1, 29
  -   lwz r11, 0(r11)/* Get the level 1 entry */
  +   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 1 
  entry */
 

On the face of it that fragment doesn't look right when CONFIG_MODULES is 
undefined.

David



RE: [PATCH 05/11] powerpc/8xx: Optimise access to swapper_pg_dir

2015-01-06 Thread David Laight
From: leroy christophe
 Le 06/01/2015 13:08, David Laight a écrit :
  On Tue, 2014-12-16 at 16:03 +0100, Christophe Leroy wrote:
  All accessed to PGD entries are done via 0(r11).
  By using lower part of swapper_pg_dir as load index to r11, we can remove 
  the
  ori instruction.
 
  Signed-off-by: Christophe Leroy christophe.le...@c-s.fr
  Nice :)
  Acked-by: Joakim Tjernlund joakim.tjernl...@transmode.se
 
  ---
arch/powerpc/kernel/head_8xx.S | 22 ++
1 file changed, 10 insertions(+), 12 deletions(-)
 
  diff --git a/arch/powerpc/kernel/head_8xx.S 
  b/arch/powerpc/kernel/head_8xx.S
  index ae05f28..aa45225 100644
  --- a/arch/powerpc/kernel/head_8xx.S
  +++ b/arch/powerpc/kernel/head_8xx.S
  @@ -322,13 +322,12 @@ InstructionTLBMiss:
   mfspr   r11, SPRN_M_TW/* Get level 1 table base address */
#ifdef CONFIG_MODULES
   beq 3f
  -   lis r11, (swapper_pg_dir-PAGE_OFFSET)@h
  -   ori r11, r11, (swapper_pg_dir-PAGE_OFFSET)@l
  +   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
3:
#endif
   /* Insert level 1 index */
   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2)  1), (PAGE_SHIFT - 2) 
   1, 29
  -   lwz r11, 0(r11)/* Get the level 1 entry */
  +   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the level 
  1 entry */
 
  On the face of it that fragment doesn't look right when CONFIG_MODULES is 
  undefined.
 
  David
 
 I'm not sure I understand what you mean.
 
 The other part of the patch adds the following:
 +lir5, (swapper_pg_dir-PAGE_OFFSET)@l
 +subr4, r4, r5
 
 r4 is the value put into SPRN_M_TW, so I don't see what may be wrong.

Ok, but it is completely non-obvious.
You probably need to change some comments somewhere about what is held
in SPRN_M_TW - since it is no longer the L1 base address, but that value
offset by some 'random' amount.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: /proc/net/dev regression

2015-01-12 Thread David Laight
From: Al Viro
  I think the problem with wmnet is not that it was expecting the fields
  to be aligned because it never had problems before (when definitely more
  than 10 megabytes were received, wmnet is crappy but not _that_ crappy).
 
  I think the problem really was here,
 
  totalbytes_in = strtoul(buffer[7], NULL, 10);
 
  After the patch the device name is 8 characters long and buffer[7]
  overlaps with the name instead of reading the bytes. Before the
  patch is was fine because the call to strtoul() seems correct in the
  sense that it would read everything until the NULL. So more than 10
  megabytes was still ok.
 
  So I guess I was wrong when suggesting that the problem was the
  alignment.
 
 Several lines below there's this:
 totalpackets_out = strtoul(buffer[74], NULL, 10);
 if (totalpackets_out != lastpackets_out) {
 totalbytes_out = strtoul(buffer[66], NULL, 
 10);
 diffpackets_out += totalpackets_out - 
 lastpackets_out;
 diffbytes_out += totalbytes_out - 
 lastbytes_out;
 lastpackets_out = totalpackets_out;
 lastbytes_out = totalbytes_out;
 tx = True;
 }
 
 So I'm afraid it *is* that crappy.  This function really should use scanf();
 note that updateStats_ipchains() in the same file does just that (well,
 fgets()+sscanf() for fsck knows what reason).  And I'd be careful with all
 those %d, actually - it's not _that_ hard to get more than 4Gb sent.
 scanf formats really ought to match the kernel-side (seq_)printf one...

IMHO it is safer to use strchr(p, ' '); to skip the interface name and then
use repeated calls to strtoull() to read the numbers.
Correctly/safely using scanf() is really too hard.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] update ip-sysctl.txt documentation

2015-01-07 Thread David Laight
From: Ani Sinha
 Update documentation to reflect the fact that
 /proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
 
 Signed-off-by: Ani Sinha a...@arista.com
 ---
  Documentation/networking/ip-sysctl.txt |6 --
  1 files changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/Documentation/networking/ip-sysctl.txt 
 b/Documentation/networking/ip-sysctl.txt
 index 9bffdfc..2a28261 100644
 --- a/Documentation/networking/ip-sysctl.txt
 +++ b/Documentation/networking/ip-sysctl.txt
 @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
   Default: 0
 
  route/max_size - INTEGER
 - Maximum number of routes allowed in the kernel.  Increase
 - this when using large numbers of interfaces and/or routes.
 + Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
 + longer used. For ipv6, this is used to limit the maximum number of ipv6
 + routes allowed in the kernel.  Increase this when using large numbers of
 + interfaces and/or routes.

Now imagine you are reading ip-sysctl.txt, the new text is all wrong.
Something like:
Limit on the size of the ip route caches.
Ignored for ipv4 after kernel 3.6 because the ipv4 route cache was 
removed.
Increase this when using large numbers of interfaces and/or routes.
would read better.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking

2015-01-13 Thread David Laight
From: Thomas Graf
 Each per bucket lock covers a configurable number of buckets. While
 shrinking, two buckets in the old table contain entries for a single
 bucket in the new table. We need to lock down both while linking.
 Check if they are protected by different locks to avoid a recursive
 lock.

Thought, could the shrunk table use the same locks as the lower half
of the old table?

I also wonder whether shrinking hash tables is ever actually worth
the effort. Most likely they'll need to grow again very quickly.

   spin_lock_bh(old_bucket_lock1);
 - spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
 - spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
 +
 + /* Depending on the lock per buckets mapping, the bucket in
 +  * the lower and upper region may map to the same lock.
 +  */
 + if (old_bucket_lock1 != old_bucket_lock2) {
 + spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
 + spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
 + } else {
 + spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
 + }

Acquiring 3 locks of much the same type looks like a locking hierarchy
violation just waiting to happen.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking

2015-01-13 Thread David Laight
From: Thomas Graf
...
  Thought, could the shrunk table use the same locks as the lower half
  of the old table?
 
 No. A new bucket table and thus a new set of locks is allocated when the
 table is shrunk or grown. We only have check for overlapping locks
 when holding multiple locks for the same table at the same time.

I was guessing that when locks are shared buckets k and 2^n+k use the
same lock.
Under those conditions if the 'grow' decided not to allocate extra
locks then it could save work by using exactly the same locks as the
old table.
Similarly 'shrink' could do the reverse.

It was only a thought.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking

2015-01-13 Thread David Laight
From: Thomas Graf
...
 spin_lock_bh(old_bucket_lock1);
   - spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
   - spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
   +
   + /* Depending on the lock per buckets mapping, the bucket in
   +  * the lower and upper region may map to the same lock.
   +  */
   + if (old_bucket_lock1 != old_bucket_lock2) {
   + spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
   + spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
   + } else {
   + spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
   + }
 
  Acquiring 3 locks of much the same type looks like a locking hierarchy
  violation just waiting to happen.
 
 I'm not claiming it's extremely pretty, lockless lookup with deferred
 resizing doesn't come for free ;-) If you have a suggestion on how to
 implement this differently I'm all ears.

runs away

 That said, it's well isolated
 and the user of rhashtable does not have to deal with it. All code paths
 which take multiple locks are mutually exclusive to each other (ht-mutex).

OK, ht-mutes saves the day.
Might be worth a comment to save people looking at the code in isolation
from worrying and doing a bit search.
OTOH it might be obvious from a slightly larger fragment than the diff.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] net: bluetooth: hci_sock: Use 'const u32 *' instead of 'void *' for 2nd parameter of hci_test_bit()

2015-02-09 Thread David Laight
From: Marcel Holtmann
 Hi David,
 
  So at present, in kernel part, we can only say the original authors
  intended to do like this. And only within kernel part, it can not cause
  issue. I guess, original authors originally knew what we talk about.
 
  I've just searched for hci_u*filter it is all horrid.
  Look at the code for get/set_sockopt of HCI_FILTER.
  Someone seems to have done a complete 'bodge job' of fixing the user 
  interface
  for apps (32bit and 64bit) on 64bit kernels.
 
 we are actually fully aware of that. Just keep in mind that this code is from 
 2.4.6 kernel and with
 that most likely 14 years old by now. Unfortunately it ended up as userspace 
 API.

AFAICT the userspace API is always __u32[2] the long[2] is never exposed
to userspace (certainly not through the socket option code).

Quite why all the casts were added - instead of changing the type
is probably hidden in the annals of the source repo.
Some serious archaeology might be in order.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] net: bluetooth: hci_sock: Use 'const u32 *' instead of 'void *' for 2nd parameter of hci_test_bit()

2015-02-09 Thread David Laight
From: Chen Gang
...
 So at present, in kernel part, we can only say the original authors
 intended to do like this. And only within kernel part, it can not cause
 issue. I guess, original authors originally knew what we talk about.

I've just searched for hci_u*filter it is all horrid.
Look at the code for get/set_sockopt of HCI_FILTER.
Someone seems to have done a complete 'bodge job' of fixing the user interface
for apps (32bit and 64bit) on 64bit kernels.

 This patch is for fixing building warnings without any negative effect.
 And most of us are not the suitable members to continue analyzing. So
 at present, for me, we can accept this patch.

And, not uncommonly, it has shown up a 'bag of worms'.

If you change 'hci_filter' to contain u32[2] then you can drop
all of the casts and the temporary structures in the sockopt code.
Just be aware of the tail-padding.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] brcmfmac: avoid duplicated suspend/resume operation

2015-02-16 Thread David Laight
  WiFi chip has 2 SDIO functions, and PM core will trigger
  twice suspend/resume operations for one WiFi chip to do
  the same things. This patch avoid this case.

Do you want to suspend on the first or last request?

In general it might be that one function is in use and
something wants to suspend the other (as inactive).

If they suspend together you might need to pretend the
first function is suspended but only do the real power-saving
device suspend when all the functions have been suspended.

David



RE: [PATCH net-next 07/11] r8169:update rtl8168f pcie ephy parameter

2015-01-07 Thread David Laight
From: Chunhao Lin
 rtl8168f may return to PCIe L0 from PCIe L0s low power mode too slow.
 The following ephy parameters are for this issue.
 { 0x00, 0x,   0x0008 }
 { 0x0c, 0x3df0,   0x0200 }
 
 Signed-off-by: Chunhao Lin h...@realtek.com
 ---
  drivers/net/ethernet/realtek/r8169.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/ethernet/realtek/r8169.c 
 b/drivers/net/ethernet/realtek/r8169.c
 index 1874583..bafa132 100644
 --- a/drivers/net/ethernet/realtek/r8169.c
 +++ b/drivers/net/ethernet/realtek/r8169.c
 @@ -5852,7 +5852,9 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private 
 *tp)
   { 0x06, 0x00c0, 0x0020 },
   { 0x08, 0x0001, 0x0002 },
   { 0x09, 0x, 0x0080 },
 - { 0x19, 0x, 0x0224 }
 + { 0x19, 0x, 0x0224 },
 + { 0x00, 0x, 0x0008 },
 + { 0x0c, 0x3df0, 0x0200 }

I can't help feeling these lines all require short comments.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 15/15] x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro

2015-03-27 Thread David Laight
I've just 'fallen over' this 'optimisation' on a new Intel Atom processor.
AFAICT all the copy functions get patched to 'rep movsb'.

The problem arises when one of the buffers is uncached, in this
case the 'rep movsb' has to perform single byte transfers.

So memcpy_toio() and memcpy_fromio() need to use 'rep movsq'
otherwise the performance is horrid - especially over PCIe.

The same is true for any userspace code that is copying from
mmap()ed PCI memory space.

In my case I am using copy_to/from_user() to copy directly between
PCIe space and a user buffer. While not entirely portable it only
has to run on specific hardware where it works.

I know which copies in my code are problematic, but there isn't general
copy function that will DTRT.

(Keep me on the cc list).

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v10 tip 5/9] tracing: allow BPF programs to call bpf_trace_printk()

2015-03-23 Thread David Laight
From: Alexei Starovoitov
 Debugging of BPF programs needs some form of printk from the program,
 so let programs call limited trace_printk() with %d %u %x %p modifiers only.

Should anyone be allowed to use BPF programs to determine the kernel
addresses of any items?
Looks as though it is leaking kernel addresses to userspace.
Note that the problem is with the arguments, not the format string.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next] wireless: test sscanf return values

2015-02-27 Thread David Laight
From: Joe Perches
 At some point, it'd be good to make sscanf use __must_check
 so make sure the net/ uses of sscanf use the return value.

Isn't it much safer to avoid sscanf() completely and use
a different function for converting numerics?

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore}

2015-02-24 Thread David Laight
From: Ingo Molnar
...
 So why not trylock and time out here after a few seconds,
 instead of indefinitely supressing some potentially vital
 output due to some other CPU crashing/locking with the lock
 held?

I've used that for status requests that usually lock a table
to get a consistent view.
If trylock times out assume that the data is actually stable
and access it anyway. Remember the pid doing the access and
next time it tries to acquire the same lock do a trylock with
no timeout.
That way if (when) there is a locking fubar (or a driver oops
with a lock held) at least some of the relevant status commands
will work.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 07/11] powerpc/8xx: macro for handling CPU15 errata

2015-04-20 Thread David Laight
From: Christophe Leroy
 Sent: 20 April 2015 06:27
 Having a macro will help keep clear code.
...
   * We have to use the MD_xxx registers for the tablewalk because the
   * equivalent MI_xxx registers only perform the attribute functions.
   */
 +
 +#ifdef CONFIG_8xx_CPU15
 +#define DO_8xx_CPU15(tmp, addr)  \
 + additmp, addr, PAGE_SIZE;   \
 + tlbie   tmp;\
 + additmp, addr, PAGE_SIZE;   \
 + tlbie   tmp
 +#else
 +#define DO_8xx_CPU15(tmp, addr)
 +#endif

I'm sure I've spotted the same obvious error in the above before.

I'd also suggest calling it 'invalidate_adjacent_pages' - since that it
what it does.

I also guess that the execution time of 'tlbie' is non-trivial.
So you might as well get rid of the temporary register and put an
'addi' to reset 'addr' at the end.

David



RE: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-20 Thread David Laight
From: Of Rob Landley
 Sent: 19 April 2015 08:25
 On Thu, Apr 16, 2015 at 6:03 PM, Tejun Heo t...@kernel.org wrote:
  In a lot of configurations, netconsole is a useful way to collect
  system logs; however, all netconsole does is simply emitting UDP
  packets for the raw messages and there's no way for the receiver to
  find out whether the packets were lost and/or reordered in flight.
 
 Except a modern nonsaturated LAN shouldn't do that.
 
 If you have two machines plugged into a hub, and that's _all_ that's
 plugged in, packets should never get dropped. This was the original
 use case of netconsole was that the sender and the receiver were
 plugged into the same router.
 
 However, even on a quite active LAN the days of ethernet doing CDMA
 requiring retransmits are long gone, even 100baseT routers have been
 cacheing and retransmitting data internally so each connection can go
 at the full 11 megabytes/second with the backplane running fast enough
 to keep them all active at the same time. (That's why it's so hard to
 find a _hub_ anymore, it's all routers
...

Most machines are plugged into switches (not routers), many of them
will send 'pause' frames which the host mac may act on.
In which case packet loss is not expected (unless you have broadcast storms
when all bets are off).

Additionally, within a local network you shouldn't really get any packet
loss since no segments should be 100% loaded.
So for testing it is not unreasonable to expect no lost packets in netconsole
traffic.

David




RE: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-22 Thread David Laight
From: Thomas Gleixner
 Sent: 22 April 2015 09:45
 On Tue, 21 Apr 2015, Thomas Gleixner wrote:
  On Tue, 21 Apr 2015, Arnd Bergmann wrote:
   I know there are concerns about this, in particular because C11 and
   POSIX both require tv_nsec to be 'long', unlike timeval-tv_usec,
   which is a 'suseconds_t' and can be defined as 'long long'.
  
   a)
  
   struct timespec {
 time_t tv_sec;
 long long tv_nsec; /* or typedef long long snseconds_t */
   };
  
   This is not directly compatible with C11 or POSIX.1-2008, but it
   matches what we do inside of 64-bit kernels, so probably has the
   highest chance of working correctly in practice
 
  After reading Linus rant in the x32 thread again (thanks for the
  reminder), and looking at b/c/d - which rate between ugly and butt
  ugly - I think we should go for a) and screw POSIX and C11 as those
  committee dinosaurs seem to completely ignore the 2038 problem on
  32bit machines. At least I have not found any hint that these folks
  care at all. So why should we comply to something which is completely
  useless?
 
  That also makes the question about the upper 32bits check moot, so
  it's the simplest and clearest of the possible solutions.
 
 Second thoughts after some sleep.
 
 So the outcome of this is going to be that user space libraries will
 not expose the syscall variant of
 
 syscall_timespec64 {
s64 tv_sec;
  s64 tv_nsec;
 };
 
 to applications. The libs will translate them to spec conforming
 
timespec {
time_t tv_sec;
  long   tv_nsec;
};
 
 anyway. That means we have two translation steps on 32bit systems:
 
   1) user space timespec - syscall timespec64
 
   2) syscall timespec64 - scalar nsec s64 (ktime_t)
 
 and the other way round. The kernel internal representation is simply
 s64 (nsec) based all over the place.

Do you need the double-translation?
If all the kernel uses a 64bit nsec value the in-kernel syscall stub
can convert the user-supplied values appropriately before calling
the standard function.
Not that a syscall that takes a linear nsec value isn't useful.

FWIW I can't remember what NetBSD did when they extended time_t to 64bits.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: usb: allow MTU that is a multiple of USB packet size

2015-05-07 Thread David Laight
From: Oliver Neukum
 Sent: 07 May 2015 11:11
 On Thu, 2015-05-07 at 12:49 +0300, Ruslan Bilovol wrote:
  Current usbnet driver rejects setting MTU that is a multiple
  of USB endpoint's wMaxPacketSize size. However, it may only
  lead to possible performance degradation but is not so
  critical that its using should be prohibited. So allow it
  but also warn user about possible issue.
 
 We have reports about devices reacting badly to ZLPs.
 Unless you have a compelling reasons for this change
 I have to reject it.

I don't remember seeing a fix for xhci to even send ZLP go through.
If the ZLP are sent then the device will behave badly.

The receive side code in usbnet also needs a lot of TLC.
(I don't remember anything major happening to it in the last couple
of years.)
I think it would be better if it didn't try to use skb for receive URB.
Instead it should supply URB that are always multiples of the USB
buffer size and then generate receive skb from the receive USB data.
This would require code that correctly processes ethernet frames that
span URB boundaries.
This would remove the need for the 16k+ sized skb used by some of the
sub-drivers and the typical lying about the 'true size'.

For xhci it really ought to be possible to remove a lot of code from the
tx and rx data paths and just write the buffer descriptors directly into
the ring entries (as is a typical ethernet driver).
That might get the performance (and software overhead) of USB3 Ge much
nearer that of a normal ethernet device.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 1/4] ozwpan: Use proper check to prevent heap overflow

2015-05-15 Thread David Laight
From: Jason A. Donenfeld
 Sent: 13 May 2015 19:34
 Since elt-length is a u8, we can make this variable a u8. Then we can
 do proper bounds checking more easily. Without this, a potentially
 negative value is passed to the memcpy inside oz_hcd_get_desc_cnf,
 resulting in a remotely exploitable heap overflow with network
 supplied data.
...
 diff --git a/drivers/staging/ozwpan/ozusbsvc1.c 
 b/drivers/staging/ozwpan/ozusbsvc1.c
 index d434d8c..cd6c63e 100644
 --- a/drivers/staging/ozwpan/ozusbsvc1.c
 +++ b/drivers/staging/ozwpan/ozusbsvc1.c
 @@ -390,8 +390,10 @@ void oz_usb_rx(struct oz_pd *pd, struct oz_elt *elt)
   case OZ_GET_DESC_RSP: {
   struct oz_get_desc_rsp *body =
   (struct oz_get_desc_rsp *)usb_hdr;
 - int data_len = elt-length -
 + u8 data_len = elt-length -
   sizeof(struct oz_get_desc_rsp) + 1;
 + if (data_len  elt-length)
 + break;

Why not just check the length. eg:
unsigned int data_len = elt-length;
if (data_len  sizeof(struct oz_get_desc_rsp) + 1)
break;

   u16 offs = le16_to_cpu(get_unaligned(body-offset));
   u16 total_size =
   le16_to_cpu(get_unaligned(body-total_size));

Don't put variable definitions after code.

You don't really want to do arithmetic on local variables that are
smaller than a machine word (eg u8 and u16), doing so can require
the compiler generate a lot more code.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 0/6] powerpc32: replace memcpy and memset by cacheable alternatives

2015-05-19 Thread David Laight
From: Christophe Leroy
 Sent: 19 May 2015 11:08

 This patchset implements use of cacheable versions of memset and
 memcpy since when the destination is not cacheable, memset_io
 and memcpy_toio are used.

This isn't the right list to ask, but:

Can someone fix the x86 versions of memset/memcpy (and the _io variants)
so that they don't end up being 'rep movsb' on new intel cpus?

I've a C2558 Atom which has the optimised 'rep movsb' hardware.
Copies to/from uncached locations are now done 'byte by byte'.
As well as kernel code this affects userpace copying to/from
mmap()ed PCIe space.
64bit reads are slow enough, making it 8 times slower is horrid.

I suspect this affect some network drivers as well.

David



RE: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread David Laight
From: George Dunlap
 Sent: 16 April 2015 09:56
 On 04/15/2015 07:19 PM, Eric Dumazet wrote:
  On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote:
 
  Maybe you should stop wasting all of our time and just tell us what
  you're thinking.
 
  I think you make me wasting my time.
 
  I already gave all the hints in prior discussions.
 
 Right, and I suggested these two options:
 
 Obviously one solution would be to allow the drivers themselves to set
 the tcp_limit_output_bytes, but that seems like a maintenance
 nightmare.
 
 Another simple solution would be to allow drivers to indicate whether
 they have a high transmit latency, and have the kernel use a higher
 value by default when that's the case. [1]
 
 Neither of which you commented on.  Instead you pointed me to a comment
 that only partially described what the limitations were. (I.e., it
 described the two packets or 1ms, but not how they related, nor how
 they related to the max of 2 64k packets outstanding of the default
 tcp_limit_output_bytes setting.)

ISTM that you are changing the wrong knob.
You need to change something that affects the global amount of pending tx data,
not the amount that can be buffered by a single connection.

If you change tcp_limit_output_bytes and then have 1000 connections trying
to send data you'll suffer 'bufferbloat'.

If you call skb_orphan() in the tx setup path then the total number of
buffers is limited, but a single connection can (and will) will the tx
ring leading to incorrect RTT calculations and additional latency for
other connections.
This will give high single connection throughput but isn't ideal.

One possibility might be to call skb_orphan() when enough time has
elapsed since the packet was queued for transmit that it is very likely
to have actually been transmitted - even though 'transmit done' has
not yet been signalled.
Not at all sure how this would fit in though...

David




RE: [PATCH v3 03/17] crypto: talitos - talitos_ptr renamed ptr for more lisibility

2015-04-17 Thread David Laight
From: Christophe Leroy
 Linux CodyingStyle recommends to use short variables for local
 variables. ptr is just good enough for those 3 lines functions.
 It helps keep single lines shorter than 80 characters.
...
 -static void to_talitos_ptr(struct talitos_ptr *talitos_ptr, dma_addr_t 
 dma_addr)
 +static void to_talitos_ptr(struct talitos_ptr *ptr, dma_addr_t dma_addr)
  {
 - talitos_ptr-ptr = cpu_to_be32(lower_32_bits(dma_addr));
 - talitos_ptr-eptr = upper_32_bits(dma_addr);
 + ptr-ptr = cpu_to_be32(lower_32_bits(dma_addr));
 + ptr-eptr = upper_32_bits(dma_addr);
  }
...

Maybe, but 'ptr' isn't a good choice.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] usb: gadget: ffs: don't allow to open with O_NONBLOCK flag

2015-04-07 Thread David Laight
From: Robert Baldyga
 Hi Michal,
 
 On 04/01/2015 05:17 PM, Michal Nazarewicz wrote:
  On Wed, Apr 01 2015, Robert Baldyga r.bald...@samsung.com wrote:
  FunctionFS can't support O_NONBLOCK because read/write operatons are
  directly translated into USB requests which are asynchoronous, so we
  can't know how long we will have to wait for request completion. For
  this reason in case of open with O_NONBLOCK flag we return
  -EWOULDBLOCK.
 
  cant is a bit strong of a word here though.  It can, but in a few
  cases it doesnt.
 
  It kinda saddens me that this undoes all the lines of code that were put
  into the file to support O_NONBLOCK (e.g. FFS_NO_SETUP path of
  ffs_ep0_read).
 
  Im also worried this may break existing applications which, for better
  or worse, open the file with O_NONBLOCK.
 
  Most importantly though, this does not stop users from using fcntl to
  set O_NONBLOCK, so if you really want to stop O_NONBLOCK from being set,
  that path should be checked as well (if possible).
 
 I want rather to inform users that non-blocking i/o wouldn't work for
 epfiles. Indeed we can handle O_NONBLOCK for ep0 (for the same reason we
 can have poll), but for other epfiles there is no way to check if
 read/write operation can end up in short time. Everything is up to host.

Is that really necessary?
I'm sure there are a lot of device drivers that ignore O_NONBLOCK.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] ethernet: e1000e: define lat_ns as u64 instead of s64

2015-04-08 Thread David Laight
From: yanjiang@windriver.com
 Sent: 03 April 2015 10:18
 From: Yanjiang Jin yanjiang@windriver.com
 
 do_div() expects the type of n to be uint64_t, define lat_ns as u64 to
 avoid the below warning, also update its correlative operations and data.
 
 In file included from ./arch/powerpc/include/asm/div64.h:1:0,
  from include/linux/kernel.h:124,
  from include/linux/list.h:8,
  from include/linux/timer.h:4,
  from drivers/net/ethernet/intel/e1000e/e1000.h:29,
  from drivers/net/ethernet/intel/e1000e/ich8lan.c:59:
 drivers/net/ethernet/intel/e1000e/ich8lan.c: In function 
 'e1000_platform_pm_pch_lpt':
 include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer 
 types lacks a cast [enabled
 by default]
   (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
 ^
 drivers/net/ethernet/intel/e1000e/ich8lan.c:1016:4: note: in expansion of 
 macro 'do_div'
 do_div(lat_ns, speed);
 
 Signed-off-by: Yanjiang Jin yanjiang@windriver.com
 ---
  drivers/net/ethernet/intel/e1000e/ich8lan.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
 b/drivers/net/ethernet/intel/e1000e/ich8lan.c
 index 48b74a5..baab58b 100644
 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
 +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
 @@ -982,8 +982,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, 
 bool link)
   u16 speed, duplex, scale = 0;
   u16 max_snoop, max_nosnoop;
   u16 max_ltr_enc;/* max LTR latency encoded */
 - s64 lat_ns; /* latency (ns) */
 - s64 value;
 + u64 lat_ns; /* latency (ns) */
 + u64 value;
   u32 rxa;
 
   if (!hw-adapter-max_frame_size) {
 @@ -1008,8 +1008,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw 
 *hw, bool link)
* 2^25*(2^10-1) ns.  The scale is encoded as 0=2^0ns,
* 1=2^5ns, 2=2^10ns,...5=2^25ns.
*/
 - lat_ns = ((s64)rxa * 1024 -
 -   (2 * (s64)hw-adapter-max_frame_size)) * 8 * 1000;
 + lat_ns = ((u64)rxa * 1024 -
 +   (2 * (u64)hw-adapter-max_frame_size)) * 8 * 1000;
   if (lat_ns  0)
   lat_ns = 0;

The above change cannot be correct.
You should be getting another error for testing an unsigned value be less than 
0.

So I presume this wasn't even tested.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3 2/2] powerpc: add support for csum_add()

2015-05-27 Thread David Laight
From: Scott Wood
 Sent: 26 May 2015 20:43
...
  I was thinking of all 64bit targets, not 32bit ones.
 
 Oh, you mean move it out of arch/powerpc?  Sounds reasonable, but
 someone should probably check what the resulting code looks like on
 other common arches.  OTOH, if we're going to modify non-arch code,
 that might be a good opportunity to implement Segher's suggestion and
 move to a 64-bit accumulator.

Or more likely: adding alternate 32bit words to different 64-bit
registers in order to reduce the dependency chains further.
I'm sure I've seen a version that does that somewhere.

David



RE: [PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-27 Thread David Laight
From: Of Ido Yariv
 Sent: 26 May 2015 21:17
 The Tail Loss Probe RFC specifies that the PTO value should be set to
 max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
 
 The PTO value is converted to jiffies, so the timer may expire
 prematurely.
 
 This is especially problematic on systems in which HZ = 100, so work
 around this by setting the timeout to at least 2 jiffies on such
 systems.
 
 The 10ms figure was originally selected based on tests performed with
 the current implementation and HZ = 1000. Thus, leave the behavior on
 systems with HZ  100 unchanged.
 
 Signed-off-by: Ido Yariv idox.ya...@intel.com
 ---
  net/ipv4/tcp_output.c | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
 index 534e5fd..5321df8 100644
 --- a/net/ipv4/tcp_output.c
 +++ b/net/ipv4/tcp_output.c
 @@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
   timeout = max_t(u32, timeout,
   (rtt + (rtt  1) + TCP_DELACK_MAX));
   timeout = max_t(u32, timeout, msecs_to_jiffies(10));
 +#if HZ = 100
 + timeout = max_t(u32, timeout, 2);
 +#endif

Why not:
timeout = max_t(u32, timeout, max_t(u32, 2, msecs_to_jiffies(10)));
I think the RH max_t() is a compile time constant.

You need 2 jiffies to guarantee a non-zero timeout.
Even if HZ=199 with a 'rounding down' msecs_to_jiffies() you get 1 jiffy
and a possible immediate timeout.

There are probably other places where msecs_to_jiffies() better not return 1.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-28 Thread David Laight
From: Ido Yariv
 Sent: 28 May 2015 05:37
...
 +/* Convert msecs to jiffies, ensuring that the return value is at least 2
 + * jiffies.
 + * This can be used when setting tick-based timers to guarantee that they 
 won't
 + * expire right away.
 + */
 +static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)

I don't like using 'safe' in function names, being 'safe; depends on what
the caller wants.
Maybe tcp_msecs_to_jiffies_min_2() would be better.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] usb: chipidea: Reduce ULPI PHY reset pulse to datasheet spec of 1us

2015-07-02 Thread David Laight
From: Peter Chen
 Sent: 30 June 2015 03:06
 On Fri, Jun 26, 2015 at 03:47:03PM +0200, Mike Looijmans wrote:
  The datasheet for the 334x PHY mentions that a reset can be performed:
  ... by bringing the pin low for a minimum of 1 microsecond and
  then high.
  A delay of 5ms to implement that seems overly long, so reduce it to
  just 1us.
  As for the delay after reset, the datasheet only mentioned that the
  chip will assert the DIR output. 1ms seems like a safe time to wait
  for that to happen, so no change there.
 
  Signed-off-by: Mike Looijmans mike.looijm...@topic.nl
  ---
   drivers/usb/chipidea/core.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
  index e970863..c865abe 100644
  --- a/drivers/usb/chipidea/core.c
  +++ b/drivers/usb/chipidea/core.c
  @@ -664,7 +664,7 @@ static int ci_hdrc_create_ulpi_phy(struct device *dev, 
  struct ci_hdrc *ci)
  dev_err(dev, Failed to request ULPI reset gpio: %d\n, 
  ret);
  return ret;
  }
  -   msleep(5);
  +   udelay(1);

If the spec says 1us I'd delay for longer just to make sure.
And add a comment saying that the reset needs to be 1us long.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3 2/2] powerpc: add support for csum_add()

2015-05-22 Thread David Laight
From: Linuxppc-dev Christophe Leroy
 Sent: 19 May 2015 16:19
...
 diff --git a/arch/powerpc/include/asm/checksum.h 
 b/arch/powerpc/include/asm/checksum.h
 index 5e43d2d..e8d9ef4 100644
 --- a/arch/powerpc/include/asm/checksum.h
 +++ b/arch/powerpc/include/asm/checksum.h
 @@ -130,6 +130,22 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, 
 __be32 daddr,
   return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
  }
 
 +#define HAVE_ARCH_CSUM_ADD
 +static inline __wsum csum_add(__wsum csum, __wsum addend)
 +{
 +#ifdef __powerpc64__
 + u64 res = (__force u64)csum;
 +
 + res += (__force u64)addend;
 + return (__force __wsum)((u32)res + (res  32));
 +#else
 + asm(addc %0,%0,%1;
 + addze %0,%0;
 + : +r (csum) : r (addend));
 + return csum;
 +#endif

I'd have thought it better to test for the cpu type where you want the
'asm' variant, and then fall back on the C version for all others.
I know (well suspect) there are only two cases here.

I'd also have thought that the 64bit C version above would be generally 'good'.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v3 2/2] powerpc: add support for csum_add()

2015-05-26 Thread David Laight
From: Scott Wood ...
  I'd also have thought that the 64bit C version above would be generally 
  'good'.
 
 It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
 it does something like:
 
   mr  tmp0, csum
   li  tmp1, 0
   li  tmp2, 0
   addctmp3, addend, tmp0
   addecsum, tmp2, tmp1
   add csum, csum, tmp3

I was thinking of all 64bit targets, not 32bit ones.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2 3/3] power: wm831x_power: Support USB charger current limit management

2015-08-20 Thread David Laight
From: Baolin Wang
 Sent: 14 August 2015 10:48
 +/* In miliamps */

Spelling police: milliamps

 +static unsigned int wm831x_usb_limits[] = {
 + 0,
 + 2,
 + 100,
 + 500,
 + 900,
 + 1500,
 + 1800,
 + 550,
 +};

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Re: [PATCH 4.2-rc5] workqueue: Make flush_workqueue() available again to non GPL modules

2015-08-20 Thread David Laight
From: David Laight
 Sent: 05 August 2015 10:52
 To: linux-kernel@vger.kernel.org
 Subject: Re: [PATCH 4.2-rc5] workqueue: Make flush_workqueue() available 
 again to non GPL modules
 
  From:   Tejun Heo
  Date:   2015-08-04 18:05:20
  On Tue, Aug 04, 2015 at 11:26:04AM -0600, tim.gard...@canonical.com wrote:
   From: Tim Gardner tim.gard...@canonical.com
  
   Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 (workqueue: move
   flush_scheduled_work() to workqueue.h) moved the exported non GPL
   flush_scheduled_work() from a function to an inline wrapper.
   Unfortunately, it directly calls flush_workqueue() which is a GPL 
   function.
   This has the effect of changing the licensing requirement for this 
   function
   and makes it unavailable to non GPL modules.
  
   See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a (workqueue: Make
   schedule_work() available again to non GPL modules) for precedent.
  
   Cc: Tejun Heo t...@kernel.org
   Signed-off-by: Tim Gardner tim.gard...@canonical.com
 
  Applied to wq/for-4.3.
 
 I hit this yesterday afternoon as well!
 
 Can we get the fix into 4.2 itself ?

This doesn't seem to be in 4.2 yet.
It is an interface regression that will cause grief.

David
(I'm not subscribed to linux-kernel, so copy me in any responses.)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] usb: gadget: f_uac1: Convert use of __constant_cpu_to_le16 to cpu_to_le16

2015-08-20 Thread David Laight
From: Vaishali Thakkar
 Sent: 19 August 2015 06:31
 To: Felipe Balbi
 Cc: Greg Kroah-Hartman; linux-...@vger.kernel.org; 
 linux-kernel@vger.kernel.org
 Subject: [PATCH] usb: gadget: f_uac1: Convert use of __constant_cpu_to_le16 
 to cpu_to_le16
 
 In big endian cases, macro cpu_to_le16 unfolds to __swab16 which
 provides special case for constants. In little endian cases,
 __constant_cpu_to_le16 and cpu_to_le16 expand directly to the
 same expression. So, replace __constant_cpu_to_le16 with
 cpu_to_le16 with the goal of getting rid of the definition of
 __constant_cpu_to_le16 completely.
 
 The semantic patch that performs this transformation is as follows:
 
 @@expression x;@@
 
 - __constant_cpu_to_le16(x)
 + cpu_to_le16(x)
 
 Signed-off-by: Vaishali Thakkar vthakkar1...@gmail.com
 ---
  drivers/usb/gadget/function/f_uac1.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/usb/gadget/function/f_uac1.c 
 b/drivers/usb/gadget/function/f_uac1.c
 index 7856b33..5aa8d8a 100644
 --- a/drivers/usb/gadget/function/f_uac1.c
 +++ b/drivers/usb/gadget/function/f_uac1.c
 @@ -57,8 +57,8 @@ static struct uac1_ac_header_descriptor_1 ac_header_desc = {
   .bLength =  UAC_DT_AC_HEADER_LENGTH,
   .bDescriptorType =  USB_DT_CS_INTERFACE,
   .bDescriptorSubtype =   UAC_HEADER,
 - .bcdADC =   __constant_cpu_to_le16(0x0100),
 - .wTotalLength = __constant_cpu_to_le16(UAC_DT_TOTAL_LENGTH),
 + .bcdADC =   cpu_to_le16(0x0100),
 + .wTotalLength = cpu_to_le16(UAC_DT_TOTAL_LENGTH),

Have you test compiled this on a big-endian system?
My gut feeling is that is fails.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] usb: gadget: f_uac1: Convert use of __constant_cpu_to_le16 to cpu_to_le16

2015-08-24 Thread David Laight
From: Vaishali Thakkar [mailto:vthakkar1...@gmail.com]
 Sent: 22 August 2015 02:57
...
  - .bcdADC =   __constant_cpu_to_le16(0x0100),
  - .wTotalLength = __constant_cpu_to_le16(UAC_DT_TOTAL_LENGTH),
  + .bcdADC =   cpu_to_le16(0x0100),
  + .wTotalLength = cpu_to_le16(UAC_DT_TOTAL_LENGTH),
 
  Have you test compiled this on a big-endian system?
  My gut feeling is that is fails.
 
 No. I have tested it on little-endian system only. But I'll
 be really surprised if this will fail. Can you please tell me
 if I am missing something in this particular case or same
 applies for other cases because most of the cases like
 __constant_foo are already converted to foo?
 
 As far as I know, if the argument is a constant the
 conversion happens at compile time. And unfolding both
 definitions returns to same expression. Still I am trying if
 someone can test it for me on big endian system.

Flip one to cpu_to_be16() and see if it still compiles.

Static initialisers and case labels can be expressions, but the
expression itself must only contain constants.
So it needs to be constant regardless of the value of any constants.
If it contains 'a ? t : f' then both 't' and 'f' must be constant.

In code, if 'a' is constant the optimiser discards one of 't' or 'f'.
I'm not sure what happens for non-static initialisers (they generate
odd code at the best of times).

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v5 02/46] usb: gadget: add endpoint capabilities flags

2015-07-31 Thread David Laight
From: Robert Baldyga
 Sent: 31 July 2015 15:00
 Introduce struct usb_ep_caps which contains information about capabilities
 of usb endpoints - supported transfer types and directions. This structure
 should be filled by UDC driver for each of its endpoints, and will be
 used in epautoconf in new ep matching mechanism which will replace ugly
 guessing of endpoint capabilities basing on its name.
 
 Signed-off-by: Robert Baldyga r.bald...@samsung.com
 ---
  include/linux/usb/gadget.h | 21 +
  1 file changed, 21 insertions(+)
 
 diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
 index 68fb5e8..a9a4959 100644
 --- a/include/linux/usb/gadget.h
 +++ b/include/linux/usb/gadget.h
 @@ -141,10 +141,29 @@ struct usb_ep_ops {
  };
...
 +struct usb_ep_caps {
 + unsigned type_control:1;
 + unsigned type_iso:1;
 + unsigned type_bulk:1;
 + unsigned type_int:1;
 + unsigned dir_in:1;
 + unsigned dir_out:1;
 +};

With the way this is used (eg below from 13/46)

+
+   if (i == 0) {
+   ep-ep.caps.type_control = true;
+   } else {
+   ep-ep.caps.type_iso = true;
+   ep-ep.caps.type_bulk = true;
+   ep-ep.caps.type_int = true;
+   }
+
+   ep-ep.caps.dir_in = true;
+   ep-ep.caps.dir_out = true;

I think it would be more obvious if you used a u8 and explicit bitmasks.
The initialisation (as above) would the be explicitly assigning 'not supported'
to the other fields.
The compiler will also generate much better code...

David



RE: [PATCH 08/15] drivers: net: Drop unlikely before IS_ERR(_OR_NULL)

2015-07-31 Thread David Laight
From: Murali Karicheri
 Sent: 31 July 2015 16:04
 On 07/31/2015 04:38 AM, Viresh Kumar wrote:
  IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
  is no need to do that again from its callers. Drop it.
 
 
 IS_ERR_OR_NULL() is defined as
 
 static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
 {
  return !ptr || IS_ERR_VALUE((unsigned long)ptr);
 }
 
 So the unlikely() applies only to second part. Wouldn't that be a
 problem for optimization?

Ugg...

The unlikely() in IS_ERR_VALUE() is likely to stop the compiler
doing a single 'window' comparison for the range [-MAX_ERROR .. 0].
So you are likely to end up with 2 comparisons.
I suspect that:

return IS_ERR_VALUE((unsigned long)ptr - 1);

would be a much better test.
(Ignoring the off-by-one for the highest error.)

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4.2-rc5] workqueue: Make flush_workqueue() available again to non GPL modules

2015-08-05 Thread David Laight
 From:   Tejun Heo 
 Date:   2015-08-04 18:05:20
 On Tue, Aug 04, 2015 at 11:26:04AM -0600, tim.gard...@canonical.com wrote:
  From: Tim Gardner tim.gard...@canonical.com
 
  Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 (workqueue: move
  flush_scheduled_work() to workqueue.h) moved the exported non GPL
  flush_scheduled_work() from a function to an inline wrapper.
  Unfortunately, it directly calls flush_workqueue() which is a GPL function.
  This has the effect of changing the licensing requirement for this function
  and makes it unavailable to non GPL modules.
 
  See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a (workqueue: Make
  schedule_work() available again to non GPL modules) for precedent.
 
  Cc: Tejun Heo t...@kernel.org
  Signed-off-by: Tim Gardner tim.gard...@canonical.com

 Applied to wq/for-4.3.

I hit this yesterday afternoon as well!

Can we get the fix into 4.2 itself ?

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] powerpc/hvsi: Fix endianness issues in the HVSI driver

2015-08-03 Thread David Laight
From: Laurent Dufour
 Sent: 31 July 2015 10:30
 This patch fixes several endianness issues detected when running the HVSI
 driver in little endian mode.
 
 These issues are raised in little endian mode because the data exchanged in
 memory between the kernel and the hypervisor has to be in big endian
 format.
...
 diff --git a/drivers/tty/hvc/hvsi.c b/drivers/tty/hvc/hvsi.c
 index 41901997c0d6..a75146f600cb 100644
 --- a/drivers/tty/hvc/hvsi.c
 +++ b/drivers/tty/hvc/hvsi.c
 @@ -240,9 +240,9 @@ static void hvsi_recv_control(struct hvsi_struct *hp, 
 uint8_t *packet,
  {
   struct hvsi_control *header = (struct hvsi_control *)packet;
 
 - switch (header-verb) {
 + switch (be16_to_cpu(header-verb)) {
   case VSV_MODEM_CTL_UPDATE:
 - if ((header-word  HVSI_TSCD) == 0) {
 + if ((be32_to_cpu(header-word)  HVSI_TSCD) == 0) {

It is generally best to byteswap constants.

David



RE: [RFC] bmac:change to use bitrev8() generic function

2015-08-10 Thread David Laight
 From: Tobias Klauser
 Sent: 10 August 2015 12:49
 On 2015-08-10 at 11:53:41 +0200, yalin wang yalin.wang2...@gmail.com wrote:
  This change to use generic bitrev8() for bmac driver.
...
  @@ -871,7 +860,7 @@ bmac_addhash(struct bmac_data *bp, unsigned char *addr)
 
  if (!(*addr)) return;
  crc = bmac_crc((unsigned short *)addr)  0x3f; /* Big-endian alert! */

Why not *((u8 *)addr + 1)  0x3f

  -   crc = reverse6[crc];/* Hyperfast bit-reversing algorithm */
  +   crc = bitrev8((u8)crc);
 
 No, this won't work. reverse6 works on 6 bit values, while bitrev8 works
 on 8bit values, e.g. reverse6[1] = 0x20, while bitrev8(1) = 0x80

You could use bitrev8(n)  2.

But that is a 'strange' map of a 7-bit value to a 6-bit one.

I thought it was more common for ethernet hardware to use the
value of the crc register after all 6 bytes of the mac address
had been processed.

David

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

2015-10-21 Thread David Laight
From: Paul E. McKenney
> Sent: 21 October 2015 00:35
...
> There is also the question of whether the barrier forces ordering
> of unrelated stores, everything initially zero and all accesses
> READ_ONCE() or WRITE_ONCE():
> 
>   P0  P1  P2  P3
>   X = 1;  Y = 1;  r1 = X; r3 = Y;
>   some_barrier(); some_barrier();
>   r2 = Y; r4 = X;
> 
> P2's and P3's ordering could be globally visible without requiring
> P0's and P1's independent stores to be ordered, for example, if you
> used smp_rmb() for some_barrier().  In contrast, if we used smp_mb()
> for barrier, everyone would agree on the order of P0's and P0's stores.
> 
> There are actually a fair number of different combinations of
> aspects of memory ordering.  We will need to choose wisely.  ;-)

My thoughts on this are that most code probably isn't performance critical
enough to be using anything other than normal locks for inter-cpu
synchronisation.
Certainly most people are likely to get it wrong somewhere.
So you want a big red sticker saying 'Don't try to be too clever'.

Also without examples of why things go wrong (eg member_consumer()
and alpha) it is difficult to understand the differences between
all the barriers (etc).

OTOH device driver code may need things slightly stronger than
barrier() (which I think is asm(:::"memory")) to sequence accesses
to hardware devices (and memory the hardware reads), but without
having a strong barrier in every ioread/write() access.

David



RE: [PATCH] of: Print rather than WARN'ing when overlap check fails

2015-11-10 Thread David Laight
From: Michael Ellerman
> Sent: 10 November 2015 05:09
> __rmem_check_for_overlap() is called very early in boot, and on some
> powerpc systems it's not safe to call WARN that early in boot.
> 
> If the overlap check fails the system will oops instead of printing a
> warning. Furthermore because it's so early in boot the console is not up
> and the user doesn't see the oops, they just get a dead system.

Wouldn't it be better to add the required checks to WARN()?
That would stop the same problem happening elsewhere.

David



RE: irq_fpu_usable() is false in ndo_start_xmit() for UDP packets

2015-11-17 Thread David Laight
From: David Miller
> Sent: 16 November 2015 20:32
> From: "Jason A. Donenfeld" 
> Date: Mon, 16 Nov 2015 20:52:28 +0100
> 
> > This works fine with, say, iperf3 in TCP mode. The AVX performance
> > is great. However, when using iperf3 in UDP mode, irq_fpu_usable()
> > is mostly false! I added a dump_stack() call to see why, except
> > nothing looks strange; the initial call in the stack trace is
> > entry_SYSCALL_64_fastpath. Why would irq_fpu_usable() return false
> > when we're in a syscall? Doesn't that mean this is in process
> > context?
> 
> Network device driver transmit executes with software interrupts
> disabled.
> 
> Therefore on x86, you cannot use the FPU.

I had some thoughts about driver access to AVX instructions when
I was adding AVX support to NetBSD.

The fpu state is almost certainly 'lazy switched' this means that
the fpu registers can contain data for a process that is currently
running on a different cpu.
At any time the other cpu might request (by IPI) that they be flushed
to the process data area so that it can reload them.
Kernel code could request them be flushed, but that can only happen once.
If a nested function requires them it would have to supply a local
save area. But the full save area is too big to go on stack.
Not only that, the save and restore instructions are slow.

It is also worth noting that all the AVX registers are callee saved.
This means that the syscall entry need not preserve them, instead
it can mark that they will be 'restored as zero'. However this
isn't true of any other kernel entry points.

Back to my thoughts...

Kernel code is likely to want to use special SSE/AVX instructions (eg
the crc and crypto ones) rather than generic FP calculations.
As such just saving a two of three of AVX registers would suffice.
This could be done using a small on-stack save structure that
can be referenced from the process's save area so that any IPI
can copy over the correct values after saving the full state.

This would allow kernel code (including interrupts) to execute
some AVX instructions without having to save the entire cpu
extended state anywhere.

I suspect there is a big hole in the above though...

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC net-next 2/2] tcp: Add Redundant Data Bundling (RDB)

2015-11-02 Thread David Laight
From: Bendik Rønning Opstad
> Sent: 23 October 2015 21:50
> RDB is a mechanism that enables a TCP sender to bundle redundant
> (already sent) data with TCP packets containing new data. By bundling
> (retransmitting) already sent data with each TCP packet containing new
> data, the connection will be more resistant to sporadic packet loss
> which reduces the application layer latency significantly in congested
> scenarios.

What sort of traffic flows do you expect this to help?

An ssh (or similar) connection will get additional data to send,
but that sort of data flow needs Nagle in order to reduce the
number of packets sent.
OTOH it might benefit from including unacked data if the Nagle
timer expires.
Being able to set the Nagle timer on a per-connection basis
(or maybe using something based on the RTT instead of 2 secs)
might make packet loss less problematic.

Data flows that already have Nagle disabled (probably anything that
isn't command-response and isn't unidirectional bulk data) are
likely to generate a lot of packets within the RTT.
Resending unacked data will just eat into available network bandwidth
and could easily make any congestion worse.

I think that means you shouldn't resend data more than once, and/or
should make sure that the resent data isn't a significant overhead
on the packet being sent.

David




RE: [PATCH RFC net-next 2/2] tcp: Add Redundant Data Bundling (RDB)

2015-11-02 Thread David Laight
From: Bendik Rønning Opstad
> Sent: 29 October 2015 22:54
...
> > > > The semantics of the tp->nonagle bits are already a bit complex. My
> > > > sense is that having a setsockopt of TCP_RDB transparently modify the
> > > > nagle behavior is going to add more extra complexity and unanticipated
> > > > behavior than is warranted given the slight possible gain in
> > > > convenience to the app writer. What about a model where the
> > > > application user just needs to remember to call
> > > > setsockopt(TCP_NODELAY) if they want the TCP_RDB behavior to be
> > > > sensible? I see your nice tests at
> > > >
> > > >   https://github.com/bendikro/packetdrill/commit/9916b6c53e33dd04329d29b
> > > >   7d8baf703b2c2ac1b> >
> > > > are already doing that. And my sense is that likewise most
> > > > well-engineered "thin stream" apps will already be using
> > > > setsockopt(TCP_NODELAY). Is that workable?
> 
> This is definitely workable. I agree that it may not be an ideal solution to
> have TCP_RDB disable Nagle, however, it would be useful with a way to easily
> enable RDB and disable Nagle.

If enabling RDB disables Nagle, then what happens when you turn RDB back off?

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT] Networking

2015-11-06 Thread David Laight
> From: Linus Torvalds
> Sent: 03 November 2015 20:45
> On Tue, Nov 3, 2015 at 12:05 PM, Linus Torvalds
>  wrote:
> >  result = add_overflow(
> > mul_overflow(sec, SEC_CONVERSION, ),
> > mul_overflow(nsec, NSEC_CONVERSION, ),
> > );
> >
> >  return overflow ? MAX_JIFFIES : result;
> 
> Thinking more about this example, I think the gcc interface for
> multiplication overflow is fine.
> 
> It would end up something like
> 
> if (mul_overflow(sec, SEC_CONVERSION, ))
> return MAX_JIFFY_OFFSET;
> if (mul_overflow(nsec, NSEC_CONVERSION, ))
> return MAX_JIFFY_OFFSET;
> sum = sec + nsec;
> if (sum < sec || sum > MAX_JIFFY_OFFSET)
> return MAX_JIFFY_OFFSET;
> return sum;
> 
> and that doesn't look horribly ugly to me.

If mul_overflow() is a real function you've just forced some of the
values out to memory, generating a 'clobber' for all memory
(unless 'strict-aliasing' is enabled) and making a mess of other
optimisations.
(If it is a static inline that might not happen.)

If you assume that no one is stupid enough to multiply very large
values by 1 and not get an error you could have mul_overflow()
return the largest prime if the multiply overflowed.

David



RE: [PATCH] net: hix5hd2_gmac: avoid integer overload warning

2015-10-16 Thread David Laight
From: Arnd Bergmann
> Sent: 16 October 2015 11:01
> BITS_RX_EN is an 'unsigned long' constant, so the ones complement of that
> has bits set that do not fit into a 32-bit variable on 64-bit architectures,
> which causes a harmless gcc warning:
...
>  static void hix5hd2_port_disable(struct hix5hd2_priv *priv)
>  {
> - writel_relaxed(~(BITS_RX_EN | BITS_TX_EN), priv->base + PORT_EN);
> + writel_relaxed(~(u32)(BITS_RX_EN | BITS_TX_EN), priv->base + PORT_EN);
>   writel_relaxed(0, priv->base + DESC_WR_RD_ENA);

ISTM that just means that the constants shouldn't be 'long'.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: next build: 235 warnings 3 failures (next/next-20151117)

2015-11-18 Thread David Laight
From: Will Deacon
> Sent: 18 November 2015 12:28
> On Wed, Nov 18, 2015 at 12:11:25PM +, David Laight wrote:
> > From: Will Deacon
> > > Sent: 18 November 2015 10:14
> > > On Tue, Nov 17, 2015 at 08:17:17PM +0100, Arnd Bergmann wrote:
> > > > On Tuesday 17 November 2015 17:12:37 Will Deacon wrote:
> > > > > On Tue, Nov 17, 2015 at 06:03:40PM +0100, Arnd Bergmann wrote:
> > > > > > On Tuesday 17 November 2015 16:44:53 Will Deacon wrote:
> > > > > > > > 8<
> > > > > > > > Subject: ARM64: make smp_load_acquire() work with const 
> > > > > > > > arguments
> > > > > > > >
> > > > > > > > smp_load_acquire() uses typeof() to declare a local variable 
> > > > > > > > for temporarily
> > > > > > > > storing the output of the memory access. This fails when the 
> > > > > > > > argument is
> > > > > > > > constant, because the assembler complains about using a 
> > > > > > > > constant register
> > > > > > > > as output:
> > > > > > > >
> > > > > > > >  arch/arm64/include/asm/barrier.h:71:3: error: read-only 
> > > > > > > > variable '___p1'
> > > > > > > >  used as 'asm' output
> > > > > > >
> > > > > > > Do you know the usage in the kernel causing this warning?
> > > > > >
> > > > > > A newly introduced function in include/net/sock.h:
> > > > > >
> > > > > > static inline int sk_state_load(const struct sock *sk)
> > > > > > {
> > > > > > return smp_load_acquire(>sk_state);
> > > > > > }
> > > > >
> > > > > Hmm, maybe we could play a similar trick to READ_ONCE by declaring an
> > > > > anonymous union and writing through the non-const member?
> > > >
> > > > Yes, I think that would work, if you think we need to care about the
> > > > case where we read into a structure.
> > > >
> > > > Can you come up with a patch for that?
> > >
> > > Done:
> > >
> > >   
> > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386094.html
> >
> > That patch forces a memory write-read and returns uninitialised stack
> > for short reads.
> 
> Really? The disassembly looks fine to me. Do you have a concrete example
> of where you think it goes wrong, please?
> 
> > Who knows what happens on big-endian systems.
> 
> The same thing as READ_ONCE? I'll test it there to make sure, but I
> don't see a problem.

Ah, god, it is absolutely horrid. But probably right :-(

Do all the lda variants zero extend to 64 bits ?
If so maybe you could use a single 64 bit variable for the result of the read
and then cast it to typeof(*p) to get the required sign extension for
small integer types.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: next build: 235 warnings 3 failures (next/next-20151117)

2015-11-18 Thread David Laight
From: Will Deacon [mailto:will.dea...@arm.com]
> Sent: 18 November 2015 15:37
> On Wed, Nov 18, 2015 at 03:21:19PM +, David Laight wrote:
> > From: Will Deacon
> > > Sent: 18 November 2015 12:28
> > > On Wed, Nov 18, 2015 at 12:11:25PM +, David Laight wrote:
> > > > From: Will Deacon
> > > > >   
> > > > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386094.html
> > > >
> > > > That patch forces a memory write-read and returns uninitialised stack
> > > > for short reads.
> > >
> > > Really? The disassembly looks fine to me. Do you have a concrete example
> > > of where you think it goes wrong, please?
> > >
> > > > Who knows what happens on big-endian systems.
> > >
> > > The same thing as READ_ONCE? I'll test it there to make sure, but I
> > > don't see a problem.
> >
> > Ah, god, it is absolutely horrid. But probably right :-(
> 
> Yeah, I wasn't pretending it was nice :) FWIW, I've given it a reasonable
> testing in both little-endian and big-endian configurations and it seems
> to be happy.

I was missing the fact that the *(int_type *) is always reading
the full union.
The next version of the compiler might decide to barf at the code
that appears to be reading beyond the end of the union.

> > Do all the lda variants zero extend to 64 bits ?
> 
> Yes.
> 
> > If so maybe you could use a single 64 bit variable for the result of the 
> > read
> > and then cast it to typeof(*p) to get the required sign extension for
> > small integer types.
> 
> That was the original proposal from Arnd, but I want this to work with
> structures smaller than 64-bit (e.g. arch_spinlock_t), so that's why
> I decided to follow the approach laid down by READ_ONCE.

That would still be ok.
You'd have something that is effectively:
_u64 val = *p;
return typeof(*p)val;
The compiler might mask unsigned values - but it may be able to
determine it isn't needed (which is probably true of your version).
For signed types both versions require the compile sign-extend
the value.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT] Networking

2015-09-04 Thread David Laight
> I find them useful as syntactic sugar. We have not used them a lot, but there 
> are cases in our crypto
> handling code where we have fixed size array inputs/outputs and there we 
> opted to use them. They make
> it easy to remember what the expected sizes of input and output are without 
> having to read through the
> implementation (of course we never even tried to use sizeof on these 
> pointers).
> 
> static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16],
>   const u8 r[3], u8 res[3])

Expect that it looks like you are passing arrays by value,
but instead you are passing by reference.

Explicitly pass by reference and sizeof works.
The object code will be the same.

David


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT] Networking

2015-09-07 Thread David Laight
From: Rustad, Mark D
...
> >> static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16],
> >>  const u8 r[3], u8 res[3])
> >
> > Expect that it looks like you are passing arrays by value,
> > but instead you are passing by reference.
> >
> > Explicitly pass by reference and sizeof works.
> 
> It depends on what you mean by works. It at least doesn't look so misleading 
> when passing by reference
> and so works more as expected. The sizeof in either case will never return 
> the size of the array. To
> have sizeof return the size of the array would require a typedef of the array 
> to pass by reference. In
> some cases that could be the right thing to do.

Feed:
int bar(int (*f)[10]) { return sizeof *f; }
into cc -O2 -S and look at the generated code - returns 40 not 4.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] powerpc32: memcpy: only use dcbz once cache is enabled

2015-09-07 Thread David Laight
From: Christophe Leroy
> Sent: 07 September 2015 15:25
...
> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> index 2ef50c6..05b3096 100644
> --- a/arch/powerpc/lib/copy_32.S
> +++ b/arch/powerpc/lib/copy_32.S
> @@ -172,7 +172,16 @@ _GLOBAL(memcpy)
>   mtctr   r0
>   beq 63f
>  53:
> - dcbzr11,r6
> + /*
> +  * During early init, cache might not be active yet, so dcbz cannot be
> +  * used. We put dcbt instead of dcbz. If cache is not active, it's just
> +  * like a not. If cache is active, at least it prefetchs the line to be
^^^ nop ??

David

> +  * overwritten.
> +  * Will be replaced by dcbz in machine_init()
> +  */
> +_GLOBAL(ppc32_memcpy_dcbz)
> + dcbtr11,r6
> +
>   COPY_16_BYTES
>  #if L1_CACHE_BYTES >= 32
>   COPY_16_BYTES
> --
> 2.1.0



RE: [PATCH] x86: Wire up 32-bit direct socket calls

2015-09-03 Thread David Laight
From: Peter Anvin
> Sent: 02 September 2015 21:16
> On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> >
> > Should all other architectures follow suit?
> > Or should we follow the s390 approach:
> >
> 
> It is up to the maintainer(s), largely dependent on how likely you are
> going to want to support this in your libc, but in general, socketcall
> is an abomination which there is no reason not to bypass.

The other (worse) abomination is the way SCTP overloads setsockopt()
to perform actions that change state.
Rather unfortunately that got documented in the protocol standard :-(

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] mwifiex: Make mwifiex_dbg a function, reduce object size

2015-09-01 Thread David Laight
From: Joe Perches
> Sent: 31 August 2015 18:47
> 
> The mwifiex_dbg macro has two tests that could be consolidated
> into a function reducing overall object size ~10KB (~4%).
> 
> So convert the macro into a function.

This looks like it will slow things down somewhat.
Maybe inline the tests and then do a call for the dev_info().
Or at least inline the test for the debug flag.
So you end up with something like:

#define mwifiex_dbg(adapter, dbg_mask, fmt, args...)\
do {\
if ((adapter)->debug_mask & MWIFIEX_DBG_##dbg_mask) \
_mwifiex_dbg(adapter, fmt, ##__VA_ARGS__)
} while (0)

David

> 
> $ size drivers/net/wireless/mwifiex/built-in.o* (x86-64 defconfig)
>text  data bss dec hex filename
>  233102  86284809  246539   3c30b 
> drivers/net/wireless/mwifiex/built-
> in.o.new
>  243949  86284809  257386   3ed6a 
> drivers/net/wireless/mwifiex/built-
> in.o.old
> 
> Signed-off-by: Joe Perches 
> ---
>  drivers/net/wireless/mwifiex/main.c | 20 
>  drivers/net/wireless/mwifiex/main.h | 17 -
>  2 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/wireless/mwifiex/main.c 
> b/drivers/net/wireless/mwifiex/main.c
> index 278dc94..4fa8ca3 100644
> --- a/drivers/net/wireless/mwifiex/main.c
> +++ b/drivers/net/wireless/mwifiex/main.c
> @@ -1447,6 +1447,26 @@ exit_sem_err:
>  }
>  EXPORT_SYMBOL_GPL(mwifiex_remove_card);
> 
> +void _mwifiex_dbg(const struct mwifiex_adapter *adapter, int mask,
> +   const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + if (!adapter->dev || !(adapter->debug_mask & mask))
> + return;
> +
> + va_start(args, fmt);
> +
> + vaf.fmt = fmt;
> + vaf.va = 
> +
> + dev_info(adapter->dev, "%pV", );
> +
> + va_end(args);
> +}
> +EXPORT_SYMBOL_GPL(_mwifiex_dbg);
> +
>  /*
>   * This function initializes the module.
>   *
> diff --git a/drivers/net/wireless/mwifiex/main.h 
> b/drivers/net/wireless/mwifiex/main.h
> index 6b95121..96663214 100644
> --- a/drivers/net/wireless/mwifiex/main.h
> +++ b/drivers/net/wireless/mwifiex/main.h
> @@ -48,6 +48,9 @@
> 
>  extern const char driver_version[];
> 
> +struct mwifiex_adapter;
> +struct mwifiex_private;
> +
>  enum {
>   MWIFIEX_ASYNC_CMD,
>   MWIFIEX_SYNC_CMD
> @@ -180,12 +183,11 @@ enum MWIFIEX_DEBUG_LEVEL {
>   MWIFIEX_DBG_FATAL | \
>   MWIFIEX_DBG_ERROR)
> 
> -#define mwifiex_dbg(adapter, dbg_mask, fmt, args...) \
> -do { \
> - if ((adapter)->debug_mask & MWIFIEX_DBG_##dbg_mask) \
> - if ((adapter)->dev) \
> - dev_info((adapter)->dev, fmt, ## args); \
> -} while (0)
> +__printf(3, 4)
> +void _mwifiex_dbg(const struct mwifiex_adapter *adapter, int mask,
> +   const char *fmt, ...);
> +#define mwifiex_dbg(adapter, mask, fmt, ...) \
> + _mwifiex_dbg(adapter, MWIFIEX_DBG_##mask, fmt, ##__VA_ARGS__)
> 
...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: First kernel patch (optimization)

2015-09-16 Thread David Laight
From: Austin S Hemmelgarn
> Sent: 16 September 2015 12:46
> On 2015-09-15 20:09, Steve Calfee wrote:
> > On Tue, Sep 15, 2015 at 12:53 PM, Eric Curtin  
> > wrote:
> >> Signed-off-by: Eric Curtin 
> >>
> >> diff --git a/tools/usb/usbip/src/usbip_detach.c 
> >> b/tools/usb/usbip/src/usbip_detach.c
> >> index 05c6d15..9db9d21 100644
> >> --- a/tools/usb/usbip/src/usbip_detach.c
> >> +++ b/tools/usb/usbip/src/usbip_detach.c
> >> @@ -47,7 +47,9 @@ static int detach_port(char *port)
> >>  uint8_t portnum;
> >>  char path[PATH_MAX+1];
> >>
> >> -   
> >> +   unsigned int port_len = strlen(port);
> >> +
> >> +   for (unsigned int i = 0; i < port_len; i++)
> >>  if (!isdigit(port[i])) {
> >>  err("invalid port %s", port);
> >>  return -1;
> >>
> >> --
> >
> > Hi Eric,
> >
> > This is fine, but what kind of wimpy compiler optimizer will not move
> > the constant initializer out of the loop? I bet if you compare binary
> > sizes/code it will be exactly the same, and you added some characters
> > of code. Reorganizing code for readability is fine, but for compiler
> > (in)efficiency seems like a bad idea.
> While I agree with your argument, I would like to point out that it is a
> well established fact that GCC's optimizers are kind of brain-dead at
> times and need their hands held.
> 
> I'd be willing to bet that the code will be marginally larger (because
> of adding another variable), but might run slightly faster too (because
> in my experience, GCC doesn't always catch things like this), and should
> compile a little faster (because the optimizers don't have to do as much
> work).

The compiler probably can't optimise the strlen().
If isdigit() is a real function (the locale specific one probably is)
then the compile cannot assume that port[n] isn't changed by the call
to isdigit.

A simpler change would be:
for (unsigned int i = 0; port[i] != 0; i++)

Much better would be to use strtoul() instead of atoi().

David



RE: tools: usbip: detach: avoid calling strlen() at each iteration

2015-09-16 Thread David Laight
From: Aaro Koskinen
> Sent: 15 September 2015 21:56
...
> > -   for (unsigned int i = 0; i < strlen(port); i++)
> > +   unsigned int port_len = strlen(port);
> > +
> > +   for (unsigned int i = 0; i < port_len; i++)
> 
> port is read only in this function, so maybe just use "const" and the
> compiler should know to do the same without adding a new variable?

While I've seen the compiler make the assumption, I'm not sure
it should assume that data that is 'const' in one function cannot
be modified by a called function.
(Unless the compiler has some way of knowing that the called function
cannot obtain a non-const pointer to the referenced data.)
(This is also independent of whether the const pointer is passed
to the function.) 

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-16 Thread David Laight
From: Haiyang Zhang
> Sent: 16 September 2015 17:09
> > -Original Message-
> > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> > Sent: Wednesday, September 16, 2015 11:50 AM
> > To: net...@vger.kernel.org
> > Cc: David S. Miller ; linux-kernel@vger.kernel.org;
> > KY Srinivasan ; Haiyang Zhang
> > ; Jason Wang 
> > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >
> > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
> >  memory allocation in the packet send path") introduced skb headroom
> > request for Hyper-V netvsc driver:
> >
> >max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >sizeof(struct rndis_message) +
> >NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> >NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> >...
> >net->needed_headroom = max_needed_headroom;
> >
> > max_needed_headroom is 220 bytes, it significantly exceeds the
> > LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
> > e.g. for IPv4 case we fall into the following clause
> > (ip_finish_output2()):
> >
> > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> > ...
> > skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> > ...
> > }
> >
> > leading to a significant performance regression. Increase LL_MAX_HEADER
> > to make it suitable for netvsc, make it 224 to be 16-aligned.
> > Alternatively we could (partially) revert the commit which introduced
> > skb
> > headroom request restoring manual memory allocation on transmit path.
> >
> > Signed-off-by: Vitaly Kuznetsov 
> > ---
> >  include/linux/netdevice.h | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 88a0069..7233790 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >   * used.
> >   */
> >
> > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> > +#if IS_ENABLED(CONFIG_HYPERV_NET)
> > +# define LL_MAX_HEADER 224
> > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >  # if defined(CONFIG_MAC80211_MESH)
> >  #  define LL_MAX_HEADER 128
> >  # else
> 
> Thanks for the patch.
> To avoid we forget to update that 224 number when we add more things
> into netvsc header, I suggest that we define a macro in netdevice.h such
> as:
> #define HVNETVSC_MAX_HEADER 224
> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> 
> And, put a note in netvsc code saying the header reservation shouldn't
> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.

Am I right in thinking this is adding an extra 96 unused bytes to the front
of almost all skb just so that hyper-v can make its link level header
contiguous with whatever follows (IP header ?).

Doesn't sound ideal.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: First kernel patch (optimization)

2015-09-17 Thread David Laight
From: Jaime Arrocha
> Sent: 17 September 2015 02:50
..
> One interesting observation I found was that in O0 and O2, it does make
> a call to strlen while in O1 it calculates
> the length of the string using:
> 

You want an 'xor %rcx,%rcx' here.
> repnz scas%es:(%rdi),%al
> not%rcx
> sub   $0x2,%rcx
> 
> Why does it do that? Is the code above faster? If yes, why not do it in
> O2 too?

Because 'repnz scasb' is slow, especially on some cpu types.
It may win for -Os on 32 bit systems.
Pentium 4 netburst have about 40 clocks setup for all the 'rep' instructions,
later cpus are better but you might still be talking double figures.
On 64 bit cpu there are much faster ways of detecting a zero byte in a
64 bit word by using shifts and masks - so the function call can be a win.

David



RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread David Laight
From: KY Srinivasan
> Sent: 16 September 2015 23:58
...
> > I think we get that.  The question is does the Remote NDIS header and
> > packet info actually need to be a part of the header data?  I would
> > argue that it probably doesn't.
> >
> > So for example in netvsc_start_xmit it looks like you are calling
> > init_page_array in order to populate a set of page buffers, but the
> > first buffer for the Remote NDIS protocol is populated as a separate
> > page and offset.  As such it doesn't seem like it necessarily needs to
> > be a part of the header data but could be maintained perhaps in a
> > separate ring buffer, or perhaps just be a separate page that you break
> > up to use for each header.
> 
> You are right; the rndis header can be built as a separate fragment and sent.
> Indeed this is what we were doing earlier - on the outgoing path we would 
> allocate
> memory for the rndis header. My goal was to avoid this allocation on every 
> packet being
> sent and I decided to use the headroom instead. If we can completely avoid 
> all memory
> allocation for rndis header, it makes a significant perf difference:
...


So just preallocate the header space as a fixed buffer for each ring entry
(or tx frame).

If you allocate a fixed buffer for each ring entry you may find there are
performance gains from copying small fragments into the buffer instead
of doing whatever mapping operations are required.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight
From: James Bottomley 
> Sent: 28 September 2015 16:12
> > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' 
> > > > operations.
> > >
> > > That's different: it's an atomic RMW operation.  The problem with the
> > > alpha was that the operation wasn't atomic (meaning that it can't be
> > > interrupted and no intermediate output states are visible).
> >
> > It is only atomic if prefixed by the 'lock' prefix.
> > Normally the read and write are separate bus cycles.
> 
> The essential point is that x86 has atomic bit ops and byte writes.
> Early alpha did not.

Early alpha didn't have any byte accesses.

On x86 if you have the following:
struct {
char  a;
volatile char b;
} *foo;
foo->a |= 4;

The compiler is likely to generate a 'bis #4, 0(rbx)' (or similar)
and the cpu will do two 32bit memory cycles that read and write
the 'volatile' field 'b'.
(gcc definitely used to do this...)

A lot of fields were made 32bit (and probably not bitfields) in the linux
kernel tree a year or two ago to avoid this very problem.

> > > > You still have to ensure the compiler doesn't do wider rmw cycles.
> > > > I believe the recent versions of gcc won't do wider accesses for 
> > > > volatile data.
> > >
> > > I don't understand this comment.  You seem to be implying gcc would do a
> > > 64 bit RMW for a 32 bit store ... that would be daft when a single
> > > instruction exists to perform the operation on all architectures.
> >
> > Read the object code and weep...
> > It is most likely to happen for operations that are rmw (eg bit set).
> > For instance the arm cpu has limited offsets for 16bit accesses, for
> > normal structures the compiler is likely to use a 32bit rmw sequence
> > for a 16bit field that has a large offset.
> > The C language allows the compiler to do it for any access (IIRC including
> > volatiles).
> 
> I think you might be confusing different things.  Most RISC CPUs can't
> do 32 bit store immediates because there aren't enough bits in their
> arsenal, so they tend to split 32 bit loads into a left and right part
> (first the top then the offset).  This (and other things) are mostly
> what you see in code.  However, 32 bit register stores are still atomic,
> which is all we require.  It's not really the compiler's fault, it's
> mostly an architectural limitation.

No, I'm not talking about how 32bit constants are generated.
I'm talking about structure offsets.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread David Laight
From: Neil Horman
> Sent: 28 September 2015 14:51
> On Mon, Sep 28, 2015 at 02:34:04PM +0200, Denys Vlasenko wrote:
> > Seemingly innocuous sctp_trans_state_to_prio_map[] array
> > is way bigger than it looks, since
> > "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !
> >
> > This patch replaces it with switch() statement.

What about just adding 1 (and masking) before indexing the array?
That might require a static inline function with a local static array.

Or define the array as (say) [16] and just mask the state before using
it as an index?

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight
From: James Bottomley [mailto:james.bottom...@hansenpartnership.com]
> Sent: 28 September 2015 15:27
> On Mon, 2015-09-28 at 08:58 +, David Laight wrote:
> > From: Rafael J. Wysocki
> > > Sent: 27 September 2015 15:09
> > ...
> > > > > Say you have three adjacent fields in a structure, x, y, z, each one 
> > > > > byte long.
> > > > > Initially, all of them are equal to 0.
> > > > >
> > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time.
> > > > >
> > > > > What's the result?
> > > >
> > > > I think every CPU's  cache architecure guarantees adjacent store
> > > > integrity, even in the face of SMP, so it's x==1 and y==2.  If you're
> > > > thinking of old alpha SMP system where the lowest store width is 32 bits
> > > > and thus you have to do RMW to update a byte, this was usually fixed by
> > > > padding (assuming the structure is not packed).  However, it was such a
> > > > problem that even the later alpha chips had byte extensions.
> >
> > Does linux still support those old Alphas?
> >
> > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations.
> 
> That's different: it's an atomic RMW operation.  The problem with the
> alpha was that the operation wasn't atomic (meaning that it can't be
> interrupted and no intermediate output states are visible).

It is only atomic if prefixed by the 'lock' prefix.
Normally the read and write are separate bus cycles.
 
> > You still have to ensure the compiler doesn't do wider rmw cycles.
> > I believe the recent versions of gcc won't do wider accesses for volatile 
> > data.
> 
> I don't understand this comment.  You seem to be implying gcc would do a
> 64 bit RMW for a 32 bit store ... that would be daft when a single
> instruction exists to perform the operation on all architectures.

Read the object code and weep...
It is most likely to happen for operations that are rmw (eg bit set).
For instance the arm cpu has limited offsets for 16bit accesses, for
normal structures the compiler is likely to use a 32bit rmw sequence
for a 16bit field that has a large offset.
The C language allows the compiler to do it for any access (IIRC including
volatiles).

David



RE: [PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread David Laight
From: Eric Dumazet
> Sent: 28 September 2015 15:27
> On Mon, 2015-09-28 at 14:12 +, David Laight wrote:
> > From: Neil Horman
> > > Sent: 28 September 2015 14:51
> > > On Mon, Sep 28, 2015 at 02:34:04PM +0200, Denys Vlasenko wrote:
> > > > Seemingly innocuous sctp_trans_state_to_prio_map[] array
> > > > is way bigger than it looks, since
> > > > "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !
> > > >
> > > > This patch replaces it with switch() statement.
> >
> > What about just adding 1 (and masking) before indexing the array?
> > That might require a static inline function with a local static array.
> >
> > Or define the array as (say) [16] and just mask the state before using
> > it as an index?
> 
> Just let the compiler do its job, instead of obfuscating source.
> 
> Compilers can transform a switch into an (optimal) table if it is really
> a gain.

The compiler can choose between a jump table and nested ifs for a switch
statement. I've never seen it convert one into a data array index.

David



RE: [PATCH v2 00/14] RDS: connection scalability and performance improvements

2015-10-01 Thread David Laight
From: Santosh Shilimkar
> Sent: 30 September 2015 18:24
...
> This is being addressed by simply using per bucket rw lock which makes the
> locking simple and very efficient. The hash table size is still an issue and
> I plan to address it by using re-sizable hash tables as suggested on the list.

If the hash chains are short do you need the expense of a rw lock
for each chain?
A simple spinlock may be faster.

If you use the hash chain lock for the reference count on the hashed
objects you should be able to release the lock before locking the
object itself.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight
From: Rafael J. Wysocki
> Sent: 27 September 2015 15:09
...
> > > Say you have three adjacent fields in a structure, x, y, z, each one byte 
> > > long.
> > > Initially, all of them are equal to 0.
> > >
> > > CPU A writes 1 to x and CPU B writes 2 to y at the same time.
> > >
> > > What's the result?
> >
> > I think every CPU's  cache architecure guarantees adjacent store
> > integrity, even in the face of SMP, so it's x==1 and y==2.  If you're
> > thinking of old alpha SMP system where the lowest store width is 32 bits
> > and thus you have to do RMW to update a byte, this was usually fixed by
> > padding (assuming the structure is not packed).  However, it was such a
> > problem that even the later alpha chips had byte extensions.

Does linux still support those old Alphas?

The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations.

> OK, thanks!

You still have to ensure the compiler doesn't do wider rmw cycles.
I believe the recent versions of gcc won't do wider accesses for volatile data.

David

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address

2015-09-24 Thread David Laight
From: Christophe Leroy
> Sent: 22 September 2015 17:51
...
> Traditionaly, each driver manages one computer board which has its
> own components with its own memory maps.
> But on embedded chips like the MPC8xx, the SOC has all registers
> located in the same IO area.
> 
> When looking at ioremaps done during startup, we see that
> many drivers are re-mapping small parts of the IMMR for their own use
> and all those small pieces gets their own 4k page, amplifying the
> number of TLB misses: in our system we get 0xff00 mapped 31 times
> and 0xff003000 mapped 9 times.

Isn't this a more general problem?

If there are multiple remap requests for the same physical page
shouldn't the kernel be just increasing a reference count somewhere
and returning address in the same virtual page?
This should probably happen regardless of the address.
I presume it must be done for cacheable mappings.

Whether things like the IMMR should be mapped with a larger TLB
is a separate matter.

David



RE: [PATCH v2 08/25] powerpc/8xx: Map IMMR area with 512k page at a fixed address

2015-09-25 Thread David Laight
From: Scott Wood
> Sent: 24 September 2015 21:14
> > Isn't this a more general problem?
> >
> > If there are multiple remap requests for the same physical page
> > shouldn't the kernel be just increasing a reference count somewhere
> > and returning address in the same virtual page?
> > This should probably happen regardless of the address.
> > I presume it must be done for cacheable mappings.
> 
> Why would you assume that?

Because 'really horrid (tm)' things happen on some cache
architectures if you map the same physical address to
multiple virtual addresses.

David



RE: [PATCH 05/15] RDS: increase size of hash-table to 8K

2015-09-21 Thread David Laight
From: Santosh Shilimkar
> Sent: 20 September 2015 00:05
> Even with per bucket locking scheme, in a massive parallel
> system with active rds sockets which could be in excess of multiple
> of 10K, rds_bin_lookup() workload is siginificant because of smaller
> hashtable size.
> 
> With some tests, it was found that we get modest but still nice
> reduction in rds_bind_lookup with bigger bucket.
> 
>   Hashtable   Baseline(1k)Delta
>   2048:   8.28%   -2.45%
>   4096:   8.28%   -4.60%
>   8192:   8.28%   -6.46%
>   16384:  8.28%   -6.75%
> 
> Based on the data, we set 8K as the bind hash-table size.

Can't you use of on the dynamically sizing hash tables?
8k hash table entries is OTT for a lot of systems.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 3/5] printk/nmi: Try hard to print Oops message in NMI context

2015-12-07 Thread David Laight
From: Russell King - ARM Linux
> Sent: 04 December 2015 17:13
...
> I have a slightly different view...
> 
> > > I don't see bust_spinlocks() dealing with any of these locks, so IMHO
> > > trying to make this work in NMI context strikes me as making the
> > > existing solution more unreliable on ARM systems.
> >
> > bust_spinlocks() calls printk_nmi_flush() that would call printk()
> > that would zap "lockbuf_lock" and "console_sem" when in Oops and NMI.
> > Yes, there might be more locks blocked but we try to break at least
> > the first two walls. Also zapping is allowed only once per 30 seconds,
> > see zap_locks(). Why do you think that it might make things more
> > unreliable, please?
> 
> Take the scenario where CPU1 is in the middle of a printk(), and is
> holding its lock.
> 
> CPU0 comes along and decides to trigger a NMI backtrace.  This sends
> a NMI to CPU1, which takes it in the middle of the serial console
> output.
> 
> With the existing solution, the NMI output will be written to the
> temporary buffer, and CPU1 has finished handling the NMI it resumes
> the serial console output, eventually dropping the lock.  That then
> allows CPU0 to print the contents of all buffers, and we get NMI
> printk output.

Is the traceback from inside printk() or serial console code
likely to be useful?
If not then why not get the stacktrace generated when the relevant
lock is released?
That should save any faffing with a special buffer.

David



RE: [PATCH v2 8/8] treewide: Remove newlines inside DEFINE_PER_CPU() macros

2015-12-07 Thread David Laight
From: Michal Marek
> Sent: 04 December 2015 15:26
> Otherwise make tags can't parse them:
> 
> ctags: Warning: arch/ia64/kernel/smp.c:60: null expansion of name pattern "\1"
...

Seems to me you need to fix ctags.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 5/5] printk/nmi: Increase the size of the temporary buffer

2015-12-02 Thread David Laight
From: yalin wang
> Sent: 30 November 2015 16:42
> > On Nov 27, 2015, at 19:09, Petr Mladek  wrote:
> >
> > Testing has shown that the backtrace sometimes does not fit
> > into the 4kB temporary buffer that is used in NMI context.
> >
> > The warnings are gone when I double the temporary buffer size.

You are wasting a lot of memory for something that is infrequently used.
There ought to be some way of copying partial tracebacks into the
main buffer.

David



RE: bpf: undefined shift in __bpf_prog_run

2015-12-07 Thread David Laight
From: Dmitry Vyukov
> Sent: 04 December 2015 19:49
...
> 3.4.3
> undefined behavior
> 1 behavior, upon use of a nonportable or erroneous program construct
> or of erroneous data, for which this International Standard imposes no
> requirements
> 2 NOTE Possible undefined behavior ranges from ignoring the situation
> completely with unpredictable results, to behaving during translation
> or program execution in a documented manner characteristic of the
> environment (with or without the issuance of a diagnostic message), to
> terminating a translation or execution

While 'undefined behaviour' is allowed to include 'firing an ICBM at
the current location of the person who wrote the code' it is very
unlikely to result in anything other than an unexpected value
and the compiler making false assumptions about the value.

eg the compiler can assume this is an infinite loop:
int i;
for (i = 0; i >= 0; i++)
...

David


RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-09 Thread David Laight
> SCTP is lacking proper np->opt cloning at accept() time.
> 
> TCP and DCCP use ipv6_dup_options() helper, do the same in SCTP.
> 
> We might later factorize this code in a common helper to avoid
> future mistakes.

I'm wondering what the real impact of this and the other recent
SCTP bugs/patches is on real workloads?
We have enough trouble getting our customers to use kernels
later that the 2.6.18 based RHEL5 - without having to persuade
them to use kernels that contain very recent fixes.

David



RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-09 Thread David Laight
From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: 09 December 2015 16:00
> On Wed, 2015-12-09 at 15:49 +, David Laight wrote:
> > > SCTP is lacking proper np->opt cloning at accept() time.
> > >
> > > TCP and DCCP use ipv6_dup_options() helper, do the same in SCTP.
> > >
> > > We might later factorize this code in a common helper to avoid
> > > future mistakes.
> >
> > I'm wondering what the real impact of this and the other recent
> > SCTP bugs/patches is on real workloads?
> > We have enough trouble getting our customers to use kernels
> > later that the 2.6.18 based RHEL5 - without having to persuade
> > them to use kernels that contain very recent fixes.
> 
> It all depends if your customers let (hostile ?) people run programs on
> the boxes.

If they require hostile programs I'm not worried.

But it isn't entirely clear from these oops reports what the
test program is actually doing.
Some of them might be valid scenarios.
Not that our code does anything clever.

David



RE: rhashtable: ENOMEM errors when hit with a flood of insertions

2015-12-03 Thread David Laight
From: Herbert Xu
> Sent: 03 December 2015 12:51
> On Mon, Nov 30, 2015 at 06:18:59PM +0800, Herbert Xu wrote:
> >
> > OK that's better.  I think I see the problem.  The test in
> > rhashtable_insert_rehash is racy and if two threads both try
> > to grow the table one of them may be tricked into doing a rehash
> > instead.
> >
> > I'm working on a fix.
> 
> While the EBUSY errors are gone for me, I can still see plenty
> of ENOMEM errors.  In fact it turns out that the reason is quite
> understandable.  When you pound the rhashtable hard so that it
> doesn't actually get a chance to grow the table in process context,
> then the table will only grow with GFP_ATOMIC allocations.
> 
> For me this starts failing regularly at around 2^19 entries, which
> requires about 1024 contiguous pages if I'm not mistaken.

ISTM that you should always let the insert succeed - even if it makes
the average/maximum chain length increase beyond some limit.
Any limit on the number of hashed items should have been done earlier
by the calling code.
The slight performance decrease caused by scanning longer chains
is almost certainly more 'user friendly' than an error return.

Hoping to get 1024+ contiguous VA pages does seem over-optimistic.

With a 2-level lookup you could make all the 2nd level tables
a fixed size (maybe 4 or 8 pages?) and extend the first level
table as needed.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread David Laight
> The MSI-X table is paravirtualized on vfio in general and interrupt
> remapping theoretically protects against errant interrupts, so why is
> this PPC64 specific? We have the same safeguards on x86 if we want to
> decide they're sufficient. Offhand, the only way I can think that a
> device can touch the MSI-X table is via backdoors or p2p DMA with
> another device.

Is this all related to the statements in the PCI(e) spec that the
MSI-X table and Pending bit array should in their own BARs?
(ISTR it even suggests a BAR each.)

Since the MSI-X table exists in device memory/registers there is
nothing to stop the device modifying the table contents (or even
ignoring the contents and writing address+data pairs that are known
to reference the CPUs MSI-X interrupt generation logic).

We've an fpga based PCIe slave that has some additional PCIe slaves
(associated with the interrupt generation logic) that are currently
next to the PBA (which is 8k from the MSI-X table).
If we can't map the PBA we can't actually raise any interrupts.
The same would be true if page size is 64k and mapping the MSI-X
table banned.

Do we need to change our PCIe slave address map so we don't need
to access anything in the same page (which might be 64k were we to
target large ppc - which we don't at the moment) as both the
MSI-X table and the PBA?

I'd also note that being able to read the MSI-X table is a useful
diagnostic that the relevant interrupts are enabled properly.

David



RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-18 Thread David Laight
From: Alex Williamson
> Sent: 17 December 2015 21:07
...
> > Is this all related to the statements in the PCI(e) spec that the
> > MSI-X table and Pending bit array should in their own BARs?
> > (ISTR it even suggests a BAR each.)
> >
> > Since the MSI-X table exists in device memory/registers there is
> > nothing to stop the device modifying the table contents (or even
> > ignoring the contents and writing address+data pairs that are known
> > to reference the CPUs MSI-X interrupt generation logic).
> >
> > We've an fpga based PCIe slave that has some additional PCIe slaves
> > (associated with the interrupt generation logic) that are currently
> > next to the PBA (which is 8k from the MSI-X table).
> > If we can't map the PBA we can't actually raise any interrupts.
> > The same would be true if page size is 64k and mapping the MSI-X
> > table banned.
> >
> > Do we need to change our PCIe slave address map so we don't need
> > to access anything in the same page (which might be 64k were we to
> > target large ppc - which we don't at the moment) as both the
> > MSI-X table and the PBA?
> >
> > I'd also note that being able to read the MSI-X table is a useful
> > diagnostic that the relevant interrupts are enabled properly.
> 
> Yes, the spec requirement is that MSI-X structures must reside in a 4k
> aligned area that doesn't overlap with other configuration registers
> for the device.  It's only an advisement to put them into their own
> BAR, and 4k clearly wasn't as forward looking as we'd hope.  Vfio
> doesn't particularly care about the PBA, but if it resides in the same
> host PAGE_SIZE area as the MSI-X vector table, you currently won't be
> able to get to it.  Most devices are not at all dependent on the PBA
> for any sort of functionality.

Having some hint in the spec as to why these mapping rules might be
needed would have been useful.

> It's really more correct to say that both the vector table and PBA are
> emulated by QEMU than paravirtualized.  Only PPC64 has the guest OS
> taking a paravirtual path to program the vector table, everyone else
> attempts to read/write to the device MMIO space, which gets trapped and
> emulated in QEMU.  This is why the QEMU side patch has further ugly
> hacks to mess with the ordering of MemoryRegions since even if we can
> access and mmap the MSI-X vector table, we'll still trap into QEMU for
> emulation.

Thanks for that explanation.

> How exactly does the ability to map the PBA affect your ability to
> raise an interrupt?

There are other registers for the logic block that converts internal
interrupt requests into the PCIe writes in the locations following the PBA.
These include interrupt enable bits, and the ability to remove pending
interrupt requests (and other stuff for testing the interrupt block).
Yes I know I probably shouldn't have done that, but it all worked.

At least it is better than the previous version of the hardware that
required the driver read back the MSI-X table entries in order to
set up an on-board PTE to convert a 32bit on-board address to the
64bit PCIe address needed for the MSI-X.

I'll 'fix' our board by making both the MSI-X table and PBA area
accessible through one of the other BARs. (Annoyingly this will
be confusing because the BAR offsets will have to differ.)
Note that this makes a complete mockery of disallowing the mapping
in the first place.

David


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-10 Thread David Laight
From: Daniel Borkmann
> Sent: 09 December 2015 19:19
> On 12/09/2015 06:11 PM, Marcelo Ricardo Leitner wrote:
> > Em 09-12-2015 14:31, David Laight escreveu:
> >> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> >>> Sent: 09 December 2015 16:00
> >>> On Wed, 2015-12-09 at 15:49 +, David Laight wrote:
> >>>>> SCTP is lacking proper np->opt cloning at accept() time.
> >>>>>
> >>>>> TCP and DCCP use ipv6_dup_options() helper, do the same in SCTP.
> >>>>>
> >>>>> We might later factorize this code in a common helper to avoid
> >>>>> future mistakes.
> >>>>
> >>>> I'm wondering what the real impact of this and the other recent
> >>>> SCTP bugs/patches is on real workloads?
> >>>> We have enough trouble getting our customers to use kernels
> >>>> later that the 2.6.18 based RHEL5 - without having to persuade
> >>>> them to use kernels that contain very recent fixes.
> >>>
> >>> It all depends if your customers let (hostile ?) people run programs on
> >>> the boxes.
> >>
> >> If they require hostile programs I'm not worried.
> >
> > Not really "require", but "allow", as in: allowing third-party applications 
> > to run on it.
> 
> Yeah :/ given distros enable almost everything anyway, the first unpriv'ed
> socket(..., IPPROTO_SCTP) call auto-loads SCTP module. But to be honest, I'd
> be surprised if Cloud providers allow for this. Most of this might only run
> on dedicated boxes with telco appliances.

Yes, I'm worried about whether our M3UA code is likely to crash customer
systems, not whether hostile applications can crash it.
These boxes ought to be on private networks since the sigtran protocols
themselves have nothing that even gives a hint of security.

David


RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-10 Thread David Laight
From: Eric Dumazet
> Sent: 10 December 2015 15:58
>
> BTW, are you even using IPv6 SCTP sessions ?

Our M3UA/SCTP protocol stack supports them and defaults to using
IPv6 listening sockets for IPv4 connections.

I very much doubt than any customers have used them yet.
So most of the IPv6 connections will have been to ::1
during internal regression testing.

We don't even try to set any IPv6 (or IPv4) options.

Just SO_REUSEADDR, TCP/SCTP_NODELAY, SCTP_EVENTS, SCTP_INITMSG,
SO_KEEPALIVE (tcp), IPV6_V6ONLY (if binding separate listeners),
SCTP_SOCKOPT_BINX_ADD (WTF is this a 'socket option') and
SO_LINGER (to get abortive close on SCTP connections on kernels
before 3.18).

David



RE: [PATCH] usb: remove redundant conditions

2015-12-10 Thread David Laight
From: Felipe Balbi
> Sent: 10 December 2015 15:14
> "Geyslan G. Bem"  writes:
> 
> > This patch removes redundant conditions.
> >
> >  - (!A || (A && B)) is the same as (!A || B).
> >  - (length && length > 5) can be reduced to a single evaluation.
> >
> > Caught by: cppcheck
> >
> > Signed-off-by: Geyslan G. Bem 
> > ---
> 
> I guess you didn't get previous comment in time; let's split this per
> driver so different maintainers can pick their parts.

I also suspect that gcc will optimise out the redundant checks as well.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: add per device sg_max_frags for skb

2016-01-06 Thread David Laight
From: Hans Westgaard Ry
> Sent: 06 January 2016 13:16
> Devices may have limits on the number of fragments in an skb they
> support. Current codebase uses a constant as maximum for number of
> fragments (MAX_SKB_FRAGS) one skb can hold and use.
> 
> When enabling scatter/gather and running traffic with many small
> messages the codebase uses the maximum number of fragments and thereby
> violates the max for certain devices.
> 
> An example of such a violation is when running IPoIB on a HCA
> supporting 16 SGE on an architecture with 4K pagesize. The
> MAX_SKB_FRAGS will be 17 (64K/4K+1) and because IPoIB adds yet another
> segment we end up with send_requests with 18 SGE resulting in
> kernel-panic.
> 
> The patch allows the device to limit the maximum number fragments used
> in one skb.

This doesn't seem to me to be the correct way to fix this.
Anything that adds an extra fragment (in this case IPoIB) should allow
for the skb already having the maximum number of fragments.
Fully linearising the skb is overkill, but I think the first fragment
can be added to the linear part of the skb.

David




RE: [PATCH] netfilter: nf_conntrack: use safer way to lock all buckets

2016-01-05 Thread David Laight
From: Sasha Levin
> Sent: 05 January 2016 02:26
> When we need to lock all buckets in the connection hashtable we'd attempt to
> lock 1024 spinlocks, which is way more preemption levels than supported by
> the kernel. Furthermore, this behavior was hidden by checking if lockdep is
> enabled, and if it was - use only 8 buckets(!).
> 
> Fix this by using a global lock and synchronize all buckets on it when we
> need to lock them all. This is pretty heavyweight, but is only done when we
> need to resize the hashtable, and that doesn't happen often enough (or at 
> all).
...
> +static void nf_conntrack_lock_nested(spinlock_t *lock)
> +{
> + spin_lock_nested(lock, SINGLE_DEPTH_NESTING);
> + while (unlikely(nf_conntrack_locks_all)) {
> + spin_unlock(lock);
> + spin_lock(_conntrack_locks_all_lock);
> + spin_unlock(_conntrack_locks_all_lock);
> + spin_lock_nested(lock, SINGLE_DEPTH_NESTING);
> + }
> +}
...
> @@ -102,16 +126,19 @@ static void nf_conntrack_all_lock(void)
>  {
>   int i;
> 
> - for (i = 0; i < CONNTRACK_LOCKS; i++)
> - spin_lock_nested(_conntrack_locks[i], i);
> + spin_lock(_conntrack_locks_all_lock);
> + nf_conntrack_locks_all = true;
> +
> + for (i = 0; i < CONNTRACK_LOCKS; i++) {
> + spin_lock(_conntrack_locks[i]);
> + spin_unlock(_conntrack_locks[i]);
> + }
>  }

If spin_lock_nested() does anything like what I think its
name suggests then I suspect that deadlocks.

David


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Resend PATCH] RDS: fix race condition when sending a message on unbound socket

2015-11-25 Thread David Laight
From: Santosh Shilimkar
> Sent: 24 November 2015 22:13
...
> Sasha's found a NULL pointer dereference in the RDS connection code when
> sending a message to an apparently unbound socket.  The problem is caused
> by the code checking if the socket is bound in rds_sendmsg(), which checks
> the rs_bound_addr field without taking a lock on the socket.  This opens a
> race where rs_bound_addr is temporarily set but where the transport is not
> in rds_bind(), leading to a NULL pointer dereference when trying to
> dereference 'trans' in __rds_conn_create().
> 
> Vegard wrote a reproducer for this issue, so kindly ask him to share if
> you're interested.
...
> diff --git a/net/rds/send.c b/net/rds/send.c
> index 827155c..c9cdb35 100644
> --- a/net/rds/send.c
> +++ b/net/rds/send.c
> @@ -1013,11 +1013,13 @@ int rds_sendmsg(struct socket *sock, struct msghdr 
> *msg, size_t payload_len)
>   release_sock(sk);

This is falling though into an unconditional lock_sock().
No need to unlock and relock immediately.

>   }
> 
> - /* racing with another thread binding seems ok here */
> + lock_sock(sk);
>   if (daddr == 0 || rs->rs_bound_addr == 0) {
> + release_sock(sk);
>   ret = -ENOTCONN; /* XXX not a great errno */
>   goto out;
>   }
> + release_sock(sk);
> 

On the face of it the above looks somewhat dubious.
Locks usually tie together two action (eg a test and use of a value),
In this case you only have a test inside the lock.
That either means that the state can change after you release the lock
(ie rs->rs_bound_addr = 0 is executed somewhere), or you don't
really need the lock.

David


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] sctp: use GFP_USER for user-controlled kmalloc

2015-12-01 Thread David Laight
From: Marcelo Ricardo Leitner
> Sent: 30 November 2015 16:33
> Dmitry Vyukov reported that the user could trigger a kernel warning by
> using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
> value directly affects the value used as a kmalloc() parameter.
> 
> This patch thus switches the allocation flags from all user-controllable
> kmalloc size to GFP_USER to put some more restrictions on it and also
> disables the warn, as they are not necessary.

ISTM that the code should put some 'sanity limit' on that
size before allocating the kernel buffer.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: next build: 235 warnings 3 failures (next/next-20151117)

2015-11-18 Thread David Laight
From: Will Deacon
> Sent: 18 November 2015 10:14
> On Tue, Nov 17, 2015 at 08:17:17PM +0100, Arnd Bergmann wrote:
> > On Tuesday 17 November 2015 17:12:37 Will Deacon wrote:
> > > On Tue, Nov 17, 2015 at 06:03:40PM +0100, Arnd Bergmann wrote:
> > > > On Tuesday 17 November 2015 16:44:53 Will Deacon wrote:
> > > > > > 8<
> > > > > > Subject: ARM64: make smp_load_acquire() work with const arguments
> > > > > >
> > > > > > smp_load_acquire() uses typeof() to declare a local variable for 
> > > > > > temporarily
> > > > > > storing the output of the memory access. This fails when the 
> > > > > > argument is
> > > > > > constant, because the assembler complains about using a constant 
> > > > > > register
> > > > > > as output:
> > > > > >
> > > > > >  arch/arm64/include/asm/barrier.h:71:3: error: read-only variable 
> > > > > > '___p1'
> > > > > >  used as 'asm' output
> > > > >
> > > > > Do you know the usage in the kernel causing this warning?
> > > >
> > > > A newly introduced function in include/net/sock.h:
> > > >
> > > > static inline int sk_state_load(const struct sock *sk)
> > > > {
> > > > return smp_load_acquire(>sk_state);
> > > > }
> > >
> > > Hmm, maybe we could play a similar trick to READ_ONCE by declaring an
> > > anonymous union and writing through the non-const member?
> >
> > Yes, I think that would work, if you think we need to care about the
> > case where we read into a structure.
> >
> > Can you come up with a patch for that?
> 
> Done:
> 
>   
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386094.html

That patch forces a memory write-read and returns uninitialised stack
for short reads.
Who knows what happens on big-endian systems.

You need a static inline function with separate temporaries in each branch.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] usb: gadget: Add the console support for usb-to-serial port

2015-11-18 Thread David Laight
From: Baolin Wang
> Sent: 18 November 2015 10:45
> On 18 November 2015 at 17:32, Andy Shevchenko  
> wrote:
> > On Wed, Nov 18, 2015 at 4:15 AM, Baolin Wang  wrote:
> >> On 17 November 2015 at 21:34, Andy Shevchenko  
> >> wrote:
> >>> On Mon, Nov 16, 2015 at 9:05 AM, Baolin Wang  
> >>> wrote:
>  It dose not work when we want to use the usb-to-serial port based
>  on one usb gadget as a console. Thus this patch adds the console
>  initialization to support this request.
> >>>
> >
>  +#define GS_BUFFER_SIZE (4096)
> >>> Redundant parens
> >> OK. I'll remove it.
> >>
>  +#define GS_CONSOLE_BUF_SIZE(2 * GS_BUFFER_SIZE)
>  +
>  +struct gscons_info {
>  +   struct gs_port  *port;
>  +   struct tty_driver   *tty_driver;
>  +   struct work_struct  work;
>  +   int buf_tail;
>  +   charbuf[GS_CONSOLE_BUF_SIZE];
> >>>
> >>> Can't be malloced once?
> >> The 'gscons_info' structure is malloced once.
> >
> > In state of high fragmentation is quite hard to find big memory chunks.
> > I would split it to two allocations, though if maintainers are okay
> > with your code, then I'm also okay.
> >
> 
> Make sense. But I think the major memory of the 'struct gscons_info'
> is for the 'buf' member, so I still think no need to allocate it 2
> times.

It may be worth just reducing GS_BUFFER_SIZE slightly so that the gscons_info
structure itself is less than 8k.
If you can't get 2 adjacent pages then a lot of things will fail.

David



<    1   2   3   4   5   6   7   8   9   10   >