from:"David Laight"

RE: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread David Laight

From: Hannes Frederic Sowa
> Sent: 14 December 2016 22:03
> On 14.12.2016 13:46, Jason A. Donenfeld wrote:
> > Hi David,
> >
> > On Wed, Dec 14, 2016 at 10:56 AM, David Laight <david.lai...@aculab.com> 
> > wrote:
> >> ...
> >>> +u64 siphash24(const u8 *data, size_t len, const u8 
> >>> key[SIPHASH24_KEY_LEN])
> >> ...
> >>> + u64 k0 = get_unaligned_le64(key);
> >>> + u64 k1 = get_unaligned_le64(key + sizeof(u64));
> >> ...
> >>> + m = get_unaligned_le64(data);
> >>
> >> All these unaligned accesses are going to get expensive on architectures
> >> like sparc64.
> >
> > Yes, the unaligned accesses aren't pretty. Since in pretty much all
> > use cases thus far, the data can easily be made aligned, perhaps it
> > makes sense to create siphash24() and siphash24_unaligned(). Any
> > thoughts on doing something like that?
> 
> I fear that the alignment requirement will be a source of bugs on 32 bit
> machines, where you cannot even simply take a well aligned struct on a
> stack and put it into the normal siphash(aligned) function without
> adding alignment annotations everywhere. Even blocks returned from
> kmalloc on 32 bit are not aligned to 64 bit.

Are you doing anything that will require 64bit alignment on 32bit systems?
It is unlikely that the kernel can use any simd registers that have wider
alignment requirements.

You also really don't want to request on-stack items have large alignments.
While gcc can generate code to do it, it isn't pretty.

David

RE: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function

2016-12-15 Thread David Laight

From: Linus Torvalds
> Sent: 15 December 2016 00:11
> On Wed, Dec 14, 2016 at 3:34 PM, Jason A. Donenfeld  wrote:
> >
> > Or does your reasonable dislike of "word" still allow for the use of
> > dword and qword, so that the current function names of:
> 
> dword really is confusing to people.
>
> If you have a MIPS background, it means 64 bits. While to people with
> Windows programming backgrounds it means 32 bits.

Guess what a DWORD_PTR is on 64bit windows ...
(it is an integer type).

David

RE: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread David Laight

From: Hannes Frederic Sowa
> Sent: 15 December 2016 12:23
...
> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
> bytes on 32 bit. Do you question that?

Yes.

The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).

David

RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread David Laight

From: George Spelvin
> Sent: 15 December 2016 23:29
> > If a halved version of SipHash can bring significant performance boost
> > (with 32b words instead of 64b words) with an acceptable security level
> > (64-bit enough?) then we may design such a version.
> 
> I was thinking if the key could be pushed to 80 bits, that would be nice,
> but honestly 64 bits is fine.  This is DoS protection, and while it's
> possible to brute-force a 64 bit secret, there are more effective (DDoS)
> attacks possible for the same cost.

A 32bit hash would also remove all the issues about the alignment
of IP addresses (etc) on 64bit systems.

> (I'd suggest a name of "HalfSipHash" to convey the reduced security
> effectively.)
> 
> > Regarding output size, are 64 bits sufficient?
> 
> As a replacement for jhash, 32 bits are sufficient.  It's for
> indexing an in-memory hash table on a 32-bit machine.

It is also worth remembering that if the intent is to generate
a hash table index (not a unique fingerprint) you will always
get collisions on the final value.
Randomness could still give overlong hash chains - which might
still need rehashing with a different key.

David

RE: Misalignment, MIPS, and ip_hdr(skb)->version

2016-12-12 Thread David Laight

From: Måns Rullgård
> Sent: 10 December 2016 13:25
...
> I solved this problem in an Ethernet driver by copying the initial part
> of the packet to an aligned skb and appending the remainder using
> skb_add_rx_frag().  The kernel network stack only cares about the
> headers, so the alignment of the packet payload doesn't matter.

That rather depends on where the packet payload ends up.
It is likely that it will be copied to userspace (or maybe
into some aligned kernel buffer).
In which case you get an expensive misaligned copy.

Encapsulation protocols not using headers that are multiples
of 4 bytes is as stupid as ethernet hardware that can't place
the mac address on a 4n+2 boundary.
The latter is particularly stupid when it happens on embedded
silicon with a processor that can only do aligned memory accesses.
What do the hardware engineers think the ethernet interface will
be used for!

David

RE: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread David Laight

From: Andy Lutomirski
> Sent: 10 January 2017 23:25
> There are some hashes (e.g. sha224) that have some internal trickery
> to make sure that only the correct number of output bytes are
> generated.  If something goes wrong, they could potentially overrun
> the output buffer.
> 
> Make the test more robust by allocating only enough space for the
> correct output size so that memory debugging will catch the error if
> the output is overrun.

Might be better to test this by allocating an overlong buffer
and then explicitly checking that the output hasn't overrun
the allowed space.

If nothing else the error message will be clearer.

David

RE: Problem on SCTP

2017-01-12 Thread David Laight

From: Sun Paul [mailto:paul...@gmail.com]
> Sent: 12 January 2017 09:31
> Let me clear the understanding. below is the flow.
> 
> 1. Client sends to Linux Router: 192.168.206.83 -> 192.168.206.56,
> 2. Linux router sends to SERVER where the source IP is unchanged:
> 192.168.206.83 -> 192.168.206.66
> 
> My question here is why SERVER cannot response this INIT chunk?

Probably because the IP addresses embedded in the SCTP packet
don't match the ones in the IP header.

David

RE: [PATCH net-next v2 05/10] drivers: base: Add device_find_class()

2017-01-13 Thread David Laight

From: Florian Fainelli
> Sent: 12 January 2017 22:51
> On 01/12/2017 01:21 PM, David Miller wrote:
> > From: Florian Fainelli 
> > Date: Wed, 11 Jan 2017 19:41:16 -0800
> >
> >> Add a helper function to lookup a device reference given a class name.
> >> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
> >> make it more generic.
...
> >> +static int dev_is_class(struct device *dev, void *class)
> >
> > I know you are just moving code, but this class argumnet is a string
> > and thus should be "char *" or even "const char *".
> 
> Well, this is really so that we don't need to cast the arguments passed
> to device_find_child(), which takes a void *data as well. If we made
> that a const char *class, we'd get warnings that look like these:
> 
> drivers/base/core.c: In function 'device_find_class':
> drivers/base/core.c:2083:2: warning: passing argument 2 of
> 'device_find_child' discards 'const' qualifier from pointer target type
> [enabled by default]
>   return device_find_child(parent, class, dev_is_class);
>   ^
> drivers/base/core.c:2050:16: note: expected 'void *' but argument is of
> type 'const char *'
>  struct device *device_find_child(struct device *parent, void *data,
> ^
...

Maybe device_find_child() needs changing to take 'const void *' ?

David

RE: [PATCH] usb: gadget: udc: atmel: used managed kasprintf

2016-12-05 Thread David Laight

From: Alexandre Belloni
> Sent: 02 December 2016 16:19
> On 02/12/2016 at 15:59:57 +, David Laight wrote :
> > From: Alexandre Belloni
> > > Sent: 01 December 2016 10:27
> > > Use devm_kasprintf instead of simple kasprintf to free the allocated 
> > > memory
> > > when needed.
> >
> > s/when needed/when the device is freed/
> >
> > > Suggested-by: Peter Rosin <p...@axentia.se>
> > > Signed-off-by: Alexandre Belloni <alexandre.bell...@free-electrons.com>
> > > ---
> > >  drivers/usb/gadget/udc/atmel_usba_udc.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/usb/gadget/udc/atmel_usba_udc.c 
> > > b/drivers/usb/gadget/udc/atmel_usba_udc.c
> > > index 45bc997d0711..aec72fe8273c 100644
> > > --- a/drivers/usb/gadget/udc/atmel_usba_udc.c
> > > +++ b/drivers/usb/gadget/udc/atmel_usba_udc.c
> > > @@ -1978,7 +1978,8 @@ static struct usba_ep * atmel_udc_of_init(struct 
> > > platform_device *pdev,
> > >   dev_err(>dev, "of_probe: name error(%d)\n", ret);
> > >   goto err;
> > >   }
> > > - ep->ep.name = kasprintf(GFP_KERNEL, "ep%d", ep->index);
> > > + ep->ep.name = devm_kasprintf(>dev, GFP_KERNEL, "ep%d",
> > > +  ep->index);
> >
> > Acually why bother mallocing such a small string at all.
> > The maximum length is 12 bytes even if 'index' are unrestricted.
> >
> 
> IIRC, using statically allocated string is failing somewhere is the USB
> core but I don't remember all the details.

I can't imagine that changing ep->ep.name from 'char *' to 'char [12]' would
make any difference.

David

RE: [PATCH] usb: gadget: udc: atmel: used managed kasprintf

2016-12-02 Thread David Laight

From: Alexandre Belloni
> Sent: 01 December 2016 10:27
> Use devm_kasprintf instead of simple kasprintf to free the allocated memory
> when needed.

s/when needed/when the device is freed/

> Suggested-by: Peter Rosin 
> Signed-off-by: Alexandre Belloni 
> ---
>  drivers/usb/gadget/udc/atmel_usba_udc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/gadget/udc/atmel_usba_udc.c 
> b/drivers/usb/gadget/udc/atmel_usba_udc.c
> index 45bc997d0711..aec72fe8273c 100644
> --- a/drivers/usb/gadget/udc/atmel_usba_udc.c
> +++ b/drivers/usb/gadget/udc/atmel_usba_udc.c
> @@ -1978,7 +1978,8 @@ static struct usba_ep * atmel_udc_of_init(struct 
> platform_device *pdev,
>   dev_err(>dev, "of_probe: name error(%d)\n", ret);
>   goto err;
>   }
> - ep->ep.name = kasprintf(GFP_KERNEL, "ep%d", ep->index);
> + ep->ep.name = devm_kasprintf(>dev, GFP_KERNEL, "ep%d",
> +  ep->index);

Acually why bother mallocing such a small string at all.
The maximum length is 12 bytes even if 'index' are unrestricted.
 
David

RE: [PATCH] ASoC : fsl_ssi : Correct the condition to check AC97 mode

2017-01-04 Thread David Laight

From: Harisangam, Sharvari (S.)
> Sent: 28 December 2016 11:07
>  Corrected the condition to check if ssi is configured for AC97
>  mode. Other modes like dsp_a also satisfy the ANDing condition.

Under the assumption that the constants have 1 bit set nothing is wrong.

David

...
> - return !!(ssi_private->dai_fmt & SND_SOC_DAIFMT_AC97);
...
> - if (fmt & SND_SOC_DAIFMT_AC97)
...

RE: [RFC 1/4] mm: remove unused TASK_SIZE_OF()

2017-01-05 Thread David Laight

From: Dmitry Safonov
> Sent: 30 December 2016 15:57
> All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or
> TASK_SIZE_MAX since:
> commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for
> FSBASE/GSBASE upper limits"),
> commit a06db751c321 ("pagemap: check permissions and capabilities at
> open time"),
...
> +#define TASK_SIZE(current->thread.task_size)

I'm not sure I like he hidden 'current' argument to an
apparent constant.

David

RE: Problem on SCTP

2017-01-09 Thread David Laight

From: Sun Paul
> Sent: 09 January 2017 02:08

> >> I am setting up a lab where the SCTP traffics from  client is passing
> >> through a linux router before reaching to the SCTP server running
> >> LKSCTP.
> >>
> >> The linux router did not change the source address of the client, so
> >> when it arrived to the SCTP server, the source address is the oriingal
> >> one.
>
> the INIT chunk arrive on the SERVER, but then no response. the
> application that using in SERVER is the same as the other test.
> 
> I noticed one thing in Ethernet frame of the incoming packet on the
> SERVER compared to the one captured from the client is the LG bit on
> the source address.
> 
> The LG bit is set to 0 on the request packet received in the
> SERVER,but it is 0 from the one originated on the client. willl it be
> the root cause?

Which addresses are you talking about, and what do you mean by the LG bit?

Is your linux 'router' just routing (ie IP forwarding) or is it doing NAT?
If it is changing the IP addresses then the addresses inside some SCTP
chunks also need changing.

David

RE: [RFC PATCH] intel: Use upper_32_bits and lower_32_bits

2017-01-09 Thread David Laight

From: Joe Perches
> Sent: 07 January 2017 18:33
> Shifting and masking various types can be made a bit
> simpler to read by using the available kernel macros.
...
> - ew32(TDBAH, (tdba >> 32));
> - ew32(TDBAL, (tdba & 0xULL));
> + ew32(TDBAH, upper_32_bits(tdba));
> + ew32(TDBAL, lower_32_bits(tdba));

Personally I find the original code easier to understand
since I don't have to look up another silly macro.

I'd normally not even explicitly mask the low bits
relying on the implicit truncation of the assignment.

At least modern compilers aren't stupid enough to add two
'mask with 0xff' instructions for:
*uchar_ptr = (unsigned char)(foo & 0xff);

David

RE: [PATCH v2 net-next 3/4] secure_seq: use SipHash in place of MD5

2017-01-09 Thread David Laight

From: Eric Biggers
> Sent: 07 January 2017 22:09
..
> Out of curiosity, is this actually a solvable problem, e.g. by making the code
> using the XMM registers responsible for saving and restoring the ones 
> clobbered,
> or by optimizing kernel_fpu_begin()/kernel_fpu_end()?  Or does it in fact 
> remain
> impractical for such instructions to be used for applications like this one?

I think it should be possible to fast-save a couple of FP registers
by telling the IPI save code where to save them from (to load
into a different cpu).
But there might be issues with other parts of the FP state.
This might be ok provided the kernel doesn't actually use FP instructions.

(I thought about this when adding AVX support to NetBSD.)

David

RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread David Laight

From: George Spelvin
> Sent: 17 December 2016 15:21
...
> uint32_t
> hsiphash24(char const *in, size_t len, uint32_t const key[2])
> {
>   uint32_t c = key[0];
>   uint32_t d = key[1];
>   uint32_t a = 0x6c796765 ^ 0x736f6d65;
>   uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

I've not looked closely, but is that (in some sense) duplicating
the key length?
So you could set a = key[2] and b = key[3] and still have an
working hash - albeit not exactly the one specified.

I'll add another comment here...
Is it worth using the 32bit hash for IP addresses on 64bit systems that
can't do misaligned accessed?

David

RE: [PATCH v5 3/4] secure_seq: use SipHash in place of MD5

2016-12-16 Thread David Laight

From: Jason A. Donenfeld
> Sent: 15 December 2016 20:30
> This gives a clear speed and security improvement. Siphash is both
> faster and is more solid crypto than the aging MD5.
> 
> Rather than manually filling MD5 buffers, for IPv6, we simply create
> a layout by a simple anonymous struct, for which gcc generates
> rather efficient code. For IPv4, we pass the values directly to the
> short input convenience functions.
...
> diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
> index 88a8e429fc3e..c80583bf3213 100644
...
> + const struct {
> + struct in6_addr saddr;
> + struct in6_addr daddr;
> + __be16 sport;
> + __be16 dport;
> + u32 padding;
> + } __aligned(SIPHASH_ALIGNMENT) combined = {
> + .saddr = *(struct in6_addr *)saddr,
> + .daddr = *(struct in6_addr *)daddr,
> + .sport = sport,
> + .dport = dport
> + };

I think you should explicitly initialise the 'padding'.
It can do no harm and makes it obvious that it is necessary.

You are still putting over-aligned data on stack.
You only need to align it to the alignment of u64 (not the size of u64).
If an on-stack item has a stronger alignment requirement than the stack
the gcc has to generate two stack frames for the function.

If you assign to each field (instead of using initialisers) then you
can get the alignment by making the first member an anonymous union
of in6_addr and u64.

Oh - and wait a bit longer between revisions.

David

RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers

2016-12-16 Thread David Laight

From: Jason A. Donenfeld
> Sent: 15 December 2016 20:30
> These restore parity with the jhash interface by providing high
> performance helpers for common input sizes.
...
> +#define PREAMBLE(len) \
> + u64 v0 = 0x736f6d6570736575ULL; \
> + u64 v1 = 0x646f72616e646f6dULL; \
> + u64 v2 = 0x6c7967656e657261ULL; \
> + u64 v3 = 0x7465646279746573ULL; \
> + u64 b = ((u64)len) << 56; \
> + v3 ^= key[1]; \
> + v2 ^= key[0]; \
> + v1 ^= key[1]; \
> + v0 ^= key[0];

Isn't that equivalent to:
v0 = key[0];
v1 = key[1];
v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);

Those constants also look like ASCII strings.
What cryptographic analysis has been done on the values?

David

RE: [PATCH 06/16] drivers, net, mlx5: convert mlx5_cq.refcount from atomic_t to refcount_t

2017-03-28 Thread David Laight

From: Elena Reshetova
> Sent: 28 March 2017 09:57
> 
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.

I can't help feeling that you ought to find a scheme
that will detect extra decrements and extra increments
before the counter wraps 32 bits.

If an extra reference is requested every 100us it takes 4.8 days
for the counter to increment back to zero.
Simple tests aren't doing to find that - but it can easily happen
on a system that is running for several years.

David

RE: [PATCH] net: netfilter: Remove multiple assignment.

2017-03-27 Thread David Laight

From: Pablo Neira Ayuso
> Sent: 27 March 2017 13:08
> On Sat, Mar 25, 2017 at 06:19:47PM +0530, Arushi Singhal wrote:
> > This patch removes multiple assignments.
> > Done using coccinelle.
> > @@
> > identifier i1,i2;
> > constant c;
> > @@
> > - i1=i2=c;
> > + i1=c;
> > + i2=c;
> 
> You have to explain why this is bad.

And your substituted code isn't equivalent.
The correct replacement is:
i2 = c;
i1 = i2;

David

RE: [PATCH] netfilter: ipset: Use max macro instead of ternary operator

2017-03-28 Thread David Laight

From: simran singhal
> Sent: 28 March 2017 14:33
> This patch replaces ternary operator with macro max as it shorter and
> thus increases code readability. Macro max return the maximum of the two
> compared values.
...
>   /* Convert error codes to nomatch */
> - return (ret < 0 ? 0 : ret);
> + return max(0, ret);

It might be shorter but it isn't more readable.

David

RE: [PATCH v3 2/2] PCI: Add tango PCIe host bridge support

2017-03-29 Thread David Laight

> > For my education, what is the API to send an IPI?
> > And the API to handle an IPI?
> 
> There are a few ways you could implement some custom cross-call,
> although in this case I think stop_machine() would probably be the most
> appropriate candidate. However, you're right that in general it may not
> actually help enough to be worthwhile - a DSB SY would ensure that
> in-flight transactions have at least been observed by the CPUs and any
> other coherent masters, but for any writes with a memory type allowing
> early acknowledgement (i.e. a Normal or Device mapping of a BAR) that
> doesn't necessarily correlate with them having reached their ultimate
> destination. For a PCI destination in particular, I think the normal way
> to ensure all posted writes have completed would be to read from config
> space; ah...

He almost certainly doesn't need to wait for the cycle to complete,
just long enough for the cycle to have been sent.

David

RE: [PATCH net-next] r8152: simply the arguments

2017-03-16 Thread David Laight

From: Hayes Wang
> Sent: 16 March 2017 06:28
> Replace >napi with napi and tp->netdev with netdev.
> 
> Signed-off-by: Hayes Wang 
> ---
>  drivers/net/usb/r8152.c | 44 +++-
>  1 file changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 227e1fd..e480e9f 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -1761,6 +1761,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
>   unsigned long flags;
>   struct list_head *cursor, *next, rx_queue;
>   int ret = 0, work_done = 0;
> + struct napi_struct *napi = >napi;
> 
>   if (!skb_queue_empty(>rx_queue)) {
>   while (work_done < budget) {
> @@ -1773,7 +1774,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
>   break;
> 
>   pkt_len = skb->len;
> - napi_gro_receive(>napi, skb);
> + napi_gro_receive(napi, skb);
...

I'm not sure this makes the code any more readable.
It isn't even needed to shorten any long lines.

If you are really lucky the compiler will optimise it away.
Otherwise it will generate another local variable and possibly
a register spill to stack.

David

RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-15 Thread David Laight

From: Shannon Nelson
> Sent: 14 March 2017 17:25
...
> + if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> + dev->stats.multicast++;

I'd guess that:
dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
generates faster code.
Especially if is_multicast_ether_addr(x) is (*x >> 7).

David

RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

2017-03-16 Thread David Laight

From: Shannon Nelson
> Sent: 16 March 2017 00:18
> To: David Laight; net...@vger.kernel.org; da...@davemloft.net
> On 3/15/2017 1:50 AM, David Laight wrote:
> > From: Shannon Nelson
> >> Sent: 14 March 2017 17:25
> > ...
> >> +  if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> >> +  dev->stats.multicast++;
> >
> > I'd guess that:
> > dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
> > generates faster code.
> > Especially if is_multicast_ether_addr(x) is (*x >> 7).

I'd clearly got brain-fade there, mcast bit is the first transmitted bit
(on ethernet) but the bytes are sent LSB first (like async).
> > David
> 
> Hi David, thanks for the comment.  My local instruction level
> performance guru is on vacation this week so I can't do a quick check
> with him today on this.  However, I"m not too worried here since the
> inline code for is_multicast_ether_addr() is simply
> 
>   return 0x01 & addr[0];
> 
> and objdump tells me that on sparc it compiles down to a simple single
> byte load and compare:
> 
>  325c:c2 08 80 03 ldub  [ %g2 + %g3 ], %g1
>  3260:80 88 60 01 btst  1, %g1
>  3264:32 60 00 b3 bne,a,pn   %xcc, 3530 <vnet_rx_one+0x430>
>  3268:c2 5c 61 68 ldx  [ %l1 + 0x168 ], %g1
>   dev->stats.multicast++;

Followed by a branch that might be marked 'assume taken' so the
normal path takes the branch.
I guess that is followed by 'add 1 to %g1', 'stx %g1, [ %l1 + 0x168 ]'
and a branch to 3530.
GCC must be using that condition to generate get the bottom of a loop
to 'fallthrough' to its top!

My version should generate something like:
ldub  [ %g2 + %g3 ], %g1
ldx   [ %l1 + 0x168 ], %g2
and   1, %g1
add   %g1, %g2, %g2
stx   %g2, [ %l1 + 0x168 ]
While this looks like 5 instructions (rather than 2) it has fewer pipeline
stalls and can be 'spread out' into the surrounding lines of code to
reduce the stalls further.

> I don't think this driver will ever be used on anything bug sparc, so
> I'm not worried about how x86 might compile this.

On x86 gcc is likely to ignore the 'unlikely' and generate a forwards
(predicted not taken) branch around the increment.
I've had to but asm comments in the else part of conditionals like
that to force gcc to generate a forwards jump to the 'unlikely' statements.

David

RE: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-20 Thread David Laight

From: Herbert Xu
> Sent: 20 March 2017 13:16
> On Mon, Mar 20, 2017 at 11:39:37AM +0100, Peter Zijlstra wrote:
> >
> > Can we at least give a benchmark and have someone run numbers? We should
> > be able to quantify these things.
> 
> Do you realise how many times this thing gets hit at 10Gb/s or
> higher? Anyway, since you're proposing this change you should
> demonstrate that it does not cause a performance regression.

What checks does refcnt_t actually do?

An extra decrement is hard to detect since the item gets freed early.
I guess making the main 'allocate/free' code hold (say) 64k references
would give some leeway for extra decrements.

An extra increment will be detected when the count eventually wraps.
Unless the error is in a very common path that won't happen for a long time.

On x86 the cpu flags from the 'lock inc/dec' could be used to reasonably
cheaply detect errors - provided you actually generate a forwards branch.

Otherwise having a common, but not every packet, code path verify that the
reference count is 'sane' would give reasonable coverage.

David

RE: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-20 Thread David Laight

From: Peter Zijlstra
> Sent: 20 March 2017 14:28
> On Mon, Mar 20, 2017 at 02:10:24PM +, David Laight wrote:
> > On x86 the cpu flags from the 'lock inc/dec' could be used to reasonably
> > cheaply detect errors - provided you actually generate a forwards branch.
> 
> Note that currently there is no arch specific implementation. We could
> of course cure this.
> 
> But note that the thing you propose; using the overflow flag, can only
> reasonably be done on PREEMPT=n kernels, otherwise we have an incredible
> number of contexts that can nest.
> 
> Sure; getting all starts aligned to double overflow is incredibly rare,
> but I don't want to be the one to have to debug that.

One overflow would set the overflow flag, you don't need both to fail.

In any case you can use the sign flag.
Say valid count values are -64k to -256 and 0 to MAXINT.
The count will normally be +ve unless the 'main free path'
has released the 64k references it holds.
If the sign bit is set after inc/dec the value is checked; 
might valid, an error, or require the item be freed.

Ok assuming the items have reasonable lifetimes and have a nominal
'delete' function.

David

RE: [PATCH net-next] r8152: simply the arguments

2017-03-17 Thread David Laight

From: Hayes Wang
> Sent: 17 March 2017 03:00
> To: David Laight; net...@vger.kernel.org
> Cc: nic_swsd; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org
> Subject: RE: [PATCH net-next] r8152: simply the arguments
> 
> David Laight [mailto:david.lai...@aculab.com]
> > Sent: Thursday, March 16, 2017 10:28 PM
> [...]
> > If you are really lucky the compiler will optimise it away.
> > Otherwise it will generate another local variable and possibly
> > a register spill to stack.
> 
> However, I could reduce the time for calculating the address,
> because I only calculate it once and save the result to a variable.
> Right?

 address you want is just an offset from another pointer that is
commonly used and ought to be assigned to a register variable.

The offset can be added by the instruction that puts the value into
the register used for the first function argument.

So 'saving' it in another variable is extra work.

David

RE: [PATCH] mac80211: Fix clang warning about constant operand in logical operation

2017-04-10 Thread David Laight

From: Matthias Kaehlcke
> Sent: 06 April 2017 19:57
> Clang raises a warning about the expression 'strlen(CONFIG_XXX)' being
> used in a logical operation. Clangs' builtin strlen function resolves the
> expression to a constant at compile time, which causes clang to generate
> a 'constant-logical-operand' warning.
> 
> Split the if statement in two to avoid using the const expression in a
> logical operation.
> 
> Signed-off-by: Matthias Kaehlcke 
> ---
>  net/mac80211/rate.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
> index 206698bc93f4..68ff202d6380 100644
> --- a/net/mac80211/rate.c
> +++ b/net/mac80211/rate.c
> @@ -173,9 +173,14 @@ ieee80211_rate_control_ops_get(const char *name)
>   /* try default if specific alg requested but not found */
>   ops = 
> ieee80211_try_rate_control_ops_get(ieee80211_default_rc_algo);
> 
> + if (ops)
> + goto unlock;
> +
>   /* try built-in one if specific alg requested but not found */
> - if (!ops && strlen(CONFIG_MAC80211_RC_DEFAULT))
> + if (strlen(CONFIG_MAC80211_RC_DEFAULT))
>   ops = 
> ieee80211_try_rate_control_ops_get(CONFIG_MAC80211_RC_DEFAULT);
> +
> +unlock:

Using the result of strlen() as a boolean should itself be a compilation error.
You could change to code to the equivalent:

if (!ops && CONFIG_MAC80211_RC_DEFAULT[0])
ops = ...

David

RE: [PATCH] powerpc/32: Move entry_32 functions just after HEAD functions.

2017-04-19 Thread David Laight

From: Christophe Leroy
> By default, PPC8xx PINs an ITLB on the first 8M of memory in order
> to avoid any ITLB miss on kernel code.
> However, with some debug functions like DEBUG_PAGEALLOC and
> (soon to come) DEBUG_RODATA, the PINned TLB is invalidated soon
> after startup so ITLB missed start to happen also on the kernel code.
> 
> In order to avoid any ITLB miss in a critical section, we have to
> ensure that their is no page boundary crossed between the setup of
> a new value in SRR0/SRR1 and the associated RFI. This cannot be done
> easily if entry_32 functions sits in the middle of other .text
> functions. By placing entry_32 just after the .head section (as already
> done for entry_64 on PPC64), we can more easily ensure the issue
> doesn't happen.

Shouldn't this be done by putting all the functions that 'matter'
into a named section instead of relying on the order of the input files?
(Which is what I think this is doing.)

David

RE: [PATCH v3 4/7] powerpc: kprobes: use safer string functions in kprobe_lookup_name()

2017-04-21 Thread David Laight

From: Naveen N. Rao
> Sent: 19 April 2017 13:51
...
>   dot_name[0] = '\0';
> - strncat(dot_name, name, sizeof(dot_name) - 1);
> + strlcat(dot_name, name, sizeof(dot_name));
...

Is that really zeroing the first byte just so it can append to it?

David

RE: [PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-18 Thread David Laight

From: Logan Gunthorpe
> Sent: 13 April 2017 23:05
> Straightforward conversion to the new helper, except due to
> the lack of error path, we have to warn if unmapable memory
> is ever present in the sgl.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/block/xen-blkfront.c | 33 +++--
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 5067a0a..7dcf41d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, 
> struct blkfront_ring_info *ri
>   BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> 
>   if (setup.need_copy) {
> - setup.bvec_off = sg->offset;
> - setup.bvec_data = kmap_atomic(sg_page(sg));
> + setup.bvec_off = 0;
> + setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
> + if (IS_ERR(setup.bvec_data)) {
> + /*
> +  * This should really never happen unless
> +  * the code is changed to use memory that is
> +  * not mappable in the sg. Seeing there is a
> +  * questionable error path out of here,
> +  * we WARN.
> +  */
> + WARN(1, "Non-mappable memory used in sg!");
> + return 1;
> + }
...

Perhaps add a flag to mark failure as 'unexpected' and trace (and panic?)
inside sg_map().

David

RE: [PATCH v2 1/5] kprobes: convert kprobe_lookup_name() to a function

2017-04-18 Thread David Laight

From: Naveen N. Rao
> Sent: 12 April 2017 11:58
...
> +kprobe_opcode_t *kprobe_lookup_name(const char *name)
> +{
...
> + char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> + const char *modsym;
> + bool dot_appended = false;
> + if ((modsym = strchr(name, ':')) != NULL) {
> + modsym++;
> + if (*modsym != '\0' && *modsym != '.') {
> + /* Convert to  */
> + strncpy(dot_name, name, modsym - name);
> + dot_name[modsym - name] = '.';
> + dot_name[modsym - name + 1] = '\0';
> + strncat(dot_name, modsym,
> + sizeof(dot_name) - (modsym - name) - 2);
> + dot_appended = true;

If the ':' is 'a way down' name[] then although the strncpy() won't
overrun dot_name[] the rest of the code can.

The strncat() call is particularly borked.

David

RE: [PATCH 01/26] compiler: introduce noinline_for_kasan annotation

2017-03-03 Thread David Laight

From: Andrey Ryabinin
> Sent: 03 March 2017 13:50
...
> noinline_iff_kasan might be a better name.  noinline_for_kasan gives the 
> impression
> that we always noinline function for the sake of kasan, while 
> noinline_iff_kasan
> clearly indicates that function is noinline only if kasan is used.

noinline_if_stackbloat

David

RE: [PATCH net-next 09/12] net: bcmgenet: return EOPNOTSUPP for unknown ioctl commands

2017-03-14 Thread David Laight

From: Doug Berger
> Sent: 14 March 2017 00:42
> This commit changes the ioctl handling behavior to return the
> EOPNOTSUPP error code instead of the EINVAL error code when an
> unknown ioctl command value is detected.
> 
> It also removes some redundant parsing of the ioctl command value
> and allows the SIOCSHWTSTAMP value to be handled.

A better description would seem to be:
Remove checks on ioctl command and just forward all ioctl requests
to phy_mii_ioctl().

I also thought the 'generic' response to an unknown ioctl command
was ENOTTY.

David

RE: [PATCH v2 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-03-06 Thread David Laight

From: Peter Zijlstra
> Sent: 06 March 2017 11:22
> To: Madhavan Srinivasan
> Cc: Wang Nan; Alexander Shishkin; linux-kernel@vger.kernel.org; Arnaldo 
> Carvalho de Melo; Alexei
> Starovoitov; Ingo Molnar; Stephane Eranian; Sukadev Bhattiprolu; 
> linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 1/6] powerpc/perf: Define big-endian version of 
> perf_mem_data_src
> 
> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> > From: Sukadev Bhattiprolu 
> >
> > perf_mem_data_src is an union that is initialized via the ->val field
> > and accessed via the bitmap fields. For this to work on big endian
> > platforms, we also need a big-endian represenation of perf_mem_data_src.
> 
> Doesn't this break interpreting the data on a different endian machine?

Best to avoid bitfields if you ever care about the bit order.

David

RE: [PATCH v2 1/5] kprobes: convert kprobe_lookup_name() to a function

2017-04-19 Thread David Laight

From: Naveen N. Rao
> Sent: 19 April 2017 09:09
> To: David Laight; Michael Ellerman
> Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org; Masami 
> Hiramatsu; Ingo Molnar
> Subject: RE: [PATCH v2 1/5] kprobes: convert kprobe_lookup_name() to a 
> function
> 
> Excerpts from David Laight's message of April 18, 2017 18:22:
> > From: Naveen N. Rao
> >> Sent: 12 April 2017 11:58
> > ...
> >> +kprobe_opcode_t *kprobe_lookup_name(const char *name)
> >> +{
> > ...
> >> +  char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> >> +  const char *modsym;
> >> +  bool dot_appended = false;
> >> +  if ((modsym = strchr(name, ':')) != NULL) {
> >> +  modsym++;
> >> +  if (*modsym != '\0' && *modsym != '.') {
> >> +  /* Convert to  */
> >> +  strncpy(dot_name, name, modsym - name);
> >> +  dot_name[modsym - name] = '.';
> >> +  dot_name[modsym - name + 1] = '\0';
> >> +  strncat(dot_name, modsym,
> >> +  sizeof(dot_name) - (modsym - name) - 2);
> >> +  dot_appended = true;
> >
> > If the ':' is 'a way down' name[] then although the strncpy() won't
> > overrun dot_name[] the rest of the code can.
> 
> Nice catch, thanks David!
> We need to be validating the length of 'name'. I'll put out a patch for
> that.

Silent truncation is almost certainly wrong here.

> As an aside, I'm not sure I follow what you mean when you say that the
> strncpy() won't overrun dot_name[]. If we have a name[] longer than
> sizeof(dot_name) with the ':' after that, the strncpy() can also overrun
> dot_name[].

Yes, that should just be a memcpy(), as should the strncat().

Using strncpy() where the length is other than the size of the target buffer
should be banned. Not that it ever does what people expect.
strncat() is even worse.

David

RE: [PATCH 2/8] selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean

2017-04-24 Thread David Laight

From: Shuah Khan
> Sent: 22 April 2017 00:15
> Define CLEAN macro to allow Makefiles to override common clean target
> in lib.mk. This will help fix the following failures:
> 
> warning: overriding recipe for target 'clean'
> ../lib.mk:55: warning: ignoring old recipe for target 'clean'
> 
> Signed-off-by: Shuah Khan 
> ---
>  tools/testing/selftests/lib.mk | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 775c589..959273c 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -51,8 +51,12 @@ endef
>  emit_tests:
>   $(EMIT_TESTS)
> 
> -clean:
> +define CLEAN
>   $(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) 
> $(EXTRA_CLEAN)
> +endef
> +
> +clean:
> + $(CLEAN)

If might be easier to do something like:

ifneq($(NO_CLEAN),y)
clean:
$(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) 
$(EXTRA_CLEAN)
endif

David

RE: [PATCH 8/8] selftests: x86: override clean in lib.mk to fix warnings

2017-04-24 Thread David Laight

From: Linuxppc-dev Michael Ellerman
> Shuah Khan  writes:
> 
> > Add override for lib.mk clean to fix the following warnings from clean
> > target run.
> >
> > Makefile:44: warning: overriding recipe for target 'clean'
> > ../lib.mk:55: warning: ignoring old recipe for target 'clean'
> >
> > Signed-off-by: Shuah Khan 
> > ---
> >  tools/testing/selftests/x86/Makefile | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/x86/Makefile 
> > b/tools/testing/selftests/x86/Makefile
> > index 38e0a9c..4d27550 100644
> > --- a/tools/testing/selftests/x86/Makefile
> > +++ b/tools/testing/selftests/x86/Makefile
> > @@ -40,8 +40,9 @@ all_32: $(BINARIES_32)
> >
> >  all_64: $(BINARIES_64)
> >
> > -clean:
> > +override define CLEAN
> > $(RM) $(BINARIES_32) $(BINARIES_64)
> > +endef
> 
> Simpler as:
> 
> EXTRA_CLEAN := $(BINARIES_32) $(BINARIES_64)

Actually for builds that insist on crapping all over the source tree I've used:

clean:
rm -rf `cat .cvsignore 2>/dev/null`

David

RE: [PATCH net-next 4/4] net: dsa: lan9303: MDIO access phy registers directly

2017-07-31 Thread David Laight

From: Florian Fainelli
> Sent: 28 July 2017 18:05
...
>  +EXPORT_SYMBOL(lan9303_indirect_phy_ops);
> >>>
> >>> Isn't EXPORT_SYMBOL_GPL prefered over EXPORT_SYMBOL?
> >>
> >> I have no opinion. I just used the same variant as the other EXPORTS
> >> in the file.
> >
> > If there is no concern from others about this, LGTM too:
> 
> Since the kernel module license is GPL, EXPORT_SYMBOL_GPL() would seem
> to be appropriate, which can be done as a subsequent patch.

It depends on whether the function needs to be usable by 'out of tree'
non-GPL modules.
This looks like a 'private' export between related modules.
The problems arise with functions like put_ns() and put_pid()
which can easily be needed by non-GPL code.

David

RE: [PATCH 1/5] Fix packed and aligned attribute warnings.

2017-07-31 Thread David Laight

From: SZ Lin
> Sent: 29 July 2017 08:24
...
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.h b/drivers/char/tpm/tpm_ibmvtpm.h
> index 91dfe766d080..9f708ca3dc84 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.h
> +++ b/drivers/char/tpm/tpm_ibmvtpm.h
> @@ -25,7 +25,7 @@ struct ibmvtpm_crq {
>   __be16 len;
>   __be32 data;
>   __be64 reserved;
> -} __attribute__((packed, aligned(8)));
> +} __packed __aligned(8);

You can't need __packed and __aligned(8) on that structure.
There are no gaps and you are saying it is always aligned.

So just remove the pointless attributes.

David

RE: [RESEND][PATCH V10 0/3] powernv : Add support for OPAL-OCC command/response interface

2017-08-01 Thread David Laight

From: Shilpasri G Bhat
> Sent: 31 July 2017 08:43
> In P9, OCC (On-Chip-Controller) supports shared memory based
> commad-response interface. Within the shared memory there is an OPAL
  ^ typo
> command buffer and OCC response buffer that can be used to send
> inband commands to OCC. The following commands are supported:
...

David

RE: [PATCH] netfilter: fix stringop-overflow warning with UBSAN

2017-08-01 Thread David Laight

From: Arnd Bergmann
> Sent: 31 July 2017 11:09
> Using gcc-7 with UBSAN enabled, we get this false-positive warning:
> 
> net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
> net/netfilter/ipset/ip_set_core.c:1998:3: error: 'strncpy' writing 32 bytes 
> into a region of size 2
> overflows the destination [-Werror=stringop-overflow=]
>strncpy(req_get->set.name, set ? set->name : "",
>^~~~
> sizeof(req_get->set.name));
> ~~
> 
> This seems completely bogus, and I could not find a nice workaround.
> To work around it in a less elegant way, I change the ?: operator
> into an if()/else() construct.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  net/netfilter/ipset/ip_set_core.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_core.c 
> b/net/netfilter/ipset/ip_set_core.c
> index e495b5e484b1..d7ebb021003b 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -1995,8 +1995,12 @@ ip_set_sockfn_get(struct sock *sk, int optval, void 
> __user *user, int *len)
>   }
>   nfnl_lock(NFNL_SUBSYS_IPSET);
>   set = ip_set(inst, req_get->set.index);
> - strncpy(req_get->set.name, ,
> - IPSET_MAXNAMELEN);
> + if (set)
> + strncpy(req_get->set.name, set->name,
> + sizeof(req_get->set.name));
> + else
> + memset(req_get->set.name, '\0',
> +sizeof(req_get->set.name));

If you use strncpy() here, the compiler might optimise the code
back to 'how it was before'.

Or, maybe an explicit temporary: 'const char *name = set ? set->name : "";

David

RE: [PATCH v2] iwlwifi: Demote messages about fw flags size to info

2017-08-03 Thread David Laight

From: João Paulo Rechi Vita
> Sent: 03 August 2017 15:30
> These messages are not reporting a real error, just the fact that the
> firmware knows about more flags then the driver.
  than
> 
> Currently these messages are presented to the user during boot if there
> is no bootsplash covering the console, even when booting the kernel with
> "quiet".
> 
> Demoting it to the warn level helps having a clean boot process.
> 
> Signed-off-by: Joo Paulo Rechi Vita 

David

RE: [PATCH] New Chapter on CodingStyle .

2017-08-15 Thread David Laight

From: Stephen Hemminger
> Sent: 15 August 2017 17:21
> On Tue, 15 Aug 2017 10:42:39 +0000
> David Laight <david.lai...@aculab.com> wrote:
> 
> > From: Jonathan Corbet
> > > Sent: 12 August 2017 15:55
> > ...
> > > > +   Chapter 20: Put values on initialisers without exception
> > > > +
> > > > +When declaring variables on functions must put values:
> > >
> > > Thanks for sending a patch for the kernel's documentation.
> > > Unfortunately, I can't accept this patch for a couple of reasons:
> > ...
> > > - The coding style document is there to describe the community's
> > >   standards for kernel code.  It is *not* a mechanism for imposing new
> > >   standards.  If you really think that the kernel community should adopt
> > >   this rule, you will need to argue for it on the mailing lists.  I will
> > >   say, though, that I do not expect that this effort would be successful.
> >
> > I'd even go as far as suggesting almost the opposite.
> > Declarations should only have initialisers if the value is constant.
> 
> Yup. This new rule sound like something taught to people in coding schools.
> But initializing everything defeats the compiler detection of uninitialized 
> variables
> which is more useful for catching errors.

You'll also get:
Values being read the wrong side of locks.
Values being read early so requiring spilling to stack.

Next someone will be suggesting that all pointers should be checked
against NULL every time they are used.

David

RE: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-15 Thread David Laight

From: Benjamin Herrenschmidt
> Sent: 15 August 2017 02:34
> On Tue, 2017-08-15 at 09:16 +0800, Jike Song wrote:
> > > Taking a step back, though, why does vfio-pci perform this check in the
> > > first place? If a malicious guest already has control of a device, any
> > > kind of interrupt spoofing it could do by fiddling with the MSI-X
> > > message address/data it could simply do with a DMA write anyway, so the
> > > security argument doesn't stand up in general (sure, not all PCIe
> > > devices may be capable of arbitrary DMA, but that seems like more of a
> > > tenuous security-by-obscurity angle to me).
> 
> I tried to make that point for years, thanks for re-iterating it :-)

Indeed, we have an FPGA based PCIe card where the MSI-X table is just a
piece of PCIe accessible memory.
The device driver has to read the MSI-X table and write the address+data
values to other registers which are then used to raise the interrupt.
(Ok, I've written a better interrupt generator so we don't do that
any more.)

David

RE: [PATCH] New Chapter on CodingStyle .

2017-08-15 Thread David Laight

From: Jonathan Corbet
> Sent: 12 August 2017 15:55
...
> > +   Chapter 20: Put values on initialisers without exception
> > +
> > +When declaring variables on functions must put values:
> 
> Thanks for sending a patch for the kernel's documentation.
> Unfortunately, I can't accept this patch for a couple of reasons:
...
> - The coding style document is there to describe the community's
>   standards for kernel code.  It is *not* a mechanism for imposing new
>   standards.  If you really think that the kernel community should adopt
>   this rule, you will need to argue for it on the mailing lists.  I will
>   say, though, that I do not expect that this effort would be successful.

I'd even go as far as suggesting almost the opposite.
Declarations should only have initialisers if the value is constant.

David

RE: [RFC PATCH 2/2] bpf: Initialise mod[] in bpf_trace_printk

2017-08-14 Thread David Laight

From: Daniel Borkmann
> Sent: 11 August 2017 17:47
> On 08/09/2017 10:34 PM, Daniel Borkmann wrote:
> > On 08/09/2017 09:39 AM, James Hogan wrote:
> > [...]
> >> time (but please consider looking at the other patch which is certainly
> >> a more real issue).
> >
> > Sorry for the delay, started looking into that and whether we
> > have some other options, I'll get back to you on this.
> 
> Could we solve this more generically (as in: originally intended) in
> the sense that we don't need to trust the gcc va_list handling; I feel
> this is relying on an implementation detail? Perhaps something like
> below poc patch?

That patch still has 'cond ? arg : cond1 ? (long)arg : (u32)arg' so
probably has the same warning as the original version.

The va_list handling is defined by the relevant ABI, not gcc.

It is ok on x86-64 because all 32bit values are extended to 64bits
before being passed as arguments (either in registers, or on the stack).
Nothing in the C language requires that, so other 64bit architectures
could pass 32bit values in 4 bytes of stack.

David

RE: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-17 Thread David Laight

From: Alex Williamson
> Sent: 16 August 2017 17:56
...
> Firmware pissing match...  Processors running with 8k or less page size
> fall within the recommendations of the PCI spec for register alignment
> of MMIO regions of the device and this whole problem becomes less of an
> issue.

Actually if qemu is causing the MSI-X table accesses to fault, why doesn't
it just lie to the guest about the physical address of the MSI-X table?
Then mmio access to anything in the same physical page will just work.

It has already been pointed out that you can't actually police the
interrupts that are raised without host hardware support.

Actually, putting other vectors in the MSI-X table is boring, most
drivers will ignore unexpected interrupts.
Much more interesting are physical memory addresses and accessible IO
addresses.
Of course, a lot of boards have PCI master capability and can probably
be persuaded to do writes to specific location anyway.

David

RE: [PATCH 04/22] scsi: fusion: fix string overflow warning

2017-07-17 Thread David Laight

From: Arnd Bergmann
> Sent: 14 July 2017 13:07
> gcc points out a theorerical string overflow:
> 
> drivers/message/fusion/mptbase.c: In function 'mpt_detach':
> drivers/message/fusion/mptbase.c:2103:17: error: '%s' directive writing up to 
> 31 bytes into a region
> of size 28 [-Werror=format-overflow=]
> sprintf(pname, MPT_PROCFS_MPTBASEDIR "/%s/summary", ioc->name);
>^
> drivers/message/fusion/mptbase.c:2103:2: note: 'sprintf' output between 13 
> and 44 bytes into a
> destination of size 32
> 
> We can simply double the size of the local buffer here to be on the
> safe side.

I think I'd change it to snprintf() as well.
Saves any worries if ioc->name isn't '\0' terminated.

David

RE: [PATCH] tty: Fix TIOCGPTPEER ioctl definition

2017-07-11 Thread David Laight

From: Linuxppc-dev  Gleb Fotengauer-Malinovskiy
> Sent: 11 July 2017 01:12
> This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
> because it doesn't copy anything from/to userspace to access the
> argument.
> 
> Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl")
...
> -#define TIOCGPTPEER  _IOR('T', 0x41, int) /* Safely open the slave */
> +#define TIOCGPTPEER  _IO('T', 0x41) /* Safely open the slave */

This is a user API change. When was the ioctl added?

David

RE: [PATCH 3/3] fpga manager: Add FT232H driver for Altera FPP

2017-07-07 Thread David Laight

From: Anatolij Gustschin
> Sent: 06 July 2017 21:49
> 
> Add FPGA manager driver for loading Altera FPGAs via fast
> passive parallel (FPP) interface using FTDI FT232H chip.

I can't help feeling this is very specific for a particular card.

While all(?) Altera (now Intel) FPGA support FPP programming
it will require extra on-board logic to interface to anything
external - especially something as generic as the FT232H.

Since this isn't a connector that Altera put on any of their
dev boards (not any I've seen anyway), the board manufacturer
has probably picked their own assignments for the pins.

The only standard interface for programming the FPGA is the
JTAG one - and the data formats for that probably require
reverse engineering (has been done).

David

RE: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-25 Thread David Laight

From: Rustad, Mark D
> Sent: 24 July 2017 21:32
> > On Jul 23, 2017, at 10:05 AM, Florian Fainelli  wrote:
> >> +
> >> +  strncpy(drvinfo->version, HNAE_DRIVER_VERSION,
> >> +  sizeof(drvinfo->version));
> >> +  drvinfo->version[sizeof(drvinfo->version) - 1] = '\0';
> >
> > strlcpy() would probably do that for you.
> 
> You need to be careful about strlcpy - it does not completely write the 
> destination buffer as strncpy
> does, and so can result in a kernel memory leak if the destination is not 
> known to already be cleared.

Not only that, strlcpy() is defined to return the size of the
source string.
So if the source buffers isn't '\0' terminated it can fault.
(Not a problem above.)

Neither function is the one you were looking for.

David

RE: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-25 Thread David Laight

From: Brijesh Singh
> Sent: 24 July 2017 20:08
> From: Tom Lendacky 
> 
> Secure Encrypted Virtualization (SEV) does not support string I/O, so
> unroll the string I/O operation into a loop operating on one element at
> a time.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/include/asm/io.h | 26 ++
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index e080a39..2f3c002 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port)   
> \
>   \
>  static inline void outs##bwl(int port, const void *addr, unsigned long 
> count) \
>  {

Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.

David

RE: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-27 Thread David Laight

From: Brijesh Singh
> Sent: 26 July 2017 21:07
...
> I am not sure if I understand your concern.
> 
> Are you commenting on amount of code duplication ? If so, I can certainly 
> improve
> and use the similar macro used into header file to generate the functions 
> body.

If you are careful the real functions could expand the inline functions
that get used when SEV is compiled out.

Oh, if you are looking at this, can you fix memcpy_to_io()
so that it is never 'rep movsb'?

David

RE: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-07-28 Thread David Laight

From: Borislav Petkov
> Sent: 27 July 2017 15:59
> On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:
> > From: Tom Lendacky 
> >
> > The current code checks only for sme_active() when determining whether
> > to perform the encryption attribute change.  Include sev_active() in this
> > check so that memory attribute changes can occur under SME and SEV.
> >
> > Signed-off-by: Tom Lendacky 
> > Signed-off-by: Brijesh Singh 
> > ---
> >  arch/x86/mm/pageattr.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index dfb7d65..b726b23 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, 
> > int numpages, bool enc)
> > unsigned long start;
> > int ret;
> >
> > -   /* Nothing to do if the SME is not active */
> > -   if (!sme_active())
> > +   /* Nothing to do if SME and SEV are not active */
> > +   if (!sme_active() && !sev_active())
> 
> This is the second place which does
> 
>   if (!SME && !SEV)
> 
> I wonder if, instead of sprinking those, we should have a
> 
>   if (mem_enc_active())
> 
> or so which unifies all those memory encryption logic tests and makes
> the code more straightforward for readers who don't have to pay
> attention to SME vs SEV ...

If any of the code paths are 'hot' it would make sense to be checking
a single memory location.

David

RE: [PATCH 3/4] powerpc: pseries: only store the device node basename in full_name

2017-07-26 Thread David Laight

From: Rob Herring
> Sent: 25 July 2017 22:44
> With dependencies on full_name containing the entire device node path
> removed, stop storing the full_name in nodes created by
> dlpar_configure_connector() and pSeries_reconfig_add_node().
...
>   dn = kzalloc(sizeof(*dn), GFP_KERNEL);
>   if (!dn)
>   return NULL;
> 
>   name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
> - dn->full_name = kasprintf(GFP_KERNEL, "%s/%s", path, name);
> + dn->full_name = kasprintf(GFP_KERNEL, "%s", name);

Isn't this just strdup()?
Perhaps you should rename full_name to name - since it is no longer 'full'?

Maybe you could do a single malloc() for both 'dn' and the name?

David

RE: [PATCH 1/3] mfd: Add support for FTDI FT232H devices

2017-07-19 Thread David Laight

From: Anatolij Gustschin
> Sent: 19 July 2017 14:30
...
> >Stupid question, I know, but I cannot help thinking: If you have an
> >EEPROM then why the h... don't you use an application specific device
> >ID?
> 
> It would make sense for adapter devices that you can buy and plug.
> In my particular case the configuration device with FTDI chips is
> internal part of embedded board, the configuration interface is
> never exposed to end users. I doesn't make sense to register an
> ID for such hardware.

Sounds like you should absolutely be registering an ID so that
nothing can try to use it using the default one.

David

RE: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-06 Thread David Laight

From: Paul E. McKenney
> Sent: 06 July 2017 00:30
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This series therefore removes spin_unlock_wait() and changes
> its users to instead use a lock/unlock pair.  The commits are as follows,
> in three groups:
> 
> 1-7.  Change uses of spin_unlock_wait() and raw_spin_unlock_wait()
>   to instead use a spin_lock/spin_unlock pair.  These may be
>   applied in any order, but must be applied before any later
>   commits in this series.  The commit logs state why I believe
>   that these commits won't noticeably degrade performance.

I can't help feeling that it might be better to have a spin_lock_sync()
call that is equivalent to a spin_lock/spin_unlock pair.
The default implementation being an inline function that does exactly that.
This would let an architecture implement a more efficient version.

It might even be that this is the defined semantics of spin_unlock_wait().

Note that it can only be useful to do a spin_lock/unlock pair if it is
impossible for another code path to try to acquire the lock.
(Or, at least, the code can't care if the lock is acquired just after.)
So if it can de determined that the lock isn't held (a READ_ONCE()
might be enough) the lock itself need not be acquired (with the
associated slow bus cycles).

David

RE: [PATCH 4/8] usb: bdc: Small code cleanup

2017-06-29 Thread David Laight

From: Al Cooper
> Sent: 28 June 2017 15:56
> On Wed, Jun 28, 2017 at 4:47 AM, David Laight <david.lai...@aculab.com> wrote:
> >>
> >>   temp = bdc_readl(bdc->regs, BDC_BDCSC);
> >>   if ((temp & BDC_P64) &&
> >>   !dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
> >> - dev_dbg(bdc->dev, "Using 64-bit address\n");
> >> + dev_dbg(dev, "Using 64-bit address\n");
> >>   } else {
> >> - ret = dma_set_mask_and_coherent(>dev, 
> >> DMA_BIT_MASK(32));
> >> + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> >
> > That just wrong...
> > Or was wrong before.
> 
> Why is this wrong?

It isn't obvious that >dev is bdc->dev and hence dev.

David

RE: [PATCH v6 1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow

2017-04-27 Thread David Laight

From: Jason A. Donenfeld
> On Thu, Apr 27, 2017 at 1:30 PM, Sabrina Dubroca  wrote:
> > Hmm, I think this can actually happen:
> 
> Alright, perhaps better to err on the side of caution, then.

You only need to recurse if both pointers are set.

David

RE: [PATCH] macsec: avoid heap overflow in skb_to_sgvec

2017-04-24 Thread David Laight

From: Jason A. Donenfeld
> Sent: 21 April 2017 22:15
> While this may appear as a humdrum one line change, it's actually quite
> important. An sk_buff stores data in three places:
> 
> 1. A linear chunk of allocated memory in skb->data. This is the easiest
>one to work with, but it precludes using scatterdata since the memory
>must be linear.
> 2. The array skb_shinfo(skb)->frags, which is of maximum length
>MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
>can point to different pages.
> 3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
>which in turn can have data in either (1) or (2).
> 
> The first two are rather easy to deal with, since they're of a fixed
> maximum length, while the third one is not, since there can be
> potentially limitless chains of fragments. Fortunately dealing with
> frag_list is opt-in for drivers, so drivers don't actually have to deal
> with this mess. For whatever reason, macsec decided it wanted pain, and
> so it explicitly specified NETIF_F_FRAGLIST.
> 
> Because dealing with (1), (2), and (3) is insane, most users of sk_buff
> doing any sort of crypto or paging operation calls a convenient function
> called skb_to_sgvec (which happens to be recursive if (3) is in use!).
> This takes a sk_buff as input, and writes into its output pointer an
> array of scattergather list items. Sometimes people like to declare a
> fixed size scattergather list on the stack; othertimes people like to
> allocate a fixed size scattergather list on the heap. However, if you're
> doing it in a fixed-size fashion, you really shouldn't be using
> NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
> frag_list children arent't shared and then you check the number of
> fragments in total required.)
> 
> Macsec specifically does this:
> 
> size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
> tmp = kmalloc(size, GFP_ATOMIC);
> *sg = (struct scatterlist *)(tmp + sg_offset);
>   ...
> sg_init_table(sg, MAX_SKB_FRAGS + 1);
> skb_to_sgvec(skb, sg, 0, skb->len);
> 
> Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
> using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
> overflow the heap, and disaster ensues.
...

Shouldn't skb_to_sgvec() be checking the number of fragments against
the size of the sg list?
The callers would then all need auditing to allow for failure.

David

RE: [PATCH] kallsyms: optimize kallsyms_lookup_name() for a few cases

2017-04-25 Thread David Laight

From: Naveen N. Rao
> Sent: 25 April 2017 17:18
> 1. Fail early for invalid/zero length symbols.
> 2. Detect names of the form  and skip checking for kernel
> symbols in that case.
> 
> Signed-off-by: Naveen N. Rao 
> ---
> Masami, Michael,
> I have added two very simple checks here, which I felt is good to have,
> rather than the elaborate checks in the previous version. Given the
> change in module code to use strnchr(), the checks are now safe and
> further tests are not probably not that useful.
...
> + if (strnchr(name, MODULE_NAME_LEN, ':'))
> + return module_kallsyms_lookup_name(name);

Should that be MODULE_NAME_LEN - 1 ?

David

RE: [PATCH] iov_iter: don't revert if csum error

2017-04-28 Thread David Laight

From: Sabrina Dubroca
> Sent: 28 April 2017 14:17
...
> > if (__skb_checksum_complete(skb))
> > -   goto csum_error;
> > +   goto fault;
> 
> With this patch, skb_copy_and_csum_datagram_msg() will return -EFAULT
> for an incorrect checksum, that doesn't seem right.

Especially since (IIRC) -EFAULT generates SIGSEGV.

David

RE: [PATCH net] tcp: avoid bogus gcc-7 array-bounds warning

2017-07-28 Thread David Laight

From: Arnd Bergmann
> Sent: 28 July 2017 15:42
...
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2202,9 +2202,10 @@ static bool tcp_small_queue_check(struct sock *sk, 
> const struct sk_buff *skb,
>  static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new)
>  {
>   const u32 now = tcp_jiffies32;
> + enum tcp_chrono old = tp->chrono_type;
> 
> - if (tp->chrono_type > TCP_CHRONO_UNSPEC)
> - tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
> + if (old > TCP_CHRONO_UNSPEC)
> + tp->chrono_stat[old - 1] += now - tp->chrono_start;
>   tp->chrono_start = now;
>   tp->chrono_type = new;

What a horrid combination of enum and integers.
Also have u32 chrono_stat[3]; - should probably be [__TCP_CHRONO_MAX - 1]
(or - CHRONO_FIRST which is defined to be 1).

Checking if (old != 0) would make the code more readable.

David

RE: [PATCH] Adding Agile-SD TCP module and modifying Kconfig and Makefile

2017-08-02 Thread David Laight

From: mohamedalrshah
> Sent: 02 August 2017 05:44
> Published:
> Alrshah, M.A., Othman, M., Ali, B. and Hanapi, Z.M., 2015. Agile-SD: a 
> Linux-based
> TCP congestion control algorithm for supporting high-speed and short-distance 
> networks.
> Journal of Network and Computer Applications, 55, pp.181-190.
> 
> Agile-SD is a new loss-based and RTT-independent TCP congestion
> control algorithm designed to support high-speed and Short-Distance (SD)
> networks. It mainly contributes the agility factor mechanism, which allows
> Agile-SD to deal with small buffer sizes while reducing its sensitivity to
> packet loss. Due to the use of this mechanism, Agile-SD improves the
> throughput of TCP up to 50% compared to Cubic-TCP and Compound-TCP. Its
> performance was evaluated using the well-known NS2 simulator to measure
> the average throughput, loss ratio and fairness.
> 
> Agile-SD is designed and implemented by Mohamed A. Alrshah et al. (2015) as
> a Linux kernel module in the real Linux operating system (openSUSE 42.1 Leap),
> which is implemented at the Network, Parallel and Distributed Computing 
> Laboratory,
> A2.10, second floor, Block A, Faculty of Computer Science and Information 
> Technology,
> Universiti Putra Malaysia, over real PCs connected to the Internet for daily 
> uses and
> evaluation purposes.
> 
> The main motivation behind Agile-SD is to support the short-distance networks 
> such
> as local area networks and data center networks in order to increase the 
> bandwidth
> utilization over high-speed networks. Moreover, Agile-SD is introduced in 
> support
> of the open source community, where it can be used under the OpenGL agreement.

What a pile of bullshit ...

David

RE: [v6 11/15] arm64/kasan: explicitly zero kasan shadow memory

2017-08-08 Thread David Laight

From: Pasha Tatashin
> Sent: 08 August 2017 12:49
> Thank you for looking at this change. What you described was in my
> previous iterations of this project.
> 
> See for example here: https://lkml.org/lkml/2017/5/5/369
> 
> I was asked to remove that flag, and only zero memory in place when
> needed. Overall the current approach is better everywhere else in the
> kernel, but it adds a little extra code to kasan initialization.

Perhaps you could #define the function prototype(s?) so that the flags
are not passed unless it is a kasan build?

David

RE: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-08-07 Thread David Laight

From: Casey Leedom
> Sent: 04 August 2017 21:49
...
> Whenever our Hardware Designers implement new functionality in our hardware,
> they almost always put in A. several "knobs" which can control fundamental
> parameters of the new Hardware Feature, and B.  a mechanism of completely
> disabling it if necessary.  This stems from the incredibly long Design ->
> Deployment cyle for Hardware (as opposed to the edit->compile->run cycle for 
> s!

Indeed, I'd also expect there to be an undocumented flag to turn
it on (broken) in earlier parts to allow testing.

David

RE: [PATCH 1/3] Fix ERROR: trailing statements should be on next line

2017-05-15 Thread David Laight

From: Alex Williamson
> Sent: 15 May 2017 04:21
...
> > >   /* Find end of list, sew whole thing into vi->rq.pages. */
> > > - for (end = page; end->private; end = (struct page *)end->private);
> > > + for (end = page; end->private; end = (struct page *)end->private)
> > > + ;
> 
> FWIW, I generally like to put a comment on the next line to make it
> abundantly clear that there's nothing in the body of the loop, it's
> also more aesthetically pleasing than a semi-colon on the line by
> itself, ex. /* Nothing */;  It's just too easy to misinterpret the
> loop otherwise, especially without gratuitous white space.  Thanks,

My preference is to put 'continue;' on a line by itself.
Or even move the termination condition into the loop:
for (end = page;; end = (struct page *)end->private)
if (!end->private)
break;

(oh, is that cast needed??)

David

RE: [PATCH 1/4] hamradio: Combine two seq_printf() calls into one in yam_seq_show()

2017-05-10 Thread David Laight

From: SF Markus Elfring
> Sent: 09 May 2017 15:22
> A bit of data was put into a sequence by two separate function calls.
> Print the same data by a single function call instead.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/net/hamradio/yam.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
> index b6891ada1d7b..542f1e511df1 100644
> --- a/drivers/net/hamradio/yam.c
> +++ b/drivers/net/hamradio/yam.c
> @@ -830,8 +830,7 @@ static int yam_seq_show(struct seq_file *seq, void *v)
>   seq_printf(seq, "  RxFrames %lu\n", dev->stats.rx_packets);
>   seq_printf(seq, "  TxInt%u\n", yp->nb_mdint);
>   seq_printf(seq, "  RxInt%u\n", yp->nb_rxint);
> - seq_printf(seq, "  RxOver   %lu\n", dev->stats.rx_fifo_errors);
> - seq_printf(seq, "\n");
> + seq_printf(seq, "  RxOver   %lu\n\n", dev->stats.rx_fifo_errors);
>   return 0;

The code was consistently (and probably deliberately) using one seq_printf()
call for each line so that the source looks 'a bit like' the output.

These changes are all stupid.

David

RE: sky2: Use seq_putc() in sky2_debug_show()

2017-05-09 Thread David Laight

From: Stephen Hemminger
> Sent: 09 May 2017 06:50
> On Mon, 8 May 2017 19:42:46 +0200
> SF Markus Elfring  wrote:
> 
> > > Which issue do you mean? I dont see any issue you fix here.
> >
> > Are the run time characteristics a bit nicer for the function seq_putc
> > in comparison to the function seq_puts for printing a single line break 
> > here?
> >
> > Regards,
> > Markus
> 
> I would put this in why bother category. seq_puts is correct and this is only
> in diagnostic output useful to developer and disabled on most distro kernels

Sometimes consistency is best.
Output everything with seq_printf(), using a format "%s" if necessary.
The performance really doesn't matter here at all.

It is also (probably) possible to get gcc to do the conversions - as it does 
for printf().
(A right PITA for embedded systems where only printf() exists.)

David

RE: [PATCH 00/51] rtc: stop using rtc deprecated functions

2017-06-21 Thread David Laight

From: Russell King - ARM Linux
> Sent: 20 June 2017 22:16
..
> Consider that at the moment, we define the 32-bit RTC representation to
> start at a well known epoch.  We _could_ decide that when it wraps to
> 0x8000 seconds, we'll define the lower 0x4000 seconds to mean
> dates in the future - and keep rolling that forward each time we cross
> another 0x4000 seconds.  Unless someone invents a real time machine,
> we shouldn't need to set a modern RTC back to 1970.

True, just treating the value as unsigned gives another 67 years.

If a 32bit RTC is programmed with the low 32bits of the 64bit 'seconds
since 1970' the kernel should have no real difficulty sorting out the
high bits from other available information.

Problems with things like the x86 bios setting the rtc to stupid values
are another matter.
ISTR the rtc chip has a bit for 'summertime' that is never set, on a
multi-os system you can get multiple summer time changes.

David

RE: remove dma_alloc_noncoherent

2017-06-20 Thread David Laight

From: Christoph Hellwig
> Sent: 16 June 2017 08:17
>
> For many years we've had the dma_alloc_attrs API that is more flexible
> than dma_alloc_noncoherent.  This series moves the remaining users over
> to the attrs API.

And most of the callers probably only want to specify 'noncoherent'.
Grepping the source for other uses is easier if the wrapper is left.

David

RE: [PATCH] soc/qman: Sleep instead of stuck hacking jiffies.

2017-06-26 Thread David Laight

From: Karim Eshapa
> Sent: 25 June 2017 16:14
> Use msleep() instead of stucking with
> long delay will be more efficient.
...
> --- a/drivers/soc/fsl/qbman/qman.c
> +++ b/drivers/soc/fsl/qbman/qman.c
> @@ -1084,11 +1084,7 @@ static int drain_mr_fqrni(struct qm_portal *p)
>* entries well before the ring has been fully consumed, so
>* we're being *really* paranoid here.
>*/
> - u64 now, then = jiffies;
> -
> - do {
> - now = jiffies;
> - } while ((then + 1) > now);
> + msleep(1);
...
How is that in any way equivalent?
If HZ is 1000 the old code loops for 10 seconds.
If HZ is 250 (common for some distros) it loops for 40 seconds.

Clearly both are horrid, but it isn't at all clear that a 1ms sleep
is performing the same job.

My guess is that this code is never called, and broken if actually called.

David

RE: [PATCH 2/9] timers: provide a "modern" variant of timers

2017-05-19 Thread David Laight

From: Christoph Hellwig
> Sent: 16 May 2017 12:48
>
> The new callback gets a pointer to the timer_list itself, which can
> then be used to get the containing structure using container_of
> instead of casting from and to unsigned long all the time.

What about sensible drivers that put some other value in the 'data'
field?

Perhaps it ought to have been 'void *data'.

Seems retrograde to be passing the address of the timer structure
(which, in principle, the callers no nothing about).

So I wouldn't call it 'modern', just different.

David

RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-19 Thread David Laight

From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-ow...@vger.kernel.org] 
On Behalf Of Jim Baxter
> Sent: 16 May 2017 18:41
> 
> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
> 
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.

Why is this driver copying multiple tx messages into a single skb.
Surely there are better ways to do this??

I think it is generating a 'multi-ethernet frame' URB with an
overall header for each URB and a header for each ethernet frame.

Given that the USB stack allows multiple concurrent transmits I'm
surprised that batching large ethernet frames makes much difference.

Also the USB target can't actually tell when URB that contain
multiples of the USB packet size end.
So it is possible to send a single NTB as multiple URB.
Of course, the usb_net might have other ideas.

David

RE: RFC: better timer interface

2017-05-23 Thread David Laight

From: Thomas Gleixner
> Sent: 23 May 2017 12:59
> On Tue, 23 May 2017, David Laight wrote:
> 
> > From: Thomas Gleixner
> > > Sent: 21 May 2017 19:15
> > ...
> > > > timer_start(timer, ms, abs)
> > >
> > > I'm not even sure, whether we need absolute timer wheel timers at
> > > all, because most use cases are relative to now.
> >
> > Posix requires absolute timers for some userspace calls
> > (annoying because the code often wants relative).
> 
> Posix is completely irrelevant here. These timers are purely kernel
> internal.

Somehow pthread_cond_timedwait() has to be implemented.
Doing so without kernel timers that use absolute 'wall clock' time is tricky.

David

RE: RFC: better timer interface

2017-05-23 Thread David Laight

From: Thomas Gleixner
> Sent: 21 May 2017 19:15
...
> > timer_start(timer, ms, abs)
> 
> I'm not even sure, whether we need absolute timer wheel timers at
> all, because most use cases are relative to now.

Posix requires absolute timers for some userspace calls
(annoying because the code often wants relative).

OTOH how much conditional code is there for the 'abs' argument.
And is there any code that doesn't pass a constant?

Certainly worth a separate timer_start_abs(timer, wall_time)
function since you can't correctly map a wall_time timer
to a jiffies one.

David

RE: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-19 Thread David Laight

From: Bjørn Mork
> Sent: 19 May 2017 14:56
...
> Unless someone has a nice way to just collect a list of skbs and have
> them converted to proper framing on the fly when transmitting, without
> having to care about USB packet boundaries.

skb can be linked into arbitrary chains (or even trees), but I suspect
the usbnet code would need to be taught about them.

For XHCI it isn't too bad because it will do arbitrary scatter-gather
(apart from the ring end).
But I believe the earlier controllers only support fragments that
end on usb packet boundaries.

I looked at the usbnet code a few years ago, I'm sure it ought to
be possible to shortcut most of the code that uses URB and directly
write to the xhci (in particular) ring descriptors.

For receive you probably want to feed the USB stack multiple (probably)
2k buffers, and handle the debatching into ethernet frames yourself.

One of the ASIX drivers used to be particularly bad, it allocated 64k
skb for receive (the hardware can merge IP packets) and then hacked
the true_size before passing upstream.

David

RE: [PATCH 3/3] EDAC: mv64x60: replace in_le32/out_le32 with ioread32/iowrite32

2017-05-19 Thread David Laight

From: Arnd Bergmann
> Sent: 17 May 2017 22:40
> 
> On Wed, May 17, 2017 at 11:16 PM, Chris Packham
>  wrote:
> > On 18/05/17 06:18, Borislav Petkov wrote:
> > One thing I would like confirmation on is is in_le32 -> ioread32 the
> > correct change? I tossed up between ioread32 and readl. Looking at
> > mv643xx_eth.c which supports both the MV643XX and Orion it's using readl
> > so perhaps I should be using that.
> 
> There is no easy answer: on powerpc, readl is used for PCI,
> while in_le32 is used for on-chip devices, so in_le32 is the
> right one in principle. The main difference is that readl can
> work with CONFIG_EEH on pseries, but in_le32 is cheaper.
> 
> On ARM and most other architectures, readl is used for both
> PCI and on-chip devices, so that's what portable code tends
> to use.
> 
> ioread32 is required to behave the same way as readl
> on all __iomem pointers returned from ioremap(), but
> is an extern function on powerpc and can be more
> expensive when CONFIG_GENERIC_IOMAP is set.

What about x86?
Isn't ioread32() an extern function that checks for 'io' addresses
than need 'inb' (etc) instructions rather than memory ones.
If we know a PCI slave isn't 'io' should be be using ioread32() or readl()?
Don't some architectures have different enforced barriers in both these?

David

RE: [PATCH v3 0/2] usb: Check for DMA capable buffer sanity

2017-05-31 Thread David Laight

From: Wolfram Sang
> Sent: 28 May 2017 17:04
> 
> On Fri, May 05, 2017 at 02:08:31PM -0700, Florian Fainelli wrote:
> > On 04/25/2017 05:56 PM, Florian Fainelli wrote:
> > > Changes in v3:
> > >
> > > - added check in usb_gadget_map_request_by_dev (Felipe), new patch
> > > - improved commit message description (Clemens)
> > > - added additiona checks for urb->setup_packet (Alan)
> > >
> > > Changes in v2:
> > >
> > > - moved the check from usb_start_wait_urb() to usb_hcd_map_urb_for_dma()
> >
> > Is this version looking good now? Thanks!
> 
> So, it seems I am in a similar situation with the I2C subsystem right
> now. I need to check the message buffers if they are DMA capable.
> 
> Because you have basically the same checks in 3 different places, and I
> need something similar for I2C, I wondered about a generic place to put
> these checks. Especially since we want future improvements to these
> checks applied everywhere immediately. Here is a small diff on what I
> have now:
...
> + } else if (!is_dma_capable_addr(urb->transfer_buffer)) {

For a generic function I'd pass the length as well.
It might be that buffers that don't cross page boundaries might be
deemed 'dma-able'.

Possibly more useful would be a variant of (IIRC) dma_map_for_device()
that will allocate a suitable bounce buffer for non-dma memory.
I think it can already do so for memory that is outside the address
range that the device can address (eg for a 32bit PCIe master in 64bit
system).

David

RE: [PATCH] net: dsa: loop: Check for memory allocation failure

2017-05-08 Thread David Laight

From: Christophe JAILLET
> Sent: 06 May 2017 06:30
> If 'devm_kzalloc' fails, a NULL pointer will be dereferenced.
> Return -ENOMEM instead, as done for some other memory allocation just a
> few lines above.
...
> --- a/drivers/net/dsa/dsa_loop.c
> +++ b/drivers/net/dsa/dsa_loop.c
> @@ -256,6 +256,9 @@ static int dsa_loop_drv_probe(struct mdio_device *mdiodev)
>   return -ENOMEM;
> 
>   ps = devm_kzalloc(>dev, sizeof(*ps), GFP_KERNEL);
> + if (!ps)
> + return -ENOMEM;
> +
>   ps->netdev = dev_get_by_name(_net, pdata->netdev);
>   if (!ps->netdev)
>   return -EPROBE_DEFER;

On the face if it this code leaks like a sieve.

David

RE: Question about SOCK_SEQPACKET

2017-05-05 Thread David Laight

From: Steven Whitehouse
> Sent: 05 May 2017 11:34
...
> Just before the part that you've quoted, the description for
> SOCK_SEQPACKET says:
> "The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and
> is also connection-oriented. The only difference between these types is
> that record boundaries are maintained using the SOCK_SEQPACKET type. A
> record can be sent using one or more output operations and received
> using one or more input operations, but a single operation never
> transfers parts of more than one record."

Right SOCK_SEQPACKET is for protocols like ISO transport.
There is no limit on the length of a 'record'.
I've written file transfer programs that put the entire file
data into a single 'record'. The receiver disconnected on
receipt of the 'end of record'.

The socket man pages could easily be wrong - they are very IP-centric.
Remember ISO transport use declined when Unix became more popular
back in the mid 1980s. Unix sockets have never really been used for
it - the address information needed just doesn't match that IP
(especially IPv4).

David

RE: [PATCH] x86/asm: Don't use rbp as temp register in csum_partial_copy_generic()

2017-05-04 Thread David Laight

From: Josh Poimboeuf
> Sent: 04 May 2017 15:52
> Andrey Konovalov reported the following warning while fuzzing the kernel
> with syzkaller:
> 
>   WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad 'bp' 
> value c3fc855a10167ec0
> 
> The unwinder dump revealed that rbp had a bad value when an interrupt
> occurred in csum_partial_copy_generic().
> 
> That function saves rbp on the stack and then overwrites it, using it as
> a scratch register.  That's problematic because it breaks stack traces
> if an interrupt occurs in the middle of the function.

Does gcc guarantee not to use bp as a scratch register in leaf functions?

David

RE: Asmedia USB 1343 crashes

2017-05-02 Thread David Laight

From: Thomas Fjellstrom
> Sent: 01 May 2017 14:40
> I've got a 970 Pro gaming aura motherboard with an Asmedia 1343 Usb 3.1
> controller. It's been consistently throwing errors and eventually crashing and
> becomming unresponsive.
...

I've an earlier Asmedia 1042 controller.
It has a bug (which I don't remember seeing a fix for) that it will
only process one entry from the command ring each time the doorbell
is rung.
If more than one command gets queued then it all got very confusing
because responses to timed out commands kept being seen.
(I no longer have the USB3 ethernet device needed to reproduce this.)

The later part might have the same bug.

David

RE: [PATCH 4/8] usb: bdc: Small code cleanup

2017-06-28 Thread David Laight

From: Al Cooper
> Sent: 27 June 2017 19:23
> Signed-off-by: Al Cooper 
> ---
>  drivers/usb/gadget/udc/bdc/bdc_core.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/usb/gadget/udc/bdc/bdc_core.c 
> b/drivers/usb/gadget/udc/bdc/bdc_core.c
> index 3bd82d2..621328f 100644
> --- a/drivers/usb/gadget/udc/bdc/bdc_core.c
> +++ b/drivers/usb/gadget/udc/bdc/bdc_core.c
> @@ -488,28 +488,29 @@ static int bdc_probe(struct platform_device *pdev)
>   platform_set_drvdata(pdev, bdc);
>   bdc->irq = irq;
>   bdc->dev = dev;
> - dev_dbg(bdc->dev, "bdc->regs: %p irq=%d\n", bdc->regs, bdc->irq);
> + dev_dbg(dev, "bdc->regs: %p irq=%d\n", bdc->regs, bdc->irq);

The compiler will use the value without re-reading it.
In the other places it makes very little difference.
The changed code might require one less memory read, but if the extra
'live' local variable causes gcc to save registers to stack all
bets are off.

The more explicit bdc->dev is probably more readable.

> 
>   temp = bdc_readl(bdc->regs, BDC_BDCSC);
>   if ((temp & BDC_P64) &&
>   !dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
> - dev_dbg(bdc->dev, "Using 64-bit address\n");
> + dev_dbg(dev, "Using 64-bit address\n");
>   } else {
> - ret = dma_set_mask_and_coherent(>dev, DMA_BIT_MASK(32));
> + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

That just wrong...
Or was wrong before.

...

David

RE: [PATCH] net_sched: use explicit size of struct tcmsg, remove need to declare tcm

2017-09-18 Thread David Laight

From: Eric Dumazet
> Sent: 18 September 2017 16:01
...
> > > - err = nlmsg_parse(nlh, sizeof(*tcm), tca, TCA_MAX, NULL, NULL);
> > > + err = nlmsg_parse(nlh, sizeof(struct tcmsg), tca, TCA_MAX, NULL, NULL);
> >
> > Would sizeof(*nlmsg_data(nlh)) be cleaner??
> 
> Not really, since
> 
> static inline void *nlmsg_data(const struct nlmsghdr *nlh)

I thought about that after posting :-(

David

RE: [PATCH] net_sched: use explicit size of struct tcmsg, remove need to declare tcm

2017-09-18 Thread David Laight

From: Colin King
> Sent: 18 September 2017 12:41
> Pointer tcm is being initialized and is never read, it is only being used
> to determine the size of struct tcmsg.  Clean this up by removing
> variable tcm and explicitly using the sizeof struct tcmsg rather than *tcm.
> Cleans up clang warning:
> 
> warning: Value stored to 'tcm' during its initialization is never read
> 
> Signed-off-by: Colin Ian King 
> ---
>  net/sched/sch_api.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index c6deb74e3d2f..aa82116ed10c 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1500,7 +1500,6 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   int s_idx, s_q_idx;
>   struct net_device *dev;
>   const struct nlmsghdr *nlh = cb->nlh;
> - struct tcmsg *tcm = nlmsg_data(nlh);
>   struct nlattr *tca[TCA_MAX + 1];
>   int err;
> 
> @@ -1510,7 +1509,7 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct 
> netlink_callback *cb)
>   idx = 0;
>   ASSERT_RTNL();
> 
> - err = nlmsg_parse(nlh, sizeof(*tcm), tca, TCA_MAX, NULL, NULL);
> + err = nlmsg_parse(nlh, sizeof(struct tcmsg), tca, TCA_MAX, NULL, NULL);

Would sizeof(*nlmsg_data(nlh)) be cleaner??

David

RE: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-19 Thread David Laight

From: Sergey Senozhatsky
> Sent: 19 September 2017 03:06
...
> I'll simply convert everything to `unsigned long'. including the
> dereference_function_descriptor() function [I believe there are
> still some casts happening when we pass addr from kernel/module
> dereference functions to dereference_function_descriptor(), or
> when we return `void *' back to symbol resolution code, etc.)
> besides, it seems that everything that uses
> dereference_function_descriptor() wants `unsigned long' anyway:

Using 'unsigned long' for any kind of pointer is an accident
waiting do happen.
It also makes it difficult to typecheck the function calls.
Using 'void *' isn't any better.
Either a pointer to an undefined struct, or a struct containing
a single 'char' member, is likely to be safest.

David

RE: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-20 Thread David Laight

From: Helge Deller
> Sent: 19 September 2017 21:08
...
> > Using 'unsigned long' for any kind of pointer is an accident
> > waiting do happen.
> > It also makes it difficult to typecheck the function calls.
> > Using 'void *' isn't any better.
> > Either a pointer to an undefined struct, or a struct containing
> > a single 'char' member, is likely to be safest.
> 
> David, you might be right in most cases, but in this case I'd prefer
> unsigned long too. I think this will create the least amount of
> typecasts here.

I've not looked at the specifics case...

Another option is using a struct with a single member and
passing it by value.
This could be used for things like user-space pointers or
even errno values.
The only problem is old ABI where even small structures are
always passed by reference.

David

RE: [PATCH 04/11] ia64: make dma_cache_sync a no-op

2017-10-04 Thread David Laight

From: Christoph Hellwig
> Sent: 03 October 2017 11:43
>
> ia64 does not implement DMA_ATTR_NON_CONSISTENT allocations, so it doesn't
> make any sense to do any work in dma_cache_sync given that it must be a
> no-op when dma_alloc_attrs returns coherent memory.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/ia64/include/asm/dma-mapping.h | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/dma-mapping.h 
> b/arch/ia64/include/asm/dma-mapping.h
> index 3ce5ab4339f3..99dfc1aa9d3c 100644
> --- a/arch/ia64/include/asm/dma-mapping.h
> +++ b/arch/ia64/include/asm/dma-mapping.h
> @@ -48,11 +48,6 @@ static inline void
>  dma_cache_sync (struct device *dev, void *vaddr, size_t size,
>   enum dma_data_direction dir)
>  {
> - /*
> -  * IA-64 is cache-coherent, so this is mostly a no-op.  However, we do 
> need to
> -  * ensure that dma_cache_sync() enforces order, hence the mb().
> -  */
> - mb();
>  }

Are you sure about this one?
It looks as though you are doing a mechanical change for all architectures.
Some of them are probably stranger than you realise.

Even with cache coherent memory any cpu 'store/write buffer' may not
be snooped by dma reads.

Something needs to flush the store buffer between the last cpu write
to the dma buffer and the write (probably a device register) that
tells the device it can read the memory.

My guess from the comment is that dma_cache_synch() is expected to
include that barrier - and it might not be anywhere else.

David

RE: [PATCH net-next v2 2/4] net: dsa: mv88e6060: setup random mac address

2017-10-16 Thread David Laight

From: woojung@microchip.com
> Sent: 13 October 2017 18:59
> > >> > +  REG_WRITE(REG_GLOBAL, GLOBAL_MAC_01, (addr[0] << 9) |
> > >> addr[1]);
> > >>
> > >> Is that supposed to be 9 ?
> > >
> > > Looks like it.
> > > Check
> > http://www.marvell.com/switching/assets/marvell_linkstreet_88E6060_data
> > sheet.pdf
> >
> > Hum, David is correct, there is a bug in the driver which needs to be
> > addressed first. MAC address bit 40 is addr[0] & 0x1, thus we must
> > shift byte 0 by 8 and mask it against 0xfe.
> >
> > I'll respin this serie including a fix for both net and net-next.
> 
> Yes, you are right. Missed description about bit 40.

Since they are all random numbers it doesn't matter.

Actually isn't this just masking off the non-unicast bit?
So it won't be set in the data - and so doesn't need masking.

David

RE: [PATCH net-next v3 3/5] net: dsa: mv88e6060: setup random mac address

2017-10-16 Thread David Laight

From: Vivien Didelot
> Sent: 13 October 2017 19:18
> As for mv88e6xxx, setup the switch from within the mv88e6060 driver with
> a random MAC address, and remove the .set_addr implementation.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6060.c | 43 ++-
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
> index d64be2b83d3c..6173be889d95 100644
> --- a/drivers/net/dsa/mv88e6060.c
> +++ b/drivers/net/dsa/mv88e6060.c
> @@ -9,6 +9,7 @@
>   */
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -188,6 +189,27 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, 
> int p)
>   return 0;
>  }
> 
> +static int mv88e6060_setup_addr(struct dsa_switch *ds)
> +{
> + u8 addr[ETH_ALEN];
> + u16 val;
> +
> + eth_random_addr(addr);
> +
> + val = addr[0] << 8 | addr[1];
> +
> + /* The multicast bit is always transmitted as a zero, so the switch uses
> +  * bit 8 for "DiffAddr", where 0 means all ports transmit the same SA.
> +  */
> + val &= 0xfeff;

The comment is probably ok, but the mask isn't needed.
eth_randmon_addr() won't return and address with the multicast bit set.

David

RE: [PATCH] rsi: fix integer overflow warning

2017-10-05 Thread David Laight

From: Joe Perches
> Sent: 05 October 2017 13:19
> On Thu, 2017-10-05 at 14:05 +0200, Arnd Bergmann wrote:
> > gcc produces a harmless warning about a recently introduced
> > signed integer overflow:
> >
> > drivers/net/wireless/rsi/rsi_91x_hal.c: In function 'rsi_prepare_mgmt_desc':
> > include/uapi/linux/swab.h:13:15: error: integer overflow in expression 
> > [-Werror=overflow]
> >   (((__u16)(x) & (__u16)0x00ffU) << 8) |   \
> >^
> > include/uapi/linux/swab.h:104:2: note: in expansion of macro 
> > '___constant_swab16'
> >   ___constant_swab16(x) :   \
> >   ^~
> > include/uapi/linux/byteorder/big_endian.h:34:43: note: in expansion of 
> > macro '__swab16'
> >  #define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
> 
> []
> 
> > The problem is that the 'mask' value is a signed integer that gets
> > turned into a negative number when truncated to 16 bits. Making it
> > an unsigned constant avoids this.
> 
> I would expect there are more of these.
> 
> Perhaps this define in include/uapi/linux/swab.h:
> 
> #define __swab16(x)   \
>   (__builtin_constant_p((__u16)(x)) ? \
>   ___constant_swab16(x) : \
>   __fswab16(x))
> 
> should be
> 
> #define __swab16(x)   \
>   (__builtin_constant_p((__u16)(x)) ? \
>   ___constant_swab16((__u16)(x)) :\
>   __fswab16((__u16)(x)))

You probably don't want the cast in the call to __fswab16() since
that is likely to generate an explicit and with 0x.
You will likely also get one if the argument is _u16 (not unsigned int).

David

RE: [PATCH v10 09/10] mm: stop zeroing memory during allocation in vmemmap

2017-10-06 Thread David Laight

From: Michal Hocko
> Sent: 06 October 2017 12:47
> On Fri 06-10-17 11:10:14, David Laight wrote:
> > From: Pavel Tatashin
> > > Sent: 05 October 2017 22:11
> > > vmemmap_alloc_block() will no longer zero the block, so zero memory
> > > at its call sites for everything except struct pages.  Struct page memory
> > > is zero'd by struct page initialization.
> >
> > It seems dangerous to change an allocator to stop zeroing memory.
> > It is probably saver to add a new function that doesn't zero
> > the memory and use that is the places where you don't want it
> > to be zeroed.
> 
> Not sure what you mean. memblock_virt_alloc_try_nid_raw is a new
> function which doesn't zero out...

You should probably leave vmemap_alloc_block() zeroing the memory
so that existing alls don't have to be changed - apart from the
ones you are explicitly optimising.

David

RE: [PATCH v10 09/10] mm: stop zeroing memory during allocation in vmemmap

2017-10-06 Thread David Laight

From: Pavel Tatashin
> Sent: 05 October 2017 22:11
> vmemmap_alloc_block() will no longer zero the block, so zero memory
> at its call sites for everything except struct pages.  Struct page memory
> is zero'd by struct page initialization.

It seems dangerous to change an allocator to stop zeroing memory.
It is probably saver to add a new function that doesn't zero
the memory and use that is the places where you don't want it
to be zeroed.

David

RE: [PATCH net-next v3 0/5] net: dsa: remove .set_addr

2017-10-16 Thread David Laight

From: Andrew Lunn
> Sent: 16 October 2017 17:10
...
> So, received Pause frames never leave the MAC. They don't get bridged,
> nor do they get passed up for host processing. They are purely point
> to point between two MAC peers. The destination is unambiguous. It is
> simple the other MAC peer. The destination address makes it clear it
> is a pause frame, the the source address seems to be unneeded.
> 
> In this context, a random MAC addresses are safe.

Is there any reason why a fixed value (say 00:00:00:00:00:00)
can't be used?

> In the more general case, i would agree with you. Collisions are
> possible, causing problems.

For IP MAC addresses only go as far as the first router.
So the duplicates would (typically) have to be within the same subnet.
This makes the chance of a duplicate random address unlikely.

(Unless you have an un-subnetted class A network consisting of
multiple 1km long coax segments connected by 1km long repeaters
making a single collision domain.)

David

RE: [PATCH] PCI: dwc: dra7xx: Print link state to console for debug

2017-10-17 Thread David Laight

From: Joe Perches
> Sent: 17 October 2017 07:35
> On Tue, 2017-10-17 at 11:13 +0530, Faiz Abbas wrote:
> > Enable support for printing the LTSSM link state for debugging PCI
> > when link is down.
> []
> > diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> > index 34427a6..7b150b0 100644
> > --- a/drivers/pci/dwc/pci-dra7xx.c
> > +++ b/drivers/pci/dwc/pci-dra7xx.c
> > @@ -98,6 +98,18 @@ struct dra7xx_pcie_of_data {
> >
> >  #define to_dra7xx_pcie(x)  dev_get_drvdata((x)->dev)
> >
> > +static char state[][20] = {
> > +   "DETECT_QUIET", "DETECT_ACT", "POLL_ACTIVE", "POLL_COMPLIANCE",
> > +   "POLL_CONFIG", "PRE_DETECT_QUIET", "DETECT_WAIT", "CFG_LINKWD_START",
> > +   "CFG_LINKWD_ACEPT", "CFG_LANENUM_WAIT", "CFG_LANENUM_ACEPT",
> > +   "CFG_COMPLETE", "CFG_IDLE", "RCVRY_LOCK", "RCVRY_SPEED",
> > +   "RCVRY_RCVRCFG", "RCVRY_IDLE", "L0", "L0S", "L123_SEND_EIDLE",
> > +   "L1_IDLE", "L2_IDLE", "L2_WAKE", "DISABLED_ENTRY", "DISABLED_IDLE",
> > +   "DISABLED", "LPBK_ENTRY", "LPBK_ACTIVE", "LPBK_EXIT",
> > +   "LPBK_EXIT_TIMEOUT", "HOT_RESET_ENTRY", "HOT_RESET", "RCVRY_EQ0",
> > +   "RCVRY_EQ1", "RCVRY_EQ2", "RCVRY_EQ3"
> 
> what's wrong with using the far more typical
> 
> static const char *link_state[] = {
>   "DETECT_QUIET",
>   ...

Or 4 aligned columns.
Better still used named initialisers:
[xxx_DETECT_QUIET] = "DETECT_QUIET",
you really don't want an 'off-by-one' in that list.

David

RE: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat

2017-10-17 Thread David Laight

From: Daniel Borkmann
> Sent: 17 October 2017 15:56
> 
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.

Does it make sense to allow the user program to try to allocate ever
smaller very large maps until it finds one that succeeds - thus
using up all the percpu space?

Or is this a 'root only' 'shoot self in foot' job?

David

RE: [PATCH net-next 5/5] net: dsa: split dsa_port's netdev member

2017-10-13 Thread David Laight

From: Vivien Didelot
> Sent: 13 October 2017 16:29
> Vivien Didelot  writes:
> 
> >>> How about using:
> >>>
> >>>   union {
> >>>   struct net_device *master;
> >>>   struct net_device *slave;
> >>>   } netdev;
> >> ...
> >>
> >> You can remove the 'netdev' all the compilers support unnamed unions.
> >
> > There are issues with older GCC versions, see the commit 42275bd8fcb3
> > ("switchdev: don't use anonymous union on switchdev attr/obj structs")
> >
> > That's why I kept it in the v2 I sent.
> 
> At the same time, I can see that struct sk_buff uses anonym union a lot.
> 
> It seems weird that one raised a compiler issue for switchdev but not
> for skbuff.h... Do you think it is viable to drop the name here then?

I believe the problem is with initialisers for static structures
that contain anonymous unions.

David

< 1 2 3 4 5 6 7 8 9 10 >

501 - 600 of 2842 matches

Mail list logo