Re: Kernel header changes break glibc build

2006-12-03 Thread David Woodhouse
On Fri, 2006-10-06 at 17:20 +, Joseph S. Myers wrote:
> The kernel headers installed by Linux 2.6.19-rc1 "make
> headers_install" do not work for building glibc, because glibc expects
>  to provide various definitions, some of which have
> been moved to  and some of which have been removed
> altogether.
> 
> This kernel patch allows glibc to build again by making rtnetlink.h
> include if_addr.h and adding back the removed definitions required by
> glibc, but I don't know if it's the correct approach or if glibc
> should change the headers it includes and add its own macro
> definitions.
> 
> Signed-off-by: Joseph Myers <[EMAIL PROTECTED]>

Since it was Thomas Graf who made these changes and David Miller who
committed them, you'd do well to Cc both of those people.

Sorry for the delayed response on my part -- I've been away
concentrating on OLPC stuff, and just about everything else has fallen
by the wayside.

Thomas, this is in response to your changes in
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1823730fbc89fadde72a7bb3b7bdf03cc7b8835c;hp=47f68512d2685431f1781830dfcbab31bda87644
in which you create  and require that it's included
directly rather than being part of (or even included from)
. Was there a good reason for changing that
user-visible header? Is there a reason not to include if_addr.h from
rtnetlink.h as Joseph's patch does?

I suspect that if the IF{L,}A_{PAYLOAD,RTA} macros aren't used in the
kernel then the best answer is for glibc to define those for itself.

> ---
> Index: include/linux/rtnetlink.h
> ===
> --- include/linux/rtnetlink.h
> +++ include/linux/rtnetlink.h
> @@ -2,6 +2,7 @@
>  #define __LINUX_RTNETLINK_H
>  
>  #include 
> +#include 
>  #include 
>  
>  /
> Index: include/linux/if_link.h
> ===
> --- include/linux/if_link.h
> +++ include/linux/if_link.h
> @@ -82,6 +82,9 @@
>  
>  #define IFLA_MAX (__IFLA_MAX - 1)
>  
> +#define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + 
> NLMSG_ALIGN(sizeof(struct ifinfomsg
> +#define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
> +
>  /* ifi_flags.
>  
> IFF_* flags.
> Index: include/linux/if_addr.h
> ===
> --- include/linux/if_addr.h
> +++ include/linux/if_addr.h
> @@ -52,4 +52,7 @@
>   __u32   tstamp; /* updated timestamp, hundredths of seconds */
>  };
>  
> +#define IFA_RTA(r)  ((struct rtattr*)(((char*)(r)) + 
> NLMSG_ALIGN(sizeof(struct ifaddrmsg
> +#define IFA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifaddrmsg))
> +
>  #endif
> 
> 
> 
-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel header changes break glibc build

2006-12-06 Thread David Woodhouse
On Mon, 2006-12-04 at 10:13 +0100, Thomas Graf wrote:
> Userspace is not supposd to directly include kernel headers, instead
> it has to make local copies and compile against them.

No. It was _never_ sensible to simply declare that userspace "shall not
use kernel headers" in the absence of any serious alternative. It was
always just a cop-out.

Historically, there were a number of efforts to provide 'sanitised'
kernel headers which are needed for userspace to build against. Various
people forked headers from the kernel at different times, got burned out
at different times, took different approaches w.r.t. how much abuse of
kernel-private stuff should be permitted. We ended up with a bunch of
inconsistent sets of 'kernel headers' each done differently and updated
sporadically.

Thankfully, this is no longer the case. The kernel now has a
'headers_install' make target which spits out those headers which are
suitable for userspace, sanitised by removing anything within
__KERNEL__. We have a _consistent_, up to date set of headers which
distributions can use for building glibc and other system libraries and
tools. Some distributions are already shipping like that; others are
still working on doing so.

> Binary compatibility is always guaranteed but in times of development
> within a stable tree it's wrong to assume that headers never change.
>
> I do not agree with the change to include if_addr.h in rtnetlink.h.
> The point is to move bits apart and have multiple small pieces
> of header files defining a specific rtnetlink family which are a
> lot easier to maintain for both kernel and userspace than one giant
> rtnetlink.h for everything.

That makes some sense, but we need to be more careful about making
incompatible changes in the headers which we export -- it's no longer
acceptable to just toss a pile of crap over the wall and declare it to
be someone else's problem.

That isn't to say that we should _never_ make such changes, but we
should at least make sure we think about it and make sure that glibc
adapts to cope, _before_ people start to see build failures.

> > I suspect that if the IF{L,}A_{PAYLOAD,RTA} macros aren't used in the
> > kernel then the best answer is for glibc to define those for itself.
> 
> Right, if they did it right they would only have noticed when they
> updated the kernel headers to some newer versions and only had to
> move the bits to some compat header.

No. They _are_ doing it right -- they're running 'make headers_install'
against the 2.6.19 kernel and only _now_ are they finding that we broke
it without even the courtesy of a warning, let alone any consultation.

If _we_ had done it right, then they would have been warned when we
decided to change this, and we wouldn't have just released 2.6.19 with
changes which break the glibc build.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel header changes break glibc build

2006-12-06 Thread David Woodhouse
On Wed, 2006-12-06 at 14:43 +0100, Jakub Jelinek wrote:
> 
> +/* 2.6.19 kernel headers helpfully removed some macros and
> +   moved lots of stuff into new headers, some of which aren't
> +   included by linux/rtnetlink.h.  */
> +
> +#ifndef IFA_MAX
> +struct ifaddrmsg
> +{
> +  uint8_t ifa_family;
> +  uint8_t ifa_prefixlen;
> +  uint8_t ifa_flags;
> +  uint8_t ifa_scope;
> +  uint32_t ifa_index;
> +};
> +
> +enum
> +{
> +  IFA_UNSPEC,
> +  IFA_ADDRESS,
> +  IFA_LOCAL,
> +  IFA_LABEL,
> +  IFA_BROADCAST,
> +  IFA_ANYCAST,
> +  IFA_CACHEINFO,
> +  IFA_MULTICAST
> +};
> +#endif
> +
> +#ifndef IFA_F_SECONDARY
> +# define IFA_F_SECONDARY   0x01
> +# define IFA_F_TEMPORARY   IFA_F_SECONDARY
> +# define IFA_F_HOMEADDRESS 0x10
> +# define IFA_F_DEPRECATED  0x20
> +#endif

This much is still available from the new  -- you could
include that directly instead of copying it.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel header changes break glibc build

2006-12-06 Thread David Woodhouse
On Wed, 2006-12-06 at 14:57 +0100, Jakub Jelinek wrote:
> Yes, but as I said, I'd need to add configure checks for that, using
> #include 
> alone breaks build with older headers.

I was thinking that the #ifndef IFA_MAX you already have ought to be
sufficient for that. Or even checking KERNEL_VERSION.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel header changes break glibc build

2006-12-06 Thread David Woodhouse
On Wed, 2006-12-06 at 14:59 +0100, Thomas Graf wrote:
> Are you suggesting that the kernel has to keep macros around which
> are of no use to the kernel itself just because glibc uses them?

No, although in fact that _is_ the only reason we use these horrid __uXX
types rather than proper C datatypes, isn't it?

I'm suggesting that if you want to change things around as you did, you
should make sure the users of those headers adapt to cope. You did fix
the in-kernel users; you neglected to fix glibc -- and as far as I can
tell you didn't even bother to _warn_ glibc folks.

We need to do better than that. The dark ages where we used to toss a
pile of crap over the wall and declare it was someone else's problem are
now behind us, thankfully.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETLINK]: Restore API compatibility of address and neighbour bits

2006-12-07 Thread David Woodhouse
On Thu, 2006-12-07 at 11:55 +0100, Thomas Graf wrote:
> Restore API compatibility due to bits moved from rtnetlink.h to
> separate headers.

For 2.6.19.1 too, please.

> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>
> 
> Index: net-2.6/include/linux/rtnetlink.h
> ===
> --- net-2.6.orig/include/linux/rtnetlink.h2006-12-07 11:25:29.0 
> +0100
> +++ net-2.6/include/linux/rtnetlink.h 2006-12-07 11:32:25.0 +0100
> @@ -3,6 +3,8 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  /
>   *   Routing/neighbour discovery messages.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel header changes break glibc build

2006-12-07 Thread David Woodhouse
On Wed, 2006-12-06 at 15:23 +0100, Thomas Graf wrote:
> > I'm suggesting that if you want to change things around as you did, you
> > should make sure the users of those headers adapt to cope. You did fix
> > the in-kernel users; you neglected to fix glibc -- and as far as I can
> > tell you didn't even bother to _warn_ glibc folks.
> 
> I didn't warn them because I didn't know better. 

Fair enough. Now you do -- and thanks for fixing it.

> I was under the impression that glibc still maintains their own set of
> headers and will fix this automatically when they look at the diff.
> That's what I do for my userspace applications that use kernel
> headers.

That's never been the case for glibc. As I said, various people have
_attempted_ to do it, but it really isn't a workable approach and the
results were wildly inconsistent. Having a single, centrally-exported
set of headers is a massive improvement -- but yes, it does mean we have
to take a little care with what we're exporting to userspace.

> Ideally install_headers would do the trick but it often fails f.e.
> when some application which uses bsd features thus including net/if.h
> also wants to use new linux features and includes linux/if.h which
> then conflicts.

Applications in general shouldn't be doing that -- kernel headers are
for system libraries and tools only. We should try to ensure that the
_sensible_ use cases don't break, but we don't have to go overboard with
keeping our namespace clean and anticipating _every_ strange thing which
a userspace app might do with our headers -- we still get to say 'caveat
emptor'; it's just that we can't be _quite_ as haphazard about it as
we've been in the past.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETLINK]: Schedule removal of old macros exported to userspace

2006-12-09 Thread David Woodhouse
On Sat, 2006-12-09 at 15:58 +0100, Stefan Rompf wrote:
> But when even glibc needs internal compat headers to compile with the second 
> kernel version that provides userspace headers, what is the long-term - even 
> mid-term - perspective of make headers_install then?

We've only _just_ started paying attention to what we export. I tried to
keep the initial cut of headers_install very simple and uncontentious --
sticking to implementation, and not policy. So I didn't do a
particularly thorough cull of stuff that we shouldn't be exposing --
it's expected that we may lose _some_ more stuff.

But breaking userspace like _this_ isn't acceptable, and we're not doing
it.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETLINK]: Schedule removal of old macros exported to userspace

2006-12-12 Thread David Woodhouse
On Sat, 2006-12-09 at 15:58 +0100, Stefan Rompf wrote:
> But when even glibc needs internal compat headers to compile with the second 
> kernel version that provides userspace headers, what is the long-term - even 
> mid-term - perspective of make headers_install then?

Bear in mind that with the first cut of the headers_install I was trying
to keep it entirely uncontentious and stick to _mechanism_, not policy.
The _first_ set of exported headers still aren't ideal, and we're still
cleaning up some parts of them (like properly getting rid of the
_syscallX() macros, etc.). 

We've only just started to be sensible about what we export to userspace
-- it's not entirely set in stone at this point, 
 
-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix buglets in mpc5200 FEC code that are corrupting memory.

2007-11-28 Thread David Woodhouse

On Sun, 2007-11-18 at 20:49 -0500, Jon Smirl wrote:
> On 11/9/07, Grant Likely <[EMAIL PROTECTED]> wrote:
> > On 11/9/07, Domen Puncer <[EMAIL PROTECTED]> wrote:
> > > On 09/11/07 00:31 -0500, Jon Smirl wrote:
> > > > This is the reason I couldn't get user space started or connect to my
> > > > nfs server. Patch is against current linus git.
> > > >
> > > > mpc5200 fec driver is corrupting memory. This patch fixes two bugs
> > > > where the wrong skb buffer was being referenced.
> > > >
> > > > Signed-off-by: Jon Smirl <[EMAIL PROTECTED]>
> > >
> > > Acked-by: Domen Puncer <[EMAIL PROTECTED]>
> >
> > Signed-off-by: Grant Likely <[EMAIL PROTECTED]>
> >
> > Jeff, can you please pick this up for .24?
> 
> This is didn't make it into rc3. Without this patch this driver is broken.

Jeff?

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] SET_NETDEV_DEV() in fec_mpc52xx.c

2007-11-28 Thread David Woodhouse
This helps to allow the Fedora installer to use the built-in Ethernet on
the Efika for a network install.

Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -971,6 +971,8 @@ mpc52xx_fec_probe(struct of_device *op, const struct 
of_device_id *match)
 
mpc52xx_fec_reset_stats(ndev);
 
+   SET_NETDEV_DEV(ndev, &op->dev);
+
/* Register the new network device */
rv = register_netdev(ndev);
if (rv < 0)

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Stop phy code from returning success to unknown ioctls.

2007-11-28 Thread David Woodhouse
This kind of sucks, and prevents the Fedora installer from using the
device for network installs...

[EMAIL PROTECTED] phy]# iwconfig eth0   
 
Warning: Driver for device eth0 has been compiled with an ancient version   
   
of Wireless Extension, while this program support version 11 and later. 
   
Some things may be broken...
   

   
eth0ESSID:off/any  Nickname:""  
   
  NWID:0  Channel:0  Access Point: 00:00:BF:81:14:E0
   
  Bit Rate:-1.08206e+06 kb/s   Sensitivity=0/0  
   
  RTS thr:off   Fragment thr:off
   
  Encryption key:  
   
  Power Management:off  
   

   
Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 9bc1177..7c9e6e3 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -406,6 +406,9 @@ int phy_mii_ioctl(struct phy_device *phydev,
&& phydev->drv->config_init)
phydev->drv->config_init(phydev);
break;
+
+   default:
+   return -ENOTTY;
}
 
return 0;

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PASEMI_MAC] Don't claim to do IPv6 checksum offload

2007-12-02 Thread David Woodhouse
Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

diff --git a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c
index 09b4fde..a8db5d7 100644
--- a/drivers/net/pasemi_mac.c
+++ b/drivers/net/pasemi_mac.c
@@ -1362,7 +1362,7 @@ pasemi_mac_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
netif_napi_add(dev, &mac->napi, pasemi_mac_poll, 64);
 
-   dev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX | NETIF_F_SG;
+   dev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX | NETIF_F_SG;
 
/* These should come out of the device tree eventually */
mac->dma_txch = index;

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix memory corruption in fec_mpc52xx

2007-12-03 Thread David Woodhouse
From: Jon Smirl <[EMAIL PROTECTED]>

The mpc5200 fec driver is corrupting memory. This patch fixes two bugs
where the wrong skb was being referenced.

Signed-off-by: Jon Smirl <[EMAIL PROTECTED]>
Acked-by: Domen Puncer <[EMAIL PROTECTED]>
Signed-off-by: Grant Likely <[EMAIL PROTECTED]>
Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -422,7 +422,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void 
*dev_id)

rskb = bcom_retrieve_buffer(priv->rx_dmatsk, &status,
(struct bcom_bd **)&bd);
-   dma_unmap_single(&dev->dev, bd->skb_pa, skb->len, 
DMA_FROM_DEVICE);
+   dma_unmap_single(&dev->dev, bd->skb_pa, rskb->len, 
DMA_FROM_DEVICE);

/* Test for errors in received frame */
if (status & BCOM_FEC_RX_BD_ERRORS) {
@@ -467,7 +467,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void 
*dev_id)
bcom_prepare_next_buffer(priv->rx_dmatsk);

bd->status = FEC_RX_BUFFER_SIZE;
-   bd->skb_pa = dma_map_single(&dev->dev, rskb->data,
+   bd->skb_pa = dma_map_single(&dev->dev, skb->data,
FEC_RX_BUFFER_SIZE, DMA_FROM_DEVICE);

bcom_submit_next_buffer(priv->rx_dmatsk, skb);


-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.24] pasemi_mac: Fix reuse of free'd skb

2007-12-04 Thread David Woodhouse
Oops: Exception in kernel mode, sig: 5 [#1] 
nfs: server pmac not responding, still trying   
SMP NR_CPUS=128 NUMA PA Semi PWRficient 
Modules linked in: appletouch cbc blkcipher dm_crypt dm_emc dm_round_robin dm_mu
ltipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 m
sdos linear raid10 raid456 async_memcpy async_tx async_xor xor raid1 raid0 sata_
mv libata pasemi_mac gpio_mdio libphy ehci_hcd ohci_hcd iscsi_tcp libiscsi scsi_
transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 ext2 ext4dev crc16 jbd2
 ext3 mbcache jbd squashfs loop nfs nfs_acl lockd sunrpc vfat fat cramfs
NIP: c03af2f0 LR: c03af2ec CTR: c00791c0
REGS: c9c0 TRAP: 0700   Not tainted  (2.6.24-0.67.rc3.git7.fc9) 
MSR: 90029032   CR: 2428  XER: 
TASK = c0677720[0] 'swapper' THREAD: c0758000 CPU: 0
GPR00: c03af2ec cc40 c0753d98 0087  
GPR04:      
GPR08: c0677720  0004   
GPR12:  c0678200    
GPR16:    4140  
GPR20: 0040 06e0 06e8 0037  
GPR24:  c0003ba40370 00dc 540045ee0111a3a8  
GPR28: 05ee c0003d104900 c0712b18 c0003b939800  
NIP [c03af2f0] .skb_over_panic+0x50/0x58
LR [c03af2ec] .skb_over_panic+0x4c/0x58 
Call Trace: 
[cc40] [c03af2ec] .skb_over_panic+0x4c/0x58 (unreliable)
[ccd0] [d02dfb10] .pasemi_mac_clean_rx+0x2f0/0x480 [pasemi_m
ac] 
[cda0] [d02e008c] .pasemi_mac_poll+0x44/0xe4 [pasemi_mac]   
[ce40] [c03bba54] .net_rx_action+0xf8/0x214 
[cef0] [c008eb58] .__do_softirq+0xa8/0x164  
[cf90] [c002b468] .call_do_softirq+0x14/0x24
[c075b940] [c000d4d0] .do_softirq+0x68/0xac 
[c075b9d0] [c008ecc8] .irq_exit+0x60/0xb0   
[c075ba50] [c000db00] .do_IRQ+0x174/0x1e8   
[c075baf0] [c0004c24] hardware_interrupt_entry+0x24/0x28
--- Exception: 501 at .ppc64_runlatch_off+0x0/0x54  
LR = .cpu_idle+0x70/0x1f0   
[c075bde0] [c00135c4] .cpu_idle+0x11c/0x1f0 (unreliable)
[c075be60] [c0009b08] .rest_init+0x78/0x90  
[c075bee0] [c05c59ec] .start_kernel+0x3f8/0x41c 
[c075bf90] [c0008590] .start_here_common+0x60/0xd0  
Instruction dump:   
80a30068 e8e300c8 e90300d0 812300bc 814300c0 2fa0 409e0008 e81e8018 
e87e8028 f8010070 4bcd92cd 6000 <0fe0> 4800 7c0802a6 faa1ffa8   
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.24] pasemi_mac: Fix reuse of free'd skb

2007-12-04 Thread David Woodhouse
On Tue, 2007-12-04 at 18:04 +, David Woodhouse wrote:
> Oops: Exception in kernel mode, sig: 5 [#1]   
>   
> nfs: server pmac not responding, still trying 

Oops, sorry. That was meant to be just to Olof. And it helps if I'm
actually running the kernel in which his patch is applied, too...

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2007-12-06 Thread David Woodhouse
On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> Well I clearly goofed when I added the initial network namespace support
> for /proc/net.  Currently things work but there are odd details visible
> to user space, even when we have a single network namespace.
> 
> Since we do not cache proc_dir_entry dentries at the moment we can
> just modify ->lookup to return a different directory inode depending
> on the network namespace of the process looking at /proc/net, replacing
> the current technique of using a magic and fragile follow_link method.
> 
> To accomplish that this patch:
> - introduces a shadow_proc method to allow different dentries to
>   be returned from proc_lookup.
> - Removes the old /proc/net follow_link magic
> - Fixes a weakness in our not caching of proc generic dentries.
> 
> As shadow_proc uses a task struct to decided which dentry to return we
> can go back later and fix the proc generic caching without modifying any code 
> that
> uses the shadow_proc method.
> 
> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> ---
>  fs/proc/generic.c   |   12 ++-
>  fs/proc/proc_net.c  |   86 
> +++
>  include/linux/proc_fs.h |3 ++
>  3 files changed, 19 insertions(+), 82 deletions(-)

(commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)

This seems to have broken the use of /proc/bus/usb as a mountpoint. It
always appears empty now, whatever's supposed to be mounted there.

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PS3: gelic: Add wireless support for PS3

2007-12-13 Thread David Woodhouse
--- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c~ 2007-12-14 
01:31:50.0 -0500
+++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c  2007-12-14 
01:39:25.0 -0500
@@ -1139,7 +1139,7 @@ static irqreturn_t gelic_card_interrupt(
  *
  * see Documentation/networking/netconsole.txt
  */
-static void gelic_net_poll_controller(struct net_device *netdev)
+void gelic_net_poll_controller(struct net_device *netdev)
 {
struct gelic_card *card = netdev_card(netdev);
 
--- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h~ 2007-12-14 
01:31:50.0 -0500
+++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h  2007-12-14 
01:39:47.0 -0500
@@ -346,6 +346,7 @@ extern void gelic_net_tx_timeout(struct 
 extern int gelic_net_change_mtu(struct net_device *netdev, int new_mtu);
 extern int gelic_net_setup_netdev(struct net_device *netdev,
  struct gelic_card *card);
+extern void gelic_net_poll_controller(struct net_device *netdev);
 
 /* shared ethtool ops */
 extern void gelic_net_get_drvinfo(struct net_device *netdev,

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH FINAL] Merge the Sonics Silicon Backplane subsystem

2007-08-01 Thread David Woodhouse
On Tue, 2007-07-31 at 20:26 -0700, Andrew Morton wrote:
> Look.  Kconfig's `select' Just.  Does.  Not.  Work.  If you find
> yourself contemplating using it, please, don sackcloth, take a cold
> shower and several analgesics, then have another go, OK?

Amen.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix incorrect prototype for ipxrtr_route_packet()

2007-05-17 Thread David Woodhouse
The function ipxrtr_route_packet() takes a 'len' argument of type
size_t. However, its prototype in af_ipx.c incorrectly suggests that the
corresponding argument is of type 'int' instead.

Discovered by building with --combine and letting the compiler see it
all at once.

Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -87,7 +87,7 @@ extern int ipxrtr_add_route(__be32 network, struct 
ipx_interface *intrfc,
unsigned char *node);
 extern void ipxrtr_del_routes(struct ipx_interface *intrfc);
 extern int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx,
-  struct iovec *iov, int len, int noblock);
+  struct iovec *iov, size_t len, int noblock);
 extern int ipxrtr_route_skb(struct sk_buff *skb);
 extern struct ipx_route *ipxrtr_lookup(__be32 net);
 extern int ipxrtr_ioctl(unsigned int cmd, void __user *arg);

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix incorrect prototype for ipxrtr_route_packet()

2007-05-17 Thread David Woodhouse
On Thu, 2007-05-17 at 15:27 -0700, Andrew Morton wrote:
> Lovely.  So it was actually generating wrong code on all
> sizeof(size_t)!=sizeof(int) architectures.

I was trying to work out which architectures would actually be affected.
Probably not many. If it's in a register it probably doesn't matter --
the size only matters if you put it on the stack. And it would have to
be a 64-bit architecture, none of which (iirc) are so register-starved
that this particular argument would be on the stack anyway.

So I'm not actually sure it will really generate wrong code on any
architecture we support -- not that this excuses it in any way :)

> If only we could find some way in which all callers of a function as
> well as its definition can see the same declaration? 

Well, building with --combine helps. :)

Admittedly it doesn't necessarily catch _all_ callers -- only those
which we actually compile together. That does cover _most_ of the times
we do crap like this without a prototype in a header file though, I
suspect.

As Geert points out, sparse can warn about non-static functions which
aren't prototyped; if we start getting anal about that, it might help.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/2] 2.6.22-rc4: known regressions with patches

2007-06-05 Thread David Woodhouse
On Tue, 2007-06-05 at 16:54 +0200, Michal Piotrowski wrote:
> File systems
> 
> Subject: JFFS2 issues
> References : 
> http://lists.infradead.org/pipermail/linux-mtd/2007-May/018426.html
> Submitter  : Haavard Skinnemoen <[EMAIL PROTECTED]>
> Caused-By  : commit 10731f83009e2556f98ffa5c7c2cbffe66dacfb3
>  Artem Bityutskiy <[EMAIL PROTECTED]>
> Handled-By : Artem Bityutskiy <[EMAIL PROTECTED]>
> Patch  : 
> http://lists.infradead.org/pipermail/linux-mtd/2007-May/018453.html
> Status : patch available
> 

This is fixed in 2.6.22-rc4 with this commit:
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ea55d30798ac206c9f584ac264b6b8eb093d237a

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes

2007-06-14 Thread David Woodhouse
On Wed, 2007-06-13 at 11:14 -0500, Linas Vepstas wrote:
> Some googling seems to show that "git pull" has a bug/feature of
> ignoring the branch that one is working in, and pulling "master"
> no matter what.  I have no clue why; this seems broken to me.

Branches are generally a PITA -- it's probably best just to avoid them
entirely. It's not as if it's hard to create separate trees, and even
share objects between the trees.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes

2007-06-14 Thread David Woodhouse
On Thu, 2007-06-14 at 19:01 -0400, Jeff Garzik wrote:
> It makes diffing between lines of development more difficult, takes up
> more overall space, less cache friendly, ... 

All of which is much less true if you're sharing object directories or
even using alternates.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes

2007-06-14 Thread David Woodhouse
On Thu, 2007-06-14 at 19:04 -0400, Jeff Garzik wrote:
> Think about the actual kernel tree source code, not just the
> metadata...

Disk is cheap. Waiting for the whole damn thing to rebuild after
switching branches and back again is less so.

Besides, checking it out is optional.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc6: known regressions with patches

2007-09-12 Thread David Woodhouse
On Wed, 2007-09-12 at 18:59 +0200, Michal Piotrowski wrote:
> MTD
> 
> Subject : include/linux/mtd/map.h:128:2: error: #error "No bus width 
> supported. What's the point?"
> References  : http://lkml.org/lkml/2007/9/11/151
> Last known good : ?
> Submitter   : Toralf Förster <[EMAIL PROTECTED]>
> Caused-By   : ?
> Handled-By  : ?
> Patch   : 
> http://git.infradead.org/?p=mtd-2.6.git;a=commit;h=241651d04d672fb685b2874707016cbbf95931e5
> Status  : patch available

Didn't you already _drop_ this from your list of regressions once?

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV6]: Do not set IF_READY if device is down

2007-03-27 Thread David Woodhouse
On Thu, 2007-03-08 at 03:59 +, Linux Kernel Mailing List wrote:
> Gitweb: 
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7ababbdc647e67e953d153ddf62cbdc9fe3297e
> Commit: c7ababbdc647e67e953d153ddf62cbdc9fe3297e
> Parent: 16bec31db751030171b31d7767fa3a5bdbe980ea
> Author: Herbert Xu <[EMAIL PROTECTED]>
> AuthorDate: Wed Mar 7 16:02:40 2007 -0800
> Committer:  David S. Miller <[EMAIL PROTECTED]>
> CommitDate: Wed Mar 7 16:08:12 2007 -0800
> 
> [IPV6]: Do not set IF_READY if device is down
> 
> Now that we add the IPv6 device at registration time we don't need
> to set IF_READY in ipv6_add_dev anymore because we will always get
> a NETDEV_UP event later on should the device ever become ready.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

This commit seems to have broken NetworkManager, which removes the
link-local IPv6 address from each wireless interface while it's only
intending to use it for scanning, then adds it back when it actually
wants to _use_ the interface in question.

After this commit, any IPv6 address added to an interface which had none
will be marked as 'tentative', and will never become real.

A failing test looks like this

+ ip -6 addr del fe80::203:ccff:fe3c:27d/64 dev eth0
+ ip -6 addr del 2001:8b0:10b:1:203:ccff:fe3c:27d/64 dev eth0   
+ ip -6 addr list eth0  
+ ip -6 addr add fe80::203:ccff:fe3c:27d/64 dev eth0
+ ip -6 addr list eth0  
4: eth0:  mtu 1500 qlen 1000   
inet6 fe80::203:ccff:fe3c:27d/64 scope link tentative   
   valid_lft forever preferred_lft forever  
+ sleep 3   
+ ip -6 addr list eth0  
4: eth0:  mtu 1500 qlen 1000   
inet6 fe80::203:ccff:fe3c:27d/64 scope link tentative   
   valid_lft forever preferred_lft forever  

... while on a good kernel it looks like this...

+ ip -6 addr del fe80::203:ccff:fe3c:27d/64 dev eth0
+ ip -6 addr del 2001:8b0:10b:1:203:ccff:fe3c:27d/64 dev eth0   
+ ip -6 addr list eth0  
+ ip -6 addr add fe80::203:ccff:fe3c:27d/64 dev eth0
+ ip -6 addr list eth0  
2: eth0:  mtu 1500 qlen 1000   
inet6 fe80::203:ccff:fe3c:27d/64 scope link tentative   
   valid_lft forever preferred_lft forever  
+ sleep 3   
+ ip -6 addr list eth0  
2: eth0:  mtu 1500 qlen 1000   
inet6 2001:8b0:10b:1:203:ccff:fe3c:27d/64 scope global dynamic  
   valid_lft 298sec preferred_lft 118sec
inet6 fe80::203:ccff:fe3c:27d/64 scope link 
   valid_lft forever preferred_lft forever  

If I add an extra IPv6 address before running the script on a 'bad'
kernel, it behaves correctly. The problem only happens when you remove
_all_ IPv6 addresses from an interface.

Answers of "don't do that then" should be accompanied with an
alternative plan which allows NetworkManager to use a wireless device
for scanning without actually having any other effect.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV6]: Do not set IF_READY if device is down

2007-03-27 Thread David Woodhouse
On Wed, 2007-03-28 at 07:30 +1000, Herbert Xu wrote:
> Sorry, that patch is indeed broken.  We do need to set IF_READY in
> the case where all addresses were deleted (including the link-local)
> and then recreated.  The IPv6 device will be destroyed and recreated
> too in that case. 

Your new patch fixes the problem for me, at least with my test script.
Thanks.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-rc5: known regressions with patches (v2)

2007-01-18 Thread David Woodhouse
On Thu, 2007-01-18 at 20:59 +0100, Adrian Bunk wrote:
> Subject: CONFIG_JFFS2_FS_DEBUG=2 compile error
> References : http://lkml.org/lkml/2007/1/12/161
> Submitter  : Russell King <[EMAIL PROTECTED]>
> Caused-By  : Al Viro <[EMAIL PROTECTED]>
>  commit 914e26379decf1fd984b22e51fd2e4209b7a7f1b
> Handled-By : David Woodhouse <[EMAIL PROTECTED]>
> Status : patch available

Linus, please pull from git://git.infradead.org/mtd-2.6.git

This fixes the above bug along with a few others. It does also contain a
small amount of new code which has been waiting for a while (including
the driver for the CAFÉ NAND controller which we use on OLPC.).

My apologies for missing the merge window and first asking you to pull
this a few hours after 2.6.20-rc1 was cut; I'd been waiting for the
bitrev stuff to land, and had waited too long.

Adrian Bunk (3):
  [MTD] SSFDC must depend on BLOCK
  [MTD] [NAND] rtc_from4.c: use lib/bitrev.c
  [MTD] make drivers/mtd/cmdlinepart.c:mtdpart_setup() static

Adrian Hunter (2):
  [MTD] OneNAND: Implement read-while-load
  [MTD] OneNAND: Handle DDP chip boundary during read-while-load

Akinobu Mita (1):
  [JFFS2] Use rb_first() and rb_last() cleanup

Alan Cox (1):
  [MTD] MAPS: esb2rom: use hotplug safe interfaces

Alexey Dobriyan (1):
  [MTD] JEDEC probe: fix comment typo (devic)

Amit Choudhary (1):
  [JFFS2] Fix error-path leak in summary scan

Andrew Morton (1):
  [MTD] Tidy bitrev usage in rtc_from4.c

Andrew Victor (2):
  [MTD] NAND: AT91 NAND driver
  [MTD] NAND: Support for 16-bit bus-width on AT91.

Artem Bityutskiy (10):
  [MTD] core: trivial comments fix
  [MTD] NAND: nandsim: support subpage write
  [MTD] increase MAX_MTD_DEVICES
  [MTD] add get_mtd_device_nm() function
  [MTD] add get and put methods
  [MTD] return error code from get_mtd_device()
  [MTD] nandsim: bugfix in page addressing
  [JFFS2] add cond_resched() when garbage collecting deletion dirent
  [JFFS2] Reschedule in loops
  [MTD] OneNAND: release CPU in cycles

Burman Yan (1):
  [MTD] replace kmalloc+memset with kzalloc

Dave Olsen (1):
  [MTD] [MAPS] Support for BIOS flash chips on the nvidia ck804 southbridge

David Anders (1):
  [MTD] NOR: leave Intel chips in read-array mode on suspend

David Woodhouse (29):
  [MTD NAND] Initial import of CAFÉ NAND driver.
  [MTD NAND] OLPC CAFÉ driver update
  Merge branch 'master' of git://git.kernel.org/.../torvalds/linux-2.6
  [MTD] NAND: Combined oob buffer so it's contiguous with data
  [MTD] NAND: Correct setting of chip->oob_poi OOB buffer
  Merge git://git.infradead.org/~dwmw2/cafe-2.6
  [MTD] NAND: Add hardware ECC correction support to CAFÉ NAND driver
  [MTD] NAND: CAFÉ NAND driver cleanup, fix ECC on reading empty flash
  [MTD] NAND: Disable ECC checking on CAFÉ since it's broken for now
  [MTD] NAND: Café ECC -- remove spurious BUG_ON() in err_pos()
  [MTD] NAND: Reset Café controller before initialising.
  [MTD] CAFÉ NAND: Add 'slowtiming' parameter, default usedma and checkecc 
on
  [MTD] NAND: Add ECC debugging for CAFÉ
  [MTD] NAND: Remove empty block ECC workaround
  [MTD] NAND: Fix timing calculation in CAFÉ debugging message
  [MTD] NAND: Use register #defines throughout CAFÉ driver, not numbers
  [MTD] NAND: Add register debugging spew option to CAFÉ driver
  [MTD] NAND: Fix ECC settings in CAFÉ controller driver.
  Merge git://git.infradead.org/~dwmw2/cafe-2.6
  Merge git://git.infradead.org/~kmpark/onenand-mtd-2.6
  [MTD] [NAND] Update CAFÉ driver interrupt handler prototype
  [MTD] Use EXPORT_SYMBOL_GPL() for exported symbols.
  [MTD] Remove trailing whitespace
  Merge branch 'master' of git://git.kernel.org/.../torvalds/linux-2.6
  [MTD] Fix SSFDC build for variable blocksize.
  [MTD] Fix ssfdc blksize typo
  Merge branch 'master' of git://git.infradead.org/~kmpark/onenand-mtd-2.6
  [JFFS2] debug.h: include  for current->pid
  Merge branch 'master' of git://git.kernel.org/.../torvalds/linux-2.6

Haavard Skinnemoen (1):
  [MTD] bugfix: DataFlash is not bit writable

Jeff Garzik (1):
  [JFFS2] kill warning RE debug-only variables

Josh Boyer (1):
  [MTD] add MTD_BLKDEVS Kconfig option

Kyungmin Park (9):
  MTD: OneNAND: interrupt based wait support
  [MTD] OneNAND: lock support
  [MTD] OneNAND: Single bit error detection
  [MTD] OneNAND: fix oob handling in recent oob patch
  [JFFS2] use the ref_offset macro
  [MTD] OneNAND: fix onenand_wait bug
  [MTD] OneNAND: add subpage write support
  [MTD] OneNAND: fix onenand_wait bug in read ecc error
  [MTD] OneNAND: return ecc error code only when 2-bit ecc occurs

Lew Glendenning (1):
  [MTD] MAPS: Support for BIOS f

Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-10-28 Thread David Woodhouse
On Wed, 2015-09-23 at 10:58 -0700, David Miller wrote:
> From: David Woodhouse 
> Date: Wed, 23 Sep 2015 09:14:09 +0100
> 
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
> 
> Yes, let's discuss this independantly to the nice bug fixing
> that's happening in the rest of this series.
> 
> Thanks.

This is the patch we discussed earlier, and I came away from the
conversation believing that I had achieved my goal of trying to
convince you separately :)

The patch in patchwork at https://patchwork.ozlabs.org/patch/520318/
still applies cleanly; is that sufficient or would you like it
resubmitted (perhaps after -rc1)?

Thanks.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH] Fix AF_PACKET ABI breakage in 4.2

2015-09-23 Thread David Woodhouse
Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

Since virtio_legacy_is_little_endian() is a very long identifier,
define a VIO_LE macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.

This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.

Signed-off-by: David Woodhouse 
---
Or perhaps we should just use (__force u16) and (__force __virtio16)
since that's all we really want anyway?

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7b8e39a..d646623 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -230,6 +230,8 @@ struct packet_skb_cb {
} sa;
 };
 
+#define VIO_LE virtio_legacy_is_little_endian()
+
 #define PACKET_SKB_CB(__skb)   ((struct packet_skb_cb *)((__skb)->cb))
 
 #define GET_PBDQC_FROM_RB(x)   ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
@@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct 
msghdr *msg, size_t len)
goto out_unlock;
 
if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-   (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-__virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
- __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
-   vnet_hdr.hdr_len = __cpu_to_virtio16(false,
-__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-   __virtio16_to_cpu(false, vnet_hdr.csum_offset) 
+ 2);
+   (__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start) +
+__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset) + 2 >
+ __virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len)))
+   vnet_hdr.hdr_len = __cpu_to_virtio16(VIO_LE,
+__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start) 
+
+   __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset) 
+ 2);
 
err = -EINVAL;
-   if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
+   if (__virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len) > len)
goto out_unlock;
 
if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
@@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
hlen = LL_RESERVED_SPACE(dev);
tlen = dev->needed_tailroom;
skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
-  __virtio16_to_cpu(false, vnet_hdr.hdr_len),
+  __virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len),
   msg->msg_flags & MSG_DONTWAIT, &err);
if (skb == NULL)
goto out_unlock;
@@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
 
if (po->has_vnet_hdr) {
if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-   u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
-   u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
+   u16 s = __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start);
+   u16 o = __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset);
if (!skb_partial_csum_set(skb, s, o)) {
err = -EINVAL;
goto out_free;
@@ -2787,7 +2789,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
}
 
skb_shinfo(skb)->gso_size =
-   __virtio16_to_cpu(false, vnet_hdr.gso_size);
+   __virtio16_to_cpu(VIO_LE, vnet_hdr.gso_size);
skb_shinfo(skb)->gso_type = gso_type;
 
/* Header must be checked, and gso_segs computed. */
@@ -3161,9 +3163,9 @@ static int packet_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
 
/* This is a hint as to how much should be linear. */
vnet_hdr.hdr_len =
-   __cpu_to_virtio16(false, skb_headlen(skb));
+   __cpu_to_virtio16(VIO_LE, skb_headlen(skb));
vnet_hdr.gso_size =
-   __cpu_to_virtio16(false, sinfo->gso_size);
+   __cpu_to_virtio16(VIO_LE, sinfo->gso_size);
if (sinfo->gso_type & SKB_GSO_TCPV4)
vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
@@ -3181,9 +3183,9 @@ static int packet_recvmsg(stru

Re: [RFC PATCH 1/3] Avoid making inappropriate requests of NETIF_F_V[46]_CSUM devices

2015-09-23 Thread David Woodhouse
On Mon, 2015-09-21 at 17:29 +0100, David Woodhouse wrote:
> 
> Did we ever resolve this? AFAICT from inspecting the code the
> virtio_net device still advertises hardware csum capabilities to the
> guest. And accepts packets which need checksumming, calling
> skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
> af_packet.

Here's a test case which provokes the network stack into handing a
CHECKSUM_PARTIAL skb to a device which it knows can't handle it. (It
obviously needs the AF_PACKET endianness ABI fix I sent earlier.)

You might well be right to refer to this as 'injecting unchecked crap',
but we are *gaining* injection points with the ability to do this, and
for not entirely insane reasons — people want to be able to make full
use of hardware offload capabilities.

And we *have* a safety check, to avoid handing CHECKSUM_PARTIAL buffers
to devices which can't handle them. We already do check the
capabilities of the device we end up routing it to, and complete the
checksum in software if the device can't cope.

All we're talking about here is a corner case when that existing check
doesn't actually give the right results, because it assumes a device
with NETIF_F_IP_CSUM can checksum *all* Legacy IP packets, not just TCP
and UDP.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
//#include 
#define PACKET_VNET_HDR 15

struct virtio_net_hdr
{
uint8_t flags;
uint8_t gso_type;
uint16_t hdr_len;
uint16_t gso_size;
uint16_t csum_start;
uint16_t csum_offset;
};
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1

unsigned char eth_hdr[] = {
	0x52, 0x54, 0x00, 0x3a, 0xbe, 0x28, 0x52, 0x54, 0x00, 0x74, 0x2f, 0xfd, 0x08, 0x00
};
unsigned char icmp_pkt[] = {
0x45, 0x00, 0x00, 0x54, 0x11, 0xec, 0x40, 0x00, 0x40, 0x01, 0xb3, 0x58, 0xc0, 0xa8, 0x7a, 0x12,
0xc0, 0xa8, 0x7a, 0x01, 0x08, 0x00, 0x00, 0x00, 0x07, 0xd2, 0x00, 0x01, 0x63, 0x7f, 0x02, 0x56,
0x00, 0x00, 0x00, 0x00, 0x7c, 0x1b, 0x0b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x11, 0x12, 0x13,
0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23,
0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33,
0x34, 0x35, 0x36, 0x37
};

unsigned char udp_pkt[] = {
	0x45, 0x00, 0x00, 0x22, 0xc4, 0x25, 0x40, 0x00, 0x40, 0x11, 0x01, 0x41, 0xc0, 0xa8, 0x7a, 0x12,
0xc0, 0xa8, 0x7a, 0x01, 0xae, 0xc7, 0x1f, 0x90, 0x00, 0x0e, 0x75, 0x84, 0x68, 0x65, 0x6c, 0x6c,
0x6f, 0x0a
};

int main(int argc, char **argv)
{
	struct ifreq req;
	struct sockaddr_ll lladdr;
	int fd, ret, val;
	struct {
		struct virtio_net_hdr vnet;
		unsigned char eth[14];
		unsigned char pkt[2048];
	} buf;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s \n", argv[0]);
		exit(1);
	}
	fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP));
	if (fd < 0) {
		perror("socket");
		exit(1);
	}

	memset(&req, 0, sizeof(req));
	strncpy(req.ifr_name, argv[1], IFNAMSIZ-1);
	ret = ioctl(fd, SIOCGIFINDEX, &req);
	if (ret < 0) {
		perror("SIOGIFINDEX");
		exit(1);
	}

	memset(&lladdr, 0, sizeof(lladdr));
	lladdr.sll_family   = AF_PACKET;
	lladdr.sll_protocol = htons(ETH_P_IP);
	lladdr.sll_ifindex  = req.ifr_ifindex;
	ret = bind(fd, (const struct sockaddr *)&lladdr, sizeof(lladdr));
	if (ret < 0) {
		perror("bind");
		exit(1);
	}

val = 1;
ret = setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, (const char *)&val,
			 sizeof(val));
if (ret < 0) {
		perror("setsockopt(SOL_PACKET, PACKET_VNET_HDR)");
		exit(1);
	}

	memset(&buf.vnet, 0, sizeof(buf.vnet));
	memcpy(&buf.eth, eth_hdr, sizeof(eth_hdr));

	memcpy(&buf.pkt, udp_pkt, sizeof(udp_pkt));
	buf.vnet.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
	buf.vnet.csum_start = 0x22;
	buf.vnet.csum_offset = 0x6;
	if (write(fd, (void *)&buf, sizeof(buf.vnet) + 14 + sizeof(udp_pkt)) < 0) {
	perror("Write UDP packet");
		exit(1);
	}

	memcpy(&buf.pkt, icmp_pkt, sizeof(icmp_pkt));
	buf.vnet.csum_offset = 0x2;
	if (write(fd, (void *)&buf, sizeof(buf.vnet) + 14 + sizeof(icmp_pkt)) < 0) {
		perror("Write ICMP packet");
		exit(1);
	}
	return 0;
}


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-23 Thread David Woodhouse
On Wed, 2015-08-12 at 17:06 -0500, Jeremy Linton wrote:
> 
>  
> +static const struct acpi_device_id smsc911x_acpi_match[] = {
> +   { "ARMH9118", 0 },
> +   { }
> +};
> +MODULE_DEVICE_TABLE(acpi, smsc911x_acpi_match);
> +
>  static struct platform_driver smsc911x_driver = {
> .probe = smsc911x_drv_probe,
> .remove = smsc911x_drv_remove,
> @@ -2661,6 +2656,7 @@ static struct platform_driver smsc911x_driver =
> {
> .name   = SMSC_CHIPNAME,
> .pm = SMSC911X_PM_OPS,
> .of_match_table = of_match_ptr(smsc911x_dt_ids),
> +   .acpi_match_table = ACPI_PTR(smsc911x_acpi_match),
> },
>  };

Hm, surely you shouldn't need this part? If the ACPI device has a HID
of PRP0001, and an appropriate "compatible" property, then it ought to
be loaded anyway. If that doesn't work, we should fix that.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH v2] Fix AF_PACKET ABI breakage in 4.2

2015-09-23 Thread David Woodhouse
Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

Since virtio_legacy_is_little_endian() is a very long identifier,
define a VIO_LE macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.

This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.

Signed-off-by: David Woodhouse 
---
On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote:
> > +#define VIO_LE virtio_legacy_is_little_endian()
> 
> When you define a shorthand macro, the defines to a function call,
> make the macro have parenthesis too.

In which case I suppose it also wants to be lower-case. Although
"function call" is a bit strong since it's effectively just a constant.
I'm still wondering if it'd be nicer just to use (__force u16) instead.

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7b8e39a..aa4b15c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -230,6 +230,8 @@ struct packet_skb_cb {
} sa;
 };
 
+#define vio_le() virtio_legacy_is_little_endian()
+
 #define PACKET_SKB_CB(__skb)   ((struct packet_skb_cb *)((__skb)->cb))
 
 #define GET_PBDQC_FROM_RB(x)   ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
@@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct 
msghdr *msg, size_t len)
goto out_unlock;
 
if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-   (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-__virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
- __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
-   vnet_hdr.hdr_len = __cpu_to_virtio16(false,
-__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-   __virtio16_to_cpu(false, vnet_hdr.csum_offset) 
+ 2);
+   (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
+__virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 >
+ __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len)))
+   vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(),
+__virtio16_to_cpu(vio_le(), 
vnet_hdr.csum_start) +
+   __virtio16_to_cpu(vio_le(), 
vnet_hdr.csum_offset) + 2);
 
err = -EINVAL;
-   if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
+   if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len)
goto out_unlock;
 
if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
@@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
hlen = LL_RESERVED_SPACE(dev);
tlen = dev->needed_tailroom;
skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
-  __virtio16_to_cpu(false, vnet_hdr.hdr_len),
+  __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len),
   msg->msg_flags & MSG_DONTWAIT, &err);
if (skb == NULL)
goto out_unlock;
@@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
 
if (po->has_vnet_hdr) {
if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-   u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
-   u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
+   u16 s = __virtio16_to_cpu(vio_le(), 
vnet_hdr.csum_start);
+   u16 o = __virtio16_to_cpu(vio_le(), 
vnet_hdr.csum_offset);
if (!skb_partial_csum_set(skb, s, o)) {
err = -EINVAL;
goto out_free;
@@ -2787,7 +2789,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
}
 
skb_shinfo(skb)->gso_size =
-   __virtio16_to_cpu(false, vnet_hdr.gso_size);
+   __virtio16_to_cpu(vio_le(), vnet_hdr.gso_size);
skb_shinfo(skb)->gso_type = gso_type;
 
/* Header must be checked, and gso_segs computed. */
@@ -3161,9 +3163,9 @@ static int packet_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
 
/* This is a hint as to how much should be linear. */
vnet_hdr.hdr_len =
-   __cpu_to_virtio16(false, skb_headlen(skb));
+   __cpu_to_virtio16(vio_le(), skb_headlen(skb));
vnet_hdr.gso_size =
-   __cpu_to_virtio16(false, sinfo-&g

Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-23 Thread David Woodhouse
On Wed, 2015-09-23 at 10:58 -0700, David Miller wrote:
> From: David Woodhouse 
> Date: Wed, 23 Sep 2015 09:14:09 +0100
> 
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
> 
> Yes, let's discuss this independantly to the nice bug fixing
> that's happening in the rest of this series.

Since you explicitly call them 'bug fixes'... I suppose the last patch
in the latest series (enabling the various offloads by default) isn't a
bug fix and should be held back for net-next.

I think I can live with pushing the rest for the net tree. Just to
recap, the series is:

  8139cp: Do not re-enable RX interrupts in cp_tx_timeout()
  8139cp: Fix tx_queued debug message to print correct slot numbers
  8139cp: Fix TSO/scatter-gather descriptor setup
  8139cp: Reduce duplicate csum/tso code in cp_start_xmit()
  8139cp: Fix DMA unmapping of transmitted buffers
  8139cp: Dump contents of descriptor ring on TX timeout
  8139cp: Enable offload features by default

The penultimate patch is also not strictly a bug fix, but it's only
adding additional diagnostics to a code path which was already broken 
until I fixed it anyway, so I can live with that in net although I'll
also be happy if you want to defer it.

The fourth patch which removes the three duplicate versions of the
csum/tso checks might also be deferrable but really, it's *also*
fixing the failure mode when we see an inappropriate CHECKSUM_PARTIAL
packet, and it's touching a code path which I've *also* fixed in this
patch set. So it might as well stay.

I can shuffle things around if you disagree, but assuming Francois
concurs I'd suggest merging patches 1-6 (or just pulling from
git://git.infradead.org/users/dwmw2/linux-8139cp.git fixes-only)
and then I'll resubmit the last patch some time later, after you
next pull net into net-next.

Otherwise, please let me know if/how you'd like me to reorganise it.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-23 Thread David Woodhouse
On Wed, 2015-09-23 at 22:51 +0200, Rafael J. Wysocki wrote:
> 
> But if the device ID is assigned already, why would it hurt to 
> actually use it?

It doesn't hurt at all, as long as we understand that there was no need
to assign it a device ID at all, at least for Linux's benefit. And as
long as it doesn't set a precedent that makes other people think they
need to do so.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH] 8139cp: Set GSO max size and enforce it

2015-09-23 Thread David Woodhouse
The maximum MSS value we can set is 0xFFF. When fixing the TSO code I
noticed we just mask the gso_size value to the hardware's maximum and
don't care about the consequences. That seems... unsagacious.

Instead of masking, WARN and drop the packet if it's too large. And use
 netif_set_gso_max_size() so it shouldn't happen in the first place.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index e5173f3..fa8f850 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -754,10 +754,16 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
mss = skb_shinfo(skb)->gso_size;
 
+   if (mss > MSSMask) {
+   WARN_ONCE(1, "Net bug: GSO size %d too large for 8139CP\n",
+ mss);
+   goto out_dma_error;
+   }
+
opts2 = cpu_to_le32(cp_tx_vlan_tag(skb));
opts1 = DescOwn;
if (mss)
-   opts1 |= LargeSend | ((mss & MSSMask) << MSSShift);
+   opts1 |= LargeSend | (mss << MSSShift);
else if (skb->ip_summed == CHECKSUM_PARTIAL) {
const struct iphdr *ip = ip_hdr(skb);
if (ip->protocol == IPPROTO_TCP)
@@ -1994,6 +2000,8 @@ static int cp_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
NETIF_F_HIGHDMA;
 
+   netif_set_gso_max_size(dev, MSSMask);
+
rc = register_netdev(dev);
    if (rc)
    goto err_out_iomap;
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-23 Thread David Woodhouse
On Wed, 2015-09-23 at 14:48 -0700, David Miller wrote:
> I've applied patches #1-#6 and left #7 sitting around in patchwork
> for when I next merge net into net-next, thanks.

Great, thanks. I've got one more for the net tree; just masking the
gso_size to the maximum (0xfff) that the hardware can handle, and never
telling the net stack that that's our maximum, doesn't seem like such a
good idea...

I think I really am going to stop looking now :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-23 Thread David Woodhouse
On Thu, 2015-09-24 at 00:44 +0200, Francois Romieu wrote:
> David Woodhouse  :
> [...]
> > I can shuffle things around if you disagree, but assuming Francois
> > concurs I'd suggest merging patches 1-6 (or just pulling from
> > git://git.infradead.org/users/dwmw2/linux-8139cp.git fixes-only)
> > and then I'll resubmit the last patch some time later, after you
> > next pull net into net-next.
> 
> No objection but I do not really trust whatever review I do this late.
> 
> Did you try #5 with the DMA allocation debug code enabled ?

Only in QEMU (where there wasn't a problem in the first place). But
yes.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-23 Thread David Woodhouse
On Thu, 2015-09-24 at 00:44 +0200, Francois Romieu wrote:
> David Woodhouse  :
> [...]
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
> 
> You may as well change the IRQ storm avoidance logic so that it does not
> require conformant driver code if you do not want to waste more time :o)

That's not entirely trivial — at a certain level we *do* require that
drivers don't lie to the core IRQ logic.

I suppose even if a rogue driver is returning IRQ_HANDLED in an IRQ
storm, when it's *not* actually doing anything or making any progress,
perhaps there's a way we could detect that we're always going straight
back into the next interrupt handler as *soon* as we exit the previous
one. Perhaps there's merit in exploring that option.

I'm not sure that gives you a valid excuse for not wanting to fix the
driver though :)

I might start by "clarifying" the documentation on the IRQ handler's
return value...

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-24 Thread David Woodhouse
On Wed, 2015-09-23 at 16:56 -0700, Hanjun Guo wrote:
> 
> It really depends on the people who writing the ASL code (DSDT),
> I'm not sure if Windows will use _DSD and how to use it, but
> firmware guys may just use the device ID to make the firmware
> to be usable both by Linux and Windows, and that's reasonable
> I think.

And the ideal way to do that is to add a _CID of "PRP0001" and a
compatible string, and then Linux doesn't need to care about the
special new HID that you've added purely to pander to the limitations
of Windows.

That way, once drivers are converted from the legacy of-property APIs
to the new device-property APIs, there should be *nothing* required to
make device drivers work under ACPI. That was kind of the point.

If people want to add HIDs, fine. But let's not go down a path which
*requires* modifying drivers.

(I'm planning to try to learn Coccinelle and do a bombing run convertin
to the new API some time soon, FWIW).

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH WTF] 8139cp: Fix GSO MSS handling

2015-09-24 Thread David Woodhouse
When fixing the TSO support I noticed we just mask skb->gso_size with
the MSSMask value and don't care about the consequences.

That seems... unsagacious.

Instead of masking, WARN and drop the packet if the maximum is exceeded.
And call netif_set_gso_max_size() so it shouldn't happen in the first
place.

Finally, Francois Romieu noticed that we didn't even have the right
value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff.

Signed-off-by: David Woodhouse 
---

Nice and simple and obvious, right?

So why does it stop TSO from happening at all?

When I call netif_set_gso_max_size() with a value of 0xaad (2733) or
higher, I see TSO being used (with an MSS of 1214; this is just between
a VM and its host; Legacy IP with an MTU of 1500).

If I set a maximum of 0xaac or lower, TSO never gets used. Am I doing
something wrong? Or is there a problem somewhere else?

 drivers/net/ethernet/realtek/8139cp.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 686334f..ae24d42 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -175,7 +175,7 @@ enum {
LastFrag= (1 << 28), /* Final segment of a packet */
LargeSend   = (1 << 27), /* TCP Large Send Offload (TSO) */
MSSShift= 16,/* MSS value position */
-   MSSMask = 0xfff, /* MSS value: 11 bits */
+   MSSMask = 0x7ff, /* MSS value: 11 bits */
TxError = (1 << 23), /* Tx error summary */
RxError = (1 << 20), /* Rx error summary */
IPCS= (1 << 18), /* Calculate IP checksum */
@@ -754,10 +754,16 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
mss = skb_shinfo(skb)->gso_size;
 
+   if (mss > MSSMask) {
+   WARN_ONCE(1, "Net bug: GSO size %d too large for 8139CP\n",
+ mss);
+   goto out_dma_error;
+   }
+
opts2 = cpu_to_le32(cp_tx_vlan_tag(skb));
opts1 = DescOwn;
if (mss)
-   opts1 |= LargeSend | ((mss & MSSMask) << MSSShift);
+   opts1 |= LargeSend | (mss << MSSShift);
else if (skb->ip_summed == CHECKSUM_PARTIAL) {
const struct iphdr *ip = ip_hdr(skb);
if (ip->protocol == IPPROTO_TCP)
@@ -1994,6 +2000,8 @@ static int cp_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
NETIF_F_HIGHDMA;
 
+   netif_set_gso_max_size(dev, MSSMask);
+
rc = register_netdev(dev);
if (rc)
goto err_out_iomap;
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] Fix AF_PACKET ABI breakage in 4.2

2015-09-24 Thread David Woodhouse
On Thu, 2015-09-24 at 12:55 +0300, Michael S. Tsirkin wrote:
> 
> I'm fine with this patch
> 
> Acked-by: Michael S. Tsirkin 

Thanks. In fact Dave has already merged it.

> but if you want to re-work it along the lines suggested
> by Greg, that's also fine with me.

If I'm going to define my own accessors, I'd probably just make them
use (__force __virtio16). But TBH none of the options seemed
particularly pretty to me. I can live with what we have.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH v2 RFC] 8139cp: Fix GSO MSS handling

2015-09-24 Thread David Woodhouse
When fixing the TSO support I noticed we just mask ->gso_size with the
MSSMask value and don't care about the consequences.

Provide a .ndo_features_check() method which drops the NETIF_F_TSO
feature for any skb which would exceed the maximum, and thus forces it
to be segmented by software.

Then we can stop the masking in cp_start_xmit(), and just WARN if the
maximum is exceeded, which should now never happen.

Finally, Francois Romieu noticed that we didn't even have the right
value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff.

Signed-off-by: David Woodhouse 
---
OK, netif_set_gso_max_size() was *not*, as I had naïvely assumed, a way
to set the maximum value of ->gso_size. That's an unfortunate set of
naming decisions.

I don't see a simple way to tell the net stack about the maximum MSS
value we can cope with in the general case. But we can do it this way
by clearing NETIF_F_TSO on a per-packet basis instead.

Incidentally, other drivers might benefit from doing the same if this
is considered reasonable. For example it looks like the r8169 driver
manually calls skb_gso_segment() for its fallback case, and I don't see
what then prevents it from hitting the "BUG!" at the start of
rtl8169_start_xmit() if it runs out of space in the descriptor ring.

 drivers/net/ethernet/realtek/8139cp.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 686334f..fe1040d 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -175,7 +175,7 @@ enum {
LastFrag= (1 << 28), /* Final segment of a packet */
LargeSend   = (1 << 27), /* TCP Large Send Offload (TSO) */
MSSShift= 16,/* MSS value position */
-   MSSMask = 0xfff, /* MSS value: 11 bits */
+   MSSMask = 0x7ff, /* MSS value: 11 bits */
TxError = (1 << 23), /* Tx error summary */
RxError = (1 << 20), /* Rx error summary */
IPCS= (1 << 18), /* Calculate IP checksum */
@@ -754,10 +754,16 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
mss = skb_shinfo(skb)->gso_size;
 
+   if (mss > MSSMask) {
+   WARN_ONCE(1, "Net bug: GSO size %d too large for 8139CP\n",
+ mss);
+   goto out_dma_error;
+   }
+
opts2 = cpu_to_le32(cp_tx_vlan_tag(skb));
opts1 = DescOwn;
if (mss)
-   opts1 |= LargeSend | ((mss & MSSMask) << MSSShift);
+   opts1 |= LargeSend | (mss << MSSShift);
else if (skb->ip_summed == CHECKSUM_PARTIAL) {
const struct iphdr *ip = ip_hdr(skb);
if (ip->protocol == IPPROTO_TCP)
@@ -1852,6 +1858,15 @@ static void cp_set_d3_state (struct cp_private *cp)
pci_set_power_state (cp->pdev, PCI_D3hot);
 }
 
+static netdev_features_t cp_features_check(struct sk_buff *skb,
+  struct net_device *dev,
+  netdev_features_t features)
+{
+   if (skb_shinfo(skb)->gso_size > MSSMask)
+   features &= ~NETIF_F_TSO;
+
+   return vlan_features_check(skb, features);
+}
 static const struct net_device_ops cp_netdev_ops = {
.ndo_open   = cp_open,
.ndo_stop   = cp_close,
@@ -1864,6 +1879,7 @@ static const struct net_device_ops cp_netdev_ops = {
.ndo_tx_timeout = cp_tx_timeout,
.ndo_set_features   = cp_set_features,
.ndo_change_mtu = cp_change_mtu,
+   .ndo_features_check = cp_features_check,
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= cp_poll_controller,
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-24 Thread David Woodhouse
Hi Catalin,

I understand your concerns, but I'm still not convinced of your
conclusion.

On Thu, 2015-09-24 at 11:31 +0100, Catalin Marinas wrote:
> PRP0001 opens the door to any DT-like properties in ACPI and, knowing
> how the ARM ecosystem works, I'm pretty sure we'll soon lose control (if
> we haven't already; not all the developments are done in the open).

No. That door is wide open already — people can already do whatever
they like in _DSD properties. If they're going to screw up DT
properties, they'll screw up ACPI-only _DSD properties just the same.

You speak of maintaining "tight control of the _DSD properties that
are going to be used in ACPI tables"... but if you're going to
confiscate their crackpipe and stand over them while they work, you can
do that just as well whether they're using "PRP0001" or "FOO1234" as
their HID value.

In that sense, the HID is entirely orthogonal.

And think about it... we *really* don't want a given peripheral device
to have *different* bindings depending on whether it's discovered with
a specific ACPI HID, vs. when it's discovered via DT or the PRP0001
HID. That way lies complete insanity.

In some ways, your proposal would be actively *counterproductive*. You
say you want to train people *not* to keep patching the kernel. But
where they *could* have just used PRP0001 and used a pre-existing
kernel, you then tell them "oh, but now you need to patch the kernel
because we want you to make up a new HID and not be compatible with
what already exists."

If you go down this road, I predict we'll start seeing *separate*
drivers for identical components, because the bindings for DT vs. ACPI
properties are different. We really don't want that.

> The pro ARM ACPI camp has been very vocal against DT in the server
> space. I'd like to seem them as vocal about PRP0001, unless they find it
> acceptable (and should apologise for bashing DT ;)).

Fundamentally, that DT vs. ACPI distinction has gone away with the
introduction of _DSD. We *need* the flexibility that we gain from being
able to provide device properties rather than inventing a new HID for
every permutation, and with that flexibility comes a certain amount of
responsibility to do things sensibly.

People have not always designed their bindings sensibly. But that isn't
going to be magically solved in the ACPI world, unless you *do*
actually stand over them with their crackpipe in your hand, as I joked
above. And eschewing PRP0001 really doesn't help you with that. It just
makes things harder.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 RFC] 8139cp: Fix GSO MSS handling

2015-09-24 Thread David Woodhouse
On Thu, 2015-09-24 at 05:05 -0700, Eric Dumazet wrote:
> 
> Right, netif_skb_features() only has the following checks :
> 
> if (gso_segs > dev->gso_max_segs || gso_segs < dev->gso_min_segs)
> features &= ~NETIF_F_GSO_MASK;
> 
> But now we have .ndo_features_check() we could remove this generic 
> check from fast path.

Perhaps so, yes.

Any thoughts on the other reason I was staring at this same code path
this week? I am able to reliably feed inappropriate packets to a
NETIF_F_IP_CSUM-capable device with the test program at
http://bombadil.infradead.org/~dwmw2/raw.c (and equivalent code paths
via virtio_net, tun and others).

They're *supposed* to get checksummed by software if the device can't
cope, but netif_skb_features() returns the wrong answer, so we fail to
do that and they're fed with CHECKSUM_PARTIAL to a device which can't
handle them. Causing a WARN() or a BUG() or a silent corruption,
depending on the driver.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-24 Thread David Woodhouse
On Thu, 2015-09-24 at 15:01 +0100, Lorenzo Pieralisi wrote:
> On Thu, Sep 24, 2015 at 12:52:38PM +0100, David Woodhouse wrote:
> 
> [...]
> 
> > And think about it... we *really* don't want a given peripheral device
> > to have *different* bindings depending on whether it's discovered with
> > a specific ACPI HID, vs. when it's discovered via DT or the PRP0001
> > HID. That way lies complete insanity.
> 
> There are "devices" that we do _not_ want to discover at all when
> booting with ACPI (eg drivers/clk, drivers/regulator) because the way
> they interact with ACPI standard power management AML methods may
> definitely clash.

So it would be quite clearly a firmware bug to *expose* those things in
that way, when also exposing AML methods which make use of them.

But those really aren't what I'm looking at converting, anyway. The
target of the Coccinelle script would be leaf-node drivers, like the
one in this thread. Not subsystem code.

We have, for example, a generic method for 'get the GPIO' which works
equally well whether the actual information is given in the DT form, or
whether it's done through the ACPI GPIO abstraction as described in
 Documentation/acpi/gpio-properties.txt.

We certainly shouldn't be looking at a naïve 1:1 literal
transliteration of properties. But on the other hand, we *do* want a
direct and automatic bijective mapping between the ACPI and DT
representations.


> PRP0001 allows to pull everything in (depending on what the Coccinelle
> script will look like, because AFAIK that's the only thing that stops
> people from enabling eg drivers/clk drivers in ACPI systems with PRP0001,
> namely the OF to FW node API conversion).
> 
> > In some ways, your proposal would be actively *counterproductive*. You
> > say you want to train people *not* to keep patching the kernel. But
> > where they *could* have just used PRP0001 and used a pre-existing
> > kernel, you then tell them "oh, but now you need to patch the kernel
> > because we want you to make up a new HID and not be compatible with
> > what already exists."
> 
> We do not want people to reuse DT drivers that are handled in ACPI with
> existing methods (again, power management is a prime example).

> I do not think we want to enable DT PCI host controllers drivers either
> since the ACPI model for PCI works differently already.
> 
> Should I mention CPUfreq and CPUidle DT drivers ?

No. Except to note that those are prime examples of the parts that
*don't* want to be converted.

> > Fundamentally, that DT vs. ACPI distinction has gone away with the
> > introduction of _DSD. We *need* the flexibility that we gain from being
> > able to provide device properties rather than inventing a new HID for
> > every permutation, and with that flexibility comes a certain amount of
> > responsibility to do things sensibly.
> 
> How do you enforce that apart from limiting the number of drivers you
> convert to the FW node API ? What drivers are we going to convert and
> on what basis ?

To start with, as I said, just leaf-node device drivers. None of the
things like the above, which you are quite right don't really want a
trivial conversion.

It's insane to talk of "enforcement", so let's not go there. Anyone
taking an existing driver and adding an ACPI HID to it would need to
start with *precisely* the same patches for those properties anyway.
Just as shown in the example in this thread.

> I think that's a debate worth having before making a tree-wide conversion
> of OF API calls to the generic FW node API.

Well, if that's true, that's a debate worth having before introducing
the generic FW node API. That having been kind of the *point* of the
generic FW node API — to give code which doesn't *need* to be
gratuitously firmware-specific, an API that works everywhere.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-24 Thread David Woodhouse
On Thu, 2015-09-24 at 16:15 +0100, Catalin Marinas wrote:

> With "PRP0001", they can skip the _DSD properties review process (not
> that they bother much currently) as long as the existing DT support
> covers their needs.

So no change there then. I take it the smsc911x bindings didn't go
through this mythical _DSD properties review process either, right?

>  However, we don't want to emulate DT in ACPI but > first try the
> established ACPI methods and only use _DSD where these are
> insufficient. Automatically converting existing drivers and encouraging
> people to use "PRP0001" does not provide them with any incentive to try
> harder and learn the "ACPI way".

But again, we're *not* looking at a simplistic transliteration of
properties. Where there *is* an "ACPI way", such as the GPIO example I
gave, that should absolutely be used. And there should be sufficiently
high-level access functions that it shouldn't *matter* to the driver
whether the information is coming from ACPI or DT.

> > In that sense, the HID is entirely orthogonal.
> > 
> > And think about it... we *really* don't want a given peripheral device
> > to have *different* bindings depending on whether it's discovered with
> > a specific ACPI HID, vs. when it's discovered via DT or the PRP0001
> > HID. That way lies complete insanity.
> 
> I'm fine with reusing the same property names between _DSD and DT but I
> strongly consider that _DSD should only cover a _subset_ of the DT
> bindings and they should be reviewed independently.
> 
> Take the smsc911x driver for example: the DT bindings for the ARM Juno
> platform include things like "clocks", "vdd33a-supply". Such concepts
> are not available in ACPI (but we've seen people trying to emulate
> them), so rather than enabling such clocks in firmware, vendors will be
> tempted to do things the DT way, possibly with "PRP0001" HIDs covering
> the clock devices (though I think they currently require some kernel
> changes).

Aren't those optional? If nothing is provided, they are basically a
no-op. AFAICT the existing patches for smsc911x don't touch that code
at all. So whatever motivation you imagine there is for ACPI firmware
vendors to "do things the DT way" is still there with the existing
patches.

Again, this is *orthogonal* to the question of whether the device is
matched by PRP0001 + its compatible string, vs. a newly-invented HID.


> > In some ways, your proposal would be actively *counterproductive*. You
> > say you want to train people *not* to keep patching the kernel. But
> > where they *could* have just used PRP0001 and used a pre-existing
> > kernel, you then tell them "oh, but now you need to patch the kernel
> > because we want you to make up a new HID and not be compatible with
> > what already exists."
> 
> I gave an example above with the clocks but let's take a simpler,
> device-specific property like "smsc,irq-active-high". Is this documented
> anywhere apart from the kernel driver and the in-kernel DT smsc911x.txt?
> No. Are other non-Linux OS vendors going to look in the kernel source
> tree to implement support in their drivers? I doubt it and we could end
> up with two different paths in firmware for handling the same device.
> ACPI probably never was truly OS agnostic but with "PRP0001" we don't
> even try.

Where is the documentation for the INT_CFG_IRQ_POL_ bit in the INT_CFG
register, that it controls? The documentation on the bindings really
ought to live near that, in an ideal world.

But that's still a non-sequitur in the context of this particular
discussion. The patch that was posted *already* keeps that very same
(optional) smsc,irq-active-high property. 

Again, it isn't relevant to the question of whether the driver is
matched via PRP0001 or a newly-allocated HID.

The driver, as the patch was posted, *does* have the same set of
properties with the same meanings — because anything else would be
insane.

> The idea of registering a HID is to also have a process for getting
> corresponding _DSD properties approved, published. Some of them like
> "mac-address" would be more generic and not require further review but
> we'll have more specific properties.
> 
> Patching a kernel driver to support a new HID is a (weak) method of
> keeping track of which devices are used with ACPI. Given how quickly
> these two patches were merged while still under discussion, I doubt that
> arch maintainers would be in a position to review/reject drivers using
> non-approved _DSD properties.

Again, nothing has changed there.

> Apart from a _DSD properties review process, what we need is (main) OS
> vendors enforcing it, possibly with stricter ACPI test suite. Disabling
> PRP0001 for ARM wouldn't solve the problem but at least it would make
> people think about what _DSD properties they need registered for their
> device (or I'm over optimistic and I should just stop caring ;)).

I'm all for requiring pre-existing DT bindings to be "vetted" by the
nascent _DSD review process, b

[RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-25 Thread David Woodhouse
The check in harmonize_features() is supposed to match the skb against
the features of the device in question, and prevent us from handing a
skb to a device which can't handle it.

It doesn't work correctly. A device with NETIF_F_IP_CSUM or
NETIF_F_IPV6_CSUM capabilities is only required to checksum TCP or UDP,
on Legacy IP and IPv6 respectively. But the existing check will allow
us to pass it *any* ETH_P_IP/ETH_P_IPV6 packets for hardware checksum
offload.

Depending on the driver in use, this leads to a BUG, a WARNING, or just
silent data corruption.

This is one approach to fixing that, and my test program at
http://bombadil.infradead.org/~dwmw2/raw.c can no longer trivially
reproduce the problem.

The test does now have false *negatives*, but those shouldn't happen
for locally-generated packets; only packets injected from af_packet,
tun, virtio_net and other places that allow us to inject
CHECKSUM_PARTIAL packets in order to make use of hardware offload
features. And false negatives aren't anywhere near as much of a problem
as false positives are — we just finish the checksum in software and
send the packet anyway.

It would be possible to fix those false negatives, if we really wanted
to — perhaps by adding an additional bit in the skbuff which indicates
that it *is* a TCP or UDP packet, rather than using ->sk->sk_protocol.
Then that bit could be set if appropriate in skb_partial_csum_set(), as
well as the places where we locally generate such packets. And the
check in can_checksum_protocol() would just check for that bit.

Signed-off-by: David Woodhouse 
---
Since can_checksum_protocol is inline, the compiler ought to know that
it doesn't even need to dereference skb->sk in the case where the
device has the NETIF_F_GEN_CSUM feature. So the additional check should
not slow down the (hopefully) common case in the fast path.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d15e38..76c8330 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3628,15 +3628,23 @@ struct sk_buff *skb_gso_segment(struct sk_buff *skb, 
netdev_features_t features)
 __be16 skb_network_protocol(struct sk_buff *skb, int *depth);
 
 static inline bool can_checksum_protocol(netdev_features_t features,
-__be16 protocol)
-{
-   return ((features & NETIF_F_GEN_CSUM) ||
-   ((features & NETIF_F_V4_CSUM) &&
-protocol == htons(ETH_P_IP)) ||
-   ((features & NETIF_F_V6_CSUM) &&
-protocol == htons(ETH_P_IPV6)) ||
-   ((features & NETIF_F_FCOE_CRC) &&
-protocol == htons(ETH_P_FCOE)));
+__be16 protocol, u8 sk_protocol)
+{
+   if ((features & NETIF_F_GEN_CSUM) ||
+   ((features & NETIF_F_FCOE_CRC) && protocol == htons(ETH_P_FCOE)))
+   return 1;
+
+   /* NETIF_F_V[46]_CSUM are defined to work only on TCP and UDP.
+* That is, when it needs to start checksumming at the transport
+* header, and place the result at an offset of either 6 (for UDP)
+* or 16 (for TCP).
+*/
+   if features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP)) ||
+((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))) &&
+   (sk_protocol == IPPROTO_TCP || sk_protocol == IPPROTO_UDP))
+   return 1;
+
+   return 0;
 }
 
 #ifdef CONFIG_BUG
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bb6470..3c40957 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2613,7 +2613,8 @@ static netdev_features_t harmonize_features(struct 
sk_buff *skb,
features = net_mpls_features(skb, features, type);
 
if (skb->ip_summed != CHECKSUM_NONE &&
-   !can_checksum_protocol(features, type)) {
+   !can_checksum_protocol(features, type,
+  skb->sk ? skb->sk->sk_protocol : 0)) {
features &= ~NETIF_F_ALL_CSUM;
} else if (illegal_highdma(skb->dev, skb)) {
features &= ~NETIF_F_SG;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dad4dd3..9126c6f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3004,7 +3004,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
return ERR_PTR(-EINVAL);
 
csum = !head_skb->encap_hdr_csum &&
-   !!can_checksum_protocol(features, proto);
+   !!can_checksum_protocol(features, proto,
+   head_skb->sk ? head_skb->sk->sk_protocol : 0);
 
headroom = skb_headroom(head_skb);
pos = skb_headlen(head_skb);
-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-09-26 Thread David Woodhouse
On Fri, 2015-09-25 at 16:28 +0100, Catalin Marinas wrote:
> 
> This only works as long as they target an existing driver with prior DT
> support (usually with reviewed bindings). If they have a new driver and
> only ACPI in mind, I'm pretty sure we'll end up with new insane things.
> That's why we need some form of _DSD properties review and "compatible"
> is one such property.

Sure, that makes a lot of sense.

My main concern is that we don't end up with gratuitously *different*
property sets for DT vs. ACPI. That way we end up with either two
separate drivers, or abstracting the core of the driver out and having
two bindings for it (much as we do for PCI vs. platform/etc for some
devices already). We don't want that pain where we can avoid it. And we
don't want people to *have* to hack the kernel driver to migrate to
ACPI, if we can avoid it.


Sometimes it might be worth being different — if the DT binding is
utterly crap, and deserves to be thrown away. In that case, by all
means invent a new binding. But if we're going to accept the pain of
having multiple bindings, why not make the *new* one work via DT too
anyway.

I'd be happy to see the existing DT bindings put through the nascent
_DSD review process — such as it is — and into the database. One at a
time on a case-by-case basis as they get used, perhaps.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 RFC] 8139cp: Fix GSO MSS handling

2015-09-28 Thread David Woodhouse
On Sun, 2015-09-27 at 22:37 -0700, Tom Herbert wrote:
> 
> Which drivers are doing this? It is up to the driver to determine
> whether a particular packet being sent can have checksum offloaded to
> the device. If it cannot offload the checksum it must call
> skb_checksum_help.

Not so.

A driver sets the NETIF_F_IP_CSUM feature to indicate that it can do
the checksum on Legacy IP TCP or UDP frames and *nothing* else.

It most certainly does not expect to be handed any other kind of packet
for checksumming, and bad things will often happen if it is. If drivers
*do* spot that they've been given something they don't handle, I see
BUG() calls and warnings, but I don't see any of them calling
skb_checksum_help() to silently cope. Many of them just feed it to the
hardware and don't even notice at all because it's the *hardware* which
decides whether to do a TCP or a UDP checksum. So who knows what'll
happen.

The check is supposed to be done in can_checksum_protocol(), called
from harmonize_features(). But as noted, that check has false positives
and lets some inappropriate packets through — for NETIF_F_IP_CSUM it
lets through *all* skbuffs with ->protocol == ETH_P_IP instead of only
TCP and UDP.

I originally couldn't see how to deal with this except by looking at
the contents of the packet, which sucked. But I think I've found a
somewhat more acceptable approach now:
http://lists.openwall.net/netdev/2015/09/25/85

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-28 Thread David Woodhouse
On Mon, 2015-09-28 at 10:03 -0700, Tom Herbert wrote:

> > +   if features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP)) 
> > ||
> > +((features & NETIF_F_V6_CSUM) && protocol == 
> > htons(ETH_P_IPV6))) &&
> > +   (sk_protocol == IPPROTO_TCP || sk_protocol == IPPROTO_UDP))
> > +   return 1;
> > +
> Relying on skb->sk->sk_protocol is problematic. This is making the
> assumption that the checksum being offloaded for the packet is the
> same as that of the protocol for the socket-- this may not be the 
> case when we are offloading an outer checksum in encapsulation.

> Currently this wouldn't a be problem since we're probably only 
> offloading outer UDP checksums, but if we ever start trying to 
> offload other outer checksums (e.g. GRE) then this probably doesn't
> work so well.

That makes sense.

>  Also, this doesn't help those drivers that that can offload TCP and 
> UDP for IPv6 but only if there are no extension headers, in those 
> case the driver needs to look at the packet to see if it is a 
> "simple" UDP/TCP packet.

Hm, are such devices even permitted to set NETIF_F_IPV6_CSUM?

> AFAIK, the only non UDP/TCP transport IP checksum in the stack is GRE
> checksum which as I pointed out we don't attempt to offload. So the
> only way to trip the bug that you are seeing is probably through a
> userspace packet interface like in the test code. I think this
> actually might expose a much more serious issue. Looking at tun.c, I
> don't see anything that validates that the csum_start and csum_offset
> provided by userspace actually refers to a sane checksum offset. 

That's handled in skb_partial_csum_set().

> Not only is this a way to ask the stack to perform checksums for non
> TCP/UDP, but it actually seems like the interface could be used by a
> malicious application to have a device arbitrarily overwrite two 
> bytes anywhere in the packet with it's own data far below the stack,
> netfilter, routing. To really fix this we should probably be doing
> validation in tun, if the checksum isn't for TCP or UDP then call
> skb_checksum_help before sending the packet into the stack.

So... if it's never valid to ask for a hardware checksum on anything
but TCP or UDP, why do we bother with NETIF_F_GEN_CSUM at all? Should
we just be removing it entirely? That seems like something of a
retrograde step.

Perhaps a better solution would be a bit in the skbuff which indicates
that it *is* a TCP or UDP checksum. That would be set by our UDP and
TCP sockets, cleared by encapsulation, also set if appropriate by
skb_partial_csum_set().

And then the check in can_checksum_protocol() is trivial and clearly
correct.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-28 Thread David Woodhouse
On Mon, 2015-09-28 at 12:13 -0700, Tom Herbert wrote:
> 
> > Perhaps a better solution would be a bit in the skbuff which indicates
> > that it *is* a TCP or UDP checksum. That would be set by our UDP and
> > TCP sockets, cleared by encapsulation, also set if appropriate by
> > skb_partial_csum_set().
> >
> Yes I agree. What I have been thinking to do is steal two bits from
> csum_offset that would indicate that the checksum is IPv4 or IPv6
> (specifically that the checksum value is seeded with an IPv4 or IPv6
> pseudo header). This information plus the csum_offset would be
> sufficient for drivers to identify the checksum as UDP/TCP-IPv4/IPv6.

> The other case that needs special handling is inner vs. outer
> checksum, but that can be deduced by comparing (inner of outer)
> transport offset to checksum start. With this and a couple of utility
> functions we should be able to start deprecating NETIF_F_IP_CSUM and
> NETIF_F_IPV6_CSUM.

You mean drivers which currently set NETIF_F_IP_CSUM would need to
provide a .ndo_features_check() which tolerates only the packets they
can actually handle? And we'd just ensure that the bits are there for
them to use, in the skbuff? That seems reasonable.

Note that 'seeded with an IPv[46] pseudo header' isn't quite
sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
-like checksum works out fine.

Other hardware works out whether to do a UDP or a TCP checksum for
*itself*, so it *can't* cope with other protocols which just happen to
look the same. For those it really *must* be IPPROTO_TCP or IPPROTO_UDP
and they're going to be looking in the IP header for it.

I do suspect we'll want a bit which says it's *actually* TCP or UDP,
not just 'seeded with a pseudo-header'. That's the important
distinction for NETIF_F_IP_CSUM vs. NETIF_F_HW_CSUM.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-29 Thread David Woodhouse
On Mon, 2015-09-28 at 12:37 -0700, Tom Herbert wrote:
> I think it's easier to just call skb_checksum_help from the driver
> when the packet is actually sent to the device (should be no cost for
> late binding).

That's true for checksum. Not for things like TSO though, and I wonder
if it's worth keeping it simple and doing it *all* in
.ndo_features_check()? 

> > Note that 'seeded with an IPv[46] pseudo header' isn't quite
> > sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
> > a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
> > -like checksum works out fine.
> > 
> UDP or TCP can be determined from csum_offset, e.g. 16=>TCP 6=>UDP

Kind of. There'll be false positives there too, though. That was
actually the basis of my first attempt to address this, at
http://lists.openwall.net/netdev/2013/01/14/36

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-29 Thread David Woodhouse
On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
> 
> > I've been pondering a bit of a redesign in this space.  I think the
> > skb struct should be explicit in its instructions to hardware for
> > which offloads to do for each packet.
> >
> > In this way, the stack would be *directly* telling the drivers what to
> > do (and what not to do), solving all sorts of bugs and really improving
> > driver reliability and implementation.
> >
> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
> the driver unambiguously what to do wrt checksum offload?

Right. That's precisely what we *do* have. But as things stand, we
can't *use* it to its full capability.

It's fine for decent devices which can handle such explicit
instructions (advertised by the NETIF_F_HW_CSUM feature).

The problem is the crappy devices that can *only* checksum UDP and TCP
frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a
primitive attempt *not* to feed arbitrary checksum requests to such
hardware. But we fail — we end up feeding *all* Legacy IP packets to a
NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM
device, regardless of whether they're *actually* TCP or UDP packets.

That's the problem I'm trying to solve. And then we *can* make full use
of the generic checksum offload (I had it working for ICMPv6 at one
point: http://lists.openwall.net/netdev/2013/01/14/38 ).

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net-next 3/3] r8169: support IPv6

2015-10-02 Thread David Woodhouse

> Support the IPv6 hw checksum for RTL8111C and later chips. Note
> that the hw has the limitation for the transport offset. The
> checksum must be calculated by sw, when the transport offset is
> out of the range which the hw accepts.


It would be better to implement this check in a .ndo_features_check
method, just clearing the appropriate CSUM/TSO features for the skbs for
which the hardware cannot cope. That way the stack fixes them up for you.


> + if (skb_shinfo(skb)->gso_size) {
> + netdev_features_t features = tp->dev->features;
> + struct sk_buff *segs, *nskb;
> +
> + features &= ~(NETIF_F_SG | NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
> + segs = skb_gso_segment(skb, features);
> + if (IS_ERR(segs) || !segs)
> + goto drop;
> +
> + do {
> + nskb = segs;
> + segs = segs->next;
> + nskb->next = NULL;
> + rtl8169_start_xmit(nskb, tp->dev);
> + } while (segs);
> +
> + dev_kfree_skb(skb);

This loop in particular makes no attempt to avoid exceeding the available
space in your descriptor ring, and can drop packets and trigger the
warning at the start of your hard_start_xmit function.

-- 
dwmw2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-10-05 Thread David Woodhouse
On Tue, 2015-09-29 at 15:52 -0700, Tom Herbert wrote:
> Please look at ixgbe_tx_csum in ixgbe driver. This one example of how
> a driver can determine whether the checksum being offloaded is TCP or
> UDP. The bug in this driver is...

I think it serves better as an example of why we don't *want* drivers
doing that kind of thing for themselves... :)

I propose we steal some high bits from csum_offset, as you suggested,
and use them to indicate a 'checksum type', which will include TCP and
UDP.

Then the filter in netif_skb_features() can trivially do the right
thing for NETIF_F_IP{V6,}_CSUM devices, so avoid feeding them packets
they can't handle.

You mentioned that you actually want to deprecate those feature flags —
which works for me, but it's kind of orthogonal. If we do that we'd
still want to provide generic functions that such drivers can use as
their .ndo_features_check() method. And we'd *still* want to do the
check based on a simple flag, rather than grubbing around in the packet
data. (And the drivers if they *are* asked to do the checksum will
sometimes care whether it's TCP vs. UDP too).

I don't think we want drivers calling skb_checksum_help() for
themselves; we want the pre-filter. Mainly because we *definitely*
don't want drivers calling gso_skb_segment() for themselves in the same
situation — see the comment I posted on Friday about the r8169 instance
of that. ('Re: [PATCH net-next 3/3] r8169: support IPv6').

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-10-05 Thread David Woodhouse
On Mon, 2015-10-05 at 09:23 -0700, Tom Herbert wrote:
> 
> 1) Drivers may advertise NETIF_F_HW_CSUM. The stack will indicate
> checksum offload exclusively using the
> CHECKSUM_PARTIAL/csum_start/csum_offset interface. No additional
> interfaces (bits in skbuff should not be needed)
> 2) A driver may inspect packets via ndo_check to decide if it wants to
> offload the checksum, if not cancels NETIF_F_HW_CSUM in the packet.
> 3) In driver xmit when CHECKSUM_PARTIAL is set the driver MUST
> correctly resolve the checksum-- either by properly offloading to the
> device or calling skb_checksum_help.

In cases where they haven't used .ndo_features_check() to ensure that
they don't *see* such packets, sure. But using .ndo_features_check()
should probably be the preferred method.

> 4) To help drivers for devices with limited offload capabilities we'll
> define a helper function to check for typical restrictions (.e.g. IPv4
> only, TCP/UDP only. no encapsulation, no IPv6 extension headers,
> etc.). I am working on this helper function and will send RFC shortly.

I do suspect that helper function would benefit from seeing TCP/UDP
flags in the high bits of csum_offset, rather than grubbing around in
the packet for itself to see if it's really TCP/UDP. After all, it's
almost free to set those in the first place — at least for locally
-generated packets. And not *so* hard to add them in
skb_partial_csum_set(). But hey, if you can push an implementation
which is grubbing around in the packet then at least *I* don't have to
feel dirty for it... :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH RFC 0/3] net: Add driver helper function to determine if checksum is offloadable

2015-10-06 Thread David Woodhouse
On Mon, 2015-10-05 at 16:39 -0700, Tom Herbert wrote:
> When drivers have support for offloading transport IP checksums, they
> will indicate this in the features for the device. As described in
> skbuff.h, a driver will usually advertise NETIF_F_HW_CSUM,
> NETIF_F_IP_CSUM, and/or NETIF_F_IPV6_CSUM. The first of these
> (NETIF_F_HW_CSUM) is the preferred method since this implies that the
> device has implemented a generic checksum offload feature that should
> work under arbitrary scenarios (e.g. for different protocols, with or
> without encapsulation).
> 
> For narrow features support (NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM
> for offload of TCP/UDP), the features flags may not be sufficient to
> deduce whether a packet may be offloaded. Some devices will not be able
> to offload encapsulated checksums, some cannot offload transport
> checksums in packet with IPv6 extension headers, etc. In these cases
> a driver will need to perform additional packet inspection to determine
> if a packet's checksum can be offloaded to a device.
> 
> This patch defines a helper function that drivers can call to check if
> it is able to offload the checksum for a particular packet. In an
> argument to the function, the driver specifies what type of packets it
> is able to offload to a device. The function is intended to check for
> the most common restrictions of devices (like by IP version, transport
> protocol, encapsulation, extension headers). Since the function includes
> checks for IP version and transport protocol, the driver is able
> to advertise NETIF_F_HW_CSUM instead of protocol specific support.
> This should put us on a path to deprecate NETIF_F_IP_CSUM and
> NETIF_F_IPV6_CSUM.
> 
> The helper function may be called from ndo_features_check or the xmit
> routine of the driver. The csum_help argument does not need to be set
> when the function is called from ndo_features_check.

Looks good in general; thanks.

I do strongly believe we want to encourage your helper to be called
from .ndo_features_check(). Because if you can't do the checksum, then
you can't do TSO. And if you can't do TSO, you *really* want your
hard_start_xmit() function to be handed one skb at a time and not have
to call skb_gso_segment() for itself when it might not have enough room
in its descriptor ring for *all* the resulting segments.

Also, is your skb_csum_offload_chk() going to do the right thing for
SCTP packets? Looks like it doesn't do crc32...

And how accurate is the check we have, on the various IP output
routines, for rt->dst.dev->features & NETIF_F_xx_CSUM? Could we
consider just ditching that check and always using CHECKSUM_PARTIAL, in
the knowledge that we check it before it hits a non-capable driver
anyway? Or do we benefit from doing the software checksum early, due to
improved cache locality when we've probably just copied the data from
userspace in many cases?

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-10-06 Thread David Woodhouse
On Mon, 2015-10-05 at 17:20 -0700, Charles Garcia-Tobin wrote:
> it in ACPI circles
> unless we had wider agreement among OSs to use it. AFAIK PRP1 has not
> actually been approved yet in the specification forum, and that it in
> itself is more of a concern for me,as the code has been pushed upstream.

Why would that be a concern? In that context it's just one device ID.
Individual devices don't *need* to be approved. OK, the 'PRP' vendor
prefix is not officially assigned but that's really a trivial piece of
bureaucracy.

> I guess it¹s up to Catalin, but disabling for ARM seems like a good idea
> right now, another option is to add tests to FWTS.

I understand the motivation to avoid embracing a whole bunch of crappy
bindings. But I think that eschewing PRP0001 is the wrong technical
approach to achieving that.

It has false negatives — as soon as you have a *single* existing DT
binding, perhaps something as simple as the serial port bindings from
the CHRP days, you'll be in a situation where you can't use that.
I've *got* hardware where I need to advertise a serial port with a
clock-frequency property because it *isn't* compatible with PNP0501.

And it has false positives — there's nothing to prevent people from
doing ACPI-style bindings with crappy device bindings which also aren't
approved.

I think it's utterly naïve to believe that simply avoiding the use of
PRP0001 + compatible for matching is going to have *any* significant
beneficial effect whatsoever. It only makes life harder for all
concerned.

Perhaps a better approach would be to introduce something like
CONFIG_UNAPPROVED_BINDINGS (which can't be set on ARM64), and those
drivers which use bindings that *aren't* approved by Catalin's crack
team of reviewers need to depend on !UNAPPROVED_BINDINGS. To be honest,
I still think even *that* is somewhat naïve, but it's still a better
way of implementing what you're actually trying to achieve, however
optimistic you have to be to think it'll ever work in practice.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH RFC 0/3] net: Add driver helper function to determine if checksum is offloadable

2015-10-08 Thread David Woodhouse
On Mon, 2015-10-05 at 16:39 -0700, Tom Herbert wrote:
> This patch defines a helper function that drivers can call to check 
> if it is able to offload the checksum for a particular packet.

It occurs to me that by itself, this doesn't actually fix the problem.
We'd then need to go over all drivers which currently use
NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM, and convert them. Would that be
the intention?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-14 Thread David Woodhouse
On Mon, 2013-05-20 at 17:27 -0700, Stephen Hemminger wrote:
> On Mon, 20 May 2013 23:37:28 +0200
> Francois Romieu  wrote:
> 
> > cp_stop_hw includes netdev_reset_queue.
> > 
> > You have imho exhibited a start_xmit after cp_stop_hw race - not sure if
> > it happens in cp_tx_timeout or cp_change_mtu. Reverting the analysis above,
> > I have not found a place where cp_stop_hw could be called without being
> > followed by a cp_clean_rings. The netdev_reset_queue in cp_stop_hw, now
> > useless, should thus be removed.
> > 
> > Does it make sense ?
> 
> Your right, you could probably remove it.
> 
> It doesn't solve the problem, still seeing transmit timeouts.
> Looks like what happens with DHCP is something else.

Did you ever work this out? I'm seeing something similar on the inward
-facing interface on my home router under high load — and it doesn't
automatically recover.



[308309.340644] [ cut here ]
[308309.345379] WARNING: at net/sched/sch_generic.c:255 
dev_watchdog+0x103/0x190()
[308309.352789] Hardware name: Geos
[308309.356020] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out
[308309.362733] Modules linked in: sch_fq_codel sch_teql gpio_keys_polled 
leds_gpio geodewdt solos_pci ledtrig_heartbeat gpio_cs5535 cs5535_clockevt 
8139cp ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag 
ip6t_eui64 ip6t_ah ip6table_raw ip6table_mangle ip6table_filter ip6_tables 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_irc nf_conntrack_irc nf_nat_ftp 
nf_conntrack_ftp xt_HL xt_hl xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss 
xt_statistic xt_mark xt_length xt_DSCP xt_dscp cs5535_mfgpt cs5535_mfd mfd_core 
ipt_MASQUERADE nf_nat xt_recent xt_helper xt_connmark xt_connbytes pptp 
l2tp_ppp pppoe xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4 
nf_defrag_ipv4 nf_conntrack pppox pppoatm ipt_REJECT xt_TCPMSS xt_comment 
xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp 
x_tables nsc_gpio ip_gre gre sit l2tp_netlink l2tp_core ppp_mppe tunnel4 tun 
ppp_async ppp_generic slhc br2684 atm crc_ccitt ipv6 input_polldev msr 
input_core sha1_generic geode_aes ecb arc4 aes_i586 ohci_hcd ehci_hcd usbcore 
usb_common
[308309.457239] Pid: 0, comm: swapper Not tainted 3.7.1 #1
[308309.462463] Call Trace:
[308309.465020]  [] ? warn_slowpath_common+0x87/0xb0
[308309.470691]  [] ? dev_watchdog+0x103/0x190
[308309.475755]  [] ? warn_slowpath_fmt+0x33/0x40
[308309.481159]  [] ? dev_watchdog+0x103/0x190
[308309.486244]  [] ? pfifo_fast_dequeue+0xd0/0xd0
[308309.491751]  [] ? call_timer_fn.isra.42+0x1c/0x80
[308309.497422]  [] ? process_backlog+0x54/0xe0
[308309.502674]  [] ? run_timer_softirq+0x12a/0x160
[308309.508169]  [] ? pfifo_fast_dequeue+0xd0/0xd0
[308309.513697]  [] ? __do_softirq+0x6d/0x110
[308309.518675]  [] ? __tasklet_schedule+0x40/0x40
[308309.524178][] ? irq_exit+0x31/0x60
[308309.529359]  [] ? do_IRQ+0x8d/0xb0
[308309.533723]  [] ? do_IRQ+0x8d/0xb0
[308309.538201]  [] ? common_interrupt+0x29/0x2e
[308309.543440]  [] ? rt_mutex_adjust_prio_chain+0x180/0x280
[308309.549829]  [] ? default_idle+0x14/0x30
[308309.554719]  [] ? cpu_idle+0x2f/0x50
[308309.559259]  [] ? start_kernel+0x286/0x28b
[308309.564414]  [] ? repair_env_string+0x4d/0x4d
[308309.569729] ---[ end trace 2e18cc211cee6089 ]---
[308309.574551] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b0 
80ff

-- 
dwmw2




smime.p7s
Description: S/MIME cryptographic signature


IPv6 routing/fragmentation panic

2015-09-15 Thread David Woodhouse
kb, 16);
SKB_CB(skb)->dma_addr =
pci_map_single(card->dev, skb->data,
   RX_DMA_SIZE, 
PCI_DMA_FROMDEVICE);

Now, I probably should have done this a long time ago, because that
lack of headroom probably meant that the machine was always having to
reallocate buffers just to fit the Ethernet header on the front of them
when routing incoming packets. So I might be happy enough with
submitting a variant of the above patch and calling it a performance
improvement.

But should the kernel *panic* without it? If there are requirements on
the headroom I must leave on received packets, where are they
documented? Or is this a bug in the IPv6 fragmentation code, to make
such assumptions?

I'm not entirely sure how to interpret the above stack trace. Is the
incoming IPv6 packet being reassembled for netfilter's benefit, then re
-fragmented for transmission?
 
-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH RFC] solos-pci: Fix BUG() with shared skb

2015-09-15 Thread David Woodhouse
On Wed, 2013-09-04 at 21:41 +0100, David Woodhouse wrote:
> On Wed, 2013-09-04 at 14:30 -0400, David Miller wrote:
> > skb_realloc_headroom() should do everything you need.
> 
> Great, thanks! Something like this then... ? 
>
> diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
> index 32784d1..4492c0f 100644
> --- a/drivers/atm/solos-pci.c
> +++ b/drivers/atm/solos-pci.c
> @@ -1145,19 +1145,19 @@ static int psend(struct atm_vcc *vcc, struct sk_buff 
> *skb)
>  >>   > return 0;
>  >> }
>  
> ->> if (!skb_clone_writable(skb, sizeof(*header))) {
> ->>   > int expand_by = 0;
> ->>   > int ret;
> -
> ->>   > if (skb_headroom(skb) < sizeof(*header))
> ->>   >   > expand_by = sizeof(*header) - skb_headroom(skb);
> -
> ->>   > ret = pskb_expand_head(skb, expand_by, 0, GFP_ATOMIC);
> ->>   > if (ret) {
> ->>   >   > dev_warn(&card->dev->dev, "pskb_expand_head 
> failed.\n");
> ->>   >   > solos_pop(vcc, skb);
> ->>   >   > return ret;
> ->>   > }
> +>> if (skb_headroom(skb) < sizeof(*header)) {
> +>>   > struct sk_buff *nskb;
> +
> +>>   > nskb = skb_realloc_headroom(skb, sizeof(*header));
> +>>   > if (!nskb) {
> +>>   >   > solos_pop(vcc, skb);
> +>>   >   > return -ENOMEM;
> +>>   > }
> +>>   > if (skb->truesize != nskb->truesize)
> +>>   >   > atm_force_charge(vcc, nskb->truesize - skb->truesize);
> + 
> +>>   > dev_kfree_skb_any(skb);
> +>>   > skb = nskb;
>  >> }
>  
>  >> header = (void *)skb_push(skb, sizeof(*header));

Simon, did you ever test this?
Can you still (tell me how to) reproduce the original problem? I think
that sending on br2684 was necessary but not sufficient...?

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: IPv6 routing/fragmentation panic

2015-09-16 Thread David Woodhouse
On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
> 
> What I don't understand is why you see this with fragmented ipv6 
> packets only (and not with all ipv6 forwarded skbs).
> 
> Something like this copy-pastry from ip_finish_output2 should fix it:

That works; thanks.

Tested-by: David Woodhouse 

A little extra debugging output shows that the offending fragments were
arriving here with skb_headroom(skb)==10. Which is reasonable, being
the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
frame type.

The non-fragmented packets, on the other hand, are arriving with a
headroom of 42 bytes. Could something else already have reallocated
them before they get that far? (Do we have any way to gather statistics
on such reallocations? It seems that might be useful for performance
investigation.)

Johannes and I were talking on IRC yesterday about trying to make this
kind of thing easier to reproduce without odd hardware. We postulated a
skb_torture() function which, when an appropriate debugging option was
enabled, would randomly screw around with the skb in various
interesting ways — shifting the data down so that there's no headroom,
deliberately making it *non-linear*, temporarily cloning it and freeing
the clone a couple of seconds later, etc.

Then we could insert calls to skb_torture() in interesting places like
netif_rx(), ip6_finish_output2() and anywhere else that seems
appropriate (perhaps with flags to indicate *what* kind of torture is
permissible in certain locations). And see what breaks...

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH] solos-pci: Increase headroom on received packets

2015-09-16 Thread David Woodhouse
A comment in include/linux/skbuff.h says that:

 * Various parts of the networking layer expect at least 32 bytes of
 * headroom, you should not reduce this.

This was demonstrated by a panic when handling fragmented IPv6 packets:
http://marc.info/?l=linux-netdev&m=144236093519172&w=2

It's not entirely clear if that comment is still valid — and if it is,
perhaps netif_rx() ought to be enforcing it with a warning.

But either way, it is rather stupid from a performance point of view
for us to be receiving packets into a buffer which doesn't have enough
room to prepend an Ethernet header — it means that *every* incoming
packet is going to be need to be reallocated. So let's fix that.

Signed-off-by: David Woodhouse 
--- 
Tested in the DMA code path; I don't believe the DMA-capable devices
can still be used in MMIO mode. Simon, Guy, would you be able to test
the MMIO version?

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 74e18b0..be8225e 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -805,13 +805,13 @@ static void solos_bh(unsigned long card_arg)
continue;
}
 
-   skb = alloc_skb(size + 1, GFP_ATOMIC);
+   skb = alloc_skb(size + NET_SKB_PAD + 1, 
GFP_ATOMIC);
if (!skb) {
if (net_ratelimit())
dev_warn(&card->dev->dev, 
"Failed to allocate sk_buff for RX\n");
continue;
}
-
+   skb_reserve(skb, NET_SKB_PAD);
memcpy_fromio(skb_put(skb, size),
  RX_BUF(card, port) + 
sizeof(*header),
  size);
@@ -869,8 +869,10 @@ static void solos_bh(unsigned long card_arg)
/* Allocate RX skbs for any ports which need them */
if (card->using_dma && card->atmdev[port] &&
!card->rx_skb[port]) {
-   struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, 
GFP_ATOMIC);
+   struct sk_buff *skb = alloc_skb(RX_DMA_SIZE + 
NET_SKB_PAD,
+   GFP_ATOMIC);
if (skb) {
+   skb_reserve(skb, NET_SKB_PAD);
SKB_CB(skb)->dma_addr =
dma_map_single(&card->dev->dev, 
skb->data,
       RX_DMA_SIZE, 
DMA_FROM_DEVICE);

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] solos-pci: Increase headroom on received packets

2015-09-16 Thread David Woodhouse
On Wed, 2015-09-16 at 03:53 -0700, Eric Dumazet wrote:
> You should use netdev_alloc_skb() : This helper is better for rx skbs,
> as it allows for better packing of frames in GRO or TCP stack.

OK, thanks. I don't have a netdev (this is an ATM device) but I can use
dev_alloc_skb().

> Also netdev_alloc_skb_ip_align() might handle the NET_IP_ALIGN stuff
> for arches that care.

I'd briefly considered NET_IP_ALIGN but decided against it because this
isn't Ethernet and my hardware header is a nice sane 8 bytes, not 14.

But actually, the primary use cases for this are PPPoATM — with 2 bytes
of PPP frame type, and PPPoE over BR2684 — with 14 bytes of Ethernet
header. So NET_IP_ALIGN would actually make sense.

Unfortunately the FPGA can't do DMA to unaligned addresses, so I can't
do it in the DMA case. I can do it for the MMIO code path though (which
I still haven't tested).

I'll send a new patch in a moment...

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH v2] solos-pci: Increase headroom on received packets

2015-09-16 Thread David Woodhouse
A comment in include/linux/skbuff.h says that:

 * Various parts of the networking layer expect at least 32 bytes of
 * headroom, you should not reduce this.

This was demonstrated by a panic when handling fragmented IPv6 packets:
http://marc.info/?l=linux-netdev&m=144236093519172&w=2

It's not entirely clear if that comment is still valid — and if it is,
perhaps netif_rx() ought to be enforcing it with a warning.

But either way, it is rather stupid from a performance point of view
for us to be receiving packets into a buffer which doesn't have enough
room to prepend an Ethernet header — it means that *every* incoming
packet is going to be need to be reallocated. So let's fix that.

Signed-off-by: David Woodhouse 
---
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 74e18b0..3d7fb65 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -805,7 +805,12 @@ static void solos_bh(unsigned long card_arg)
continue;
}
 
-   skb = alloc_skb(size + 1, GFP_ATOMIC);
+   /* Use netdev_alloc_skb() because it adds 
NET_SKB_PAD of
+* headroom, and ensures we can route packets 
back out an
+* Ethernet interface (for example) without 
having to
+* reallocate. Adding NET_IP_ALIGN also ensures 
that both
+* PPPoATM and PPPoEoBR2684 packets end up 
aligned. */
+   skb = netdev_alloc_skb_ip_align(NULL, size + 1);
if (!skb) {
if (net_ratelimit())
dev_warn(&card->dev->dev, 
"Failed to allocate sk_buff for RX\n");
@@ -869,7 +874,10 @@ static void solos_bh(unsigned long card_arg)
/* Allocate RX skbs for any ports which need them */
if (card->using_dma && card->atmdev[port] &&
!card->rx_skb[port]) {
-   struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, 
GFP_ATOMIC);
+   /* Unlike the MMIO case (qv) we can't add NET_IP_ALIGN
+* here; the FPGA can only DMA to addresses which are
+* aligned to 4 bytes. */
+   struct sk_buff *skb = dev_alloc_skb(RX_DMA_SIZE);
if (skb) {
SKB_CB(skb)->dma_addr =
    dma_map_single(&card->dev->dev, 
skb->data,

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: IPv6 routing/fragmentation panic

2015-09-16 Thread David Woodhouse
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> 
> David, could you test this?  I'd do an official patch submission
> then.

Compiles. Doesn't fix the problem.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: IPv6 routing/fragmentation panic

2015-09-16 Thread David Woodhouse
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> @@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff
> *skb,
> /* Correct geometry. */
> if (frag->len > mtu ||
> ((frag->len & 7) && frag->next) ||
> -   skb_headroom(frag) < hlen)
> +   skb_headroom(frag) < (hlen + hroom))
> goto slow_path_clean;
>  
> /* Partially cloned skb? */

My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
followed by a fragment of 776 bytes.

The test cited above is only actually running on the latter fragment
(which for some reason is fine and has headroom of 58 bytes).

The first, larger, fragment isn't being checked. And that's the one
with only 10 bytes of headroom.

[   62.027984] has frag list
[   62.030616] line 604 check frag ddc5fcc0 len 776 headroom 58 (hlen 40 hroom 
16) 
[   62.036720] line 678 send skb ded050c0 len 1280 headroom 10  
  
[   62.041096] skbuff: skb_under_panic: text:c125f9ca len:1294 put:14 head:dec89
000 data:dec88ffc tail:0xdec8950a end:0xdec89f50 dev:br-lan 

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net] ipv6: ip6_fragment: fix headroom tests and skb leak

2015-09-17 Thread David Woodhouse
On Wed, 2015-09-16 at 17:26 +0200, Florian Westphal wrote:
> I tested this e1000 driver hacked to not allocate additional headroom
> (we end up in slowpath, since LL_RESERVED_SPACE is 16).

And it works on the originally-offending setup too; thanks.

Tested-by: David Woodhouse 

> Reported-by: David Woodhouse 
> Diagnosed-by: David Woodhouse 

They generally prefer me to use @intel.com for those too, if you would.
I draw the line at using it for actual email communication though :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
_spin_unlock_irqrestore+0xa/0x10   
[  260.116369] EAX:  EBX: 00200292 ECX: e04d8100 EDX: 00200292  
[  260.116369] ESI:  EDI: ff32 EBP: 0258 ESP: dec0bf90  
[  260.116369]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[  260.116369] CR0: 8005003b CR2: 098c4c50 CR3: 1ef88000 CR4: 0090  
[  260.116369] Stack:   
[  260.116369]  dee95000 c1272ecb dee95000 dee95240 0258 8100 c1272d10 
dee95000
 
[  260.116369]  0100 c105b19e 0410 c1272d10 c105b33d  c146d524 
0001
 
[  260.116369]  0002 c1033777 0002 e577 c13f2000 000a 0020 
c13f1f70
 
[  260.116369] Call Trace:  
[  260.116369]  [] ? dev_watchdog+0x1bb/0x200 
[  260.116369]  [] ? qdisc_rcu_free+0x30/0x30 
[  260.116369]  [] ? call_timer_fn.isra.7+0xe/0x60
[  260.116369]  [] ? qdisc_rcu_free+0x30/0x30 
[  260.116369]  [] ? run_timer_softirq+0xfd/0x1b0 
[  260.116369]  [] ? __do_softirq+0xa7/0x190  
[  260.116369]  [] ? __hrtimer_tasklet_trampoline+0x20/0x20   
[  260.116369]  [] ? do_softirq_own_stack+0x1b/0x20   
[  260.116369] 
[  260.116369]  [] ? do_IRQ+0x35/0xa0 
[  260.116369]  [] ? common_interrupt+0x29/0x30   
[  260.116369]  [] ? put_unbound_pool+0x17b/0x1a0 
[  260.116369]  [] ? default_idle+0x2/0x10
[  260.116369]  [] ? arch_cpu_idle+0x6/0x10   
[  260.116369]  [] ? cpu_startup_entry+0xf5/0x190 
[  260.116369]  [] ? start_kernel+0x2e5/0x2e8 
[  260.116369] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 
2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 
8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00
  

At this point even sysrq on the serial console isn't working; I'm
going to have to go and physically reset it.

(gdb) list *dev_watchdog+0x1bb
0xc1272ecb is in dev_watchdog (net/sched/sch_generic.c:304).
299 }
300 
301 if (some_queue_timedout) {
302 WARN_ONCE(1, KERN_INFO
"NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
303dev->name,
netdev_drivername(dev), i);
304 dev->netdev_ops
->ndo_tx_timeout(dev);
305 }
306 if (!mod_timer(&dev->watchdog_timer,
307round_jiffies(jiffies
+
308  dev
->watchdog_timeo)))

So that _raw_spin_unlock_irqrestore() is going to be a tailcall from
the end of cp_tx_timeout(). Moderately confused that it dies on *unlock*.

_raw_spin_lock_irqrestore+0xa is the instruction after a 'popf', which
makes me wonder if it dies in an IRQ storm. Although then *some* of the
NMI watchdog invocations would surely show an IRQ on the stack, but
they
don't; they're all right on the same instruction.


-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
at pppoe 
nf_nat_i...
[  292.130412] CPU: 0 PID: 0 Comm: swapper Tainted: GW   4.2.0-gx+ 
#26 
 
[  292.130412] task: c13f7540 ti: c13f task.ti: c13f
[  292.130412] EIP: 0060:[] EFLAGS: 00200286 CPU: 0   
[  292.130412] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10   
[  292.130412] EAX:  EBX: 00200286 ECX: c1409ee4 EDX: 00200286  
[  292.130412] ESI: 00200286 EDI: ddc53428 EBP: ddc53420 ESP: dec0bf68  
[  292.130412]  DS: 007b ES: 007b FS:  GS:  SS: 0068
[  292.130412] CR0: 8005003b CR2: 0805244c CR3: 1edf6000 CR4: 0090  
[  292.130412] Stack:   
[  292.130412]  ddc53000 e06f8055 e06f9e2e 00200286 e06f9e18 e06f9f0c 00200286 
ddc53000
 
[  292.130412]   fde1 0258 c1272ecb ddc53000 ddc53240 0258 
8100
 
[  292.130412]  c1272d10 ddc53000 0100 c105b19e 0442 c1272d10 c105b33d 
ddc7c444
 
[  292.130412] Call Trace:  
[  292.130412]  [] ? cp_tx_timeout+0x1a5/0x1c0 [8139cp]   
[  292.130412]  [] ? dev_watchdog+0x1bb/0x200 
[  292.130412]  [] ? qdisc_rcu_free+0x30/0x30 
[  292.130412]  [] ? call_timer_fn.isra.7+0xe/0x60
[  292.130412]  [] ? qdisc_rcu_free+0x30/0x30 
[  292.130412]  [] ? run_timer_softirq+0xfd/0x1b0 
[  292.130412]  [] ? __do_softirq+0xa7/0x190  
[  292.130412]  [] ? __hrtimer_tasklet_trampoline+0x20/0x20   
[  292.130412]  [] ? do_softirq_own_stack+0x1b/0x20   
[  292.130412] 
[  292.130412]  [] ? do_IRQ+0x35/0xa0 
[  292.130412]  [] ? common_interrupt+0x29/0x30   
[  292.130412]  [] ? put_unbound_pool+0x17b/0x1a0 
[  292.130412]  [] ? default_idle+0x2/0x10
[  292.130412]  [] ? arch_cpu_idle+0x6/0x10   
[  292.130412]  [] ? cpu_startup_entry+0xf5/0x190 
[  292.130412]  [] ? start_kernel+0x2e5/0x2e8 
[  292.130412] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 
2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 
8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00
  
[  292.130412] Kernel panic - not syncing: softlockup: hung tasks   
[  292.130412] CPU: 0 PID: 0 Comm: swapper Tainted: GWL  4.2.0-gx+ 
#26 
 
[  292.130412]   c1319a5f c13f7540   00f9 c1071439 
c1399de4
 
[  292.130412]  dec0bf2c c140b660 c10712f0 0001 c140aac0 c105bf8e  
03bbae55
 
[  292.130412]  0044 c140aacc 03bbae55 0044 0003 c141b860 c141b801 

 
[  292.130412] Call Trace:  
[  292.130412]  [] ? panic+0x76/0x161 
[  292.130412]  [] ? watchdog_timer_fn+0x149/0x150
[  292.130412]  [] ? watchdog_cleanup+0x10/0x10   
[  292.130412]  [] ? __hrtimer_run_queues.constprop.7+0xae/0x180  
[  292.130412]  [] ? hrtimer_interrupt+0x87/0x1d0 
[  292.130412]  [] ? tick_handle_oneshot_broadcast+0xcf/0x130 
[  292.130412]  [] ? timer_interrupt+0xa/0x10 
[  292.130412]  [] ? handle_irq_event_percpu+0x4f/0xf0
[  292.130412]  [] ? handle_irq_event+0x2d/0x50   
[  292.130412]  [] ? handle_level_irq+0x69/0xf0   
[  292.130412]  [] ? handle_simple_irq+0x80/0x80  
[  292.130412]  [] ? handle_irq+0x43/0x70 
[  292.130412][] ? do_IRQ+0x2c/0xa0  
[  292.130412]  [] ? common_interrupt+0x29/0x30   
[  292.130412]  [] ? remove_waiter+0x11b/0x120
[  292.130412]  [] ? _raw_spin_unlock_irqrestore+0xa/0x10 
[  292.130412]  [] ? cp_tx_timeout+0x1a5/0x1c0 [8139cp]       

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com 

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> 
> Thanks; I'll try that. In fact since updating to 4.2 the problem has
> got worse — now the whole machine dies:

There is something very strange going on here. I've found two ways to
make it stop crashing when cp_tx_timeout() hits the 'popf' when
unlocking the spinlock.

The first is to comment out the whole of cp_tx_timeout() and let it
happen once. Then put that code *back* again and reload the module.
Then it can work fine.

The second way is to comment out the WARN_ONCE in dev_watchdog().
I remain utterly bemused; I have no idea what's going on there.

But that aside, even when it survives running cp_tx_timeout(), it still
doesn't *work* — it looks like TX is indeed working and has recovered,
but we are not *receiving* any packets.

I can't actually trigger the TX timeout at all with debugging enabled;
I've hacked things so that cp_set_wol() will also call cp_tx_timeout()
and simulate it. And now I see this...

[ 4358.499474] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.499488] 8139cp :00:0b.0 eth1: tx done, slot 35
[ 4358.513663] 8139cp :00:0b.0 eth1: tx queued, slot 37, skblen 54
[ 4358.513692] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.513705] 8139cp :00:0b.0 eth1: tx done, slot 36
[ 4358.518880] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.518900] 8139cp :00:0b.0 eth1: rx slot 1 status 0x32014040 len 60
[ 4358.523601] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.526910] 8139cp :00:0b.0 eth1: rx slot 2 status 0x32036052 len 78
[ 4358.547898] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.547996] 8139cp :00:0b.0 eth1: rx slot 3 status 0x32036052 len 78
[ 4358.580526] 8139cp :00:0b.0 eth1: tx queued, slot 38, skblen 70
[ 4358.580555] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.580569] 8139cp :00:0b.0 eth1: tx done, slot 37
[ 4358.601912] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.601932] 8139cp :00:0b.0 eth1: rx slot 4 status 0x32036052 len 78
[ 4358.650678] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.650698] 8139cp :00:0b.0 eth1: rx slot 5 status 0x320145a5 len 1441
[ 4358.665572] will lock...
[ 4358.668222] Handling tx timeout, flags 282
[ 4358.672494] nway_reset
[ 4358.674858] Will wake queue...
[ 4358.677919] Will unlock... flags 282
[ 4358.681525] did unlock...
[ 4358.684198] 8139cp :00:0b.0 eth1: Transmit timeout handled, status  c   
2b0 80ff
[ 4358.708234] 8139cp :00:0b.0 eth1: tx queued, slot 1, skblen 92
[ 4358.714567] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.722405] 8139cp :00:0b.0 eth1: tx done, slot 0
[ 4358.747412] 8139cp :00:0b.0 eth1: tx queued, slot 2, skblen 106
[ 4358.753736] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.756824] 8139cp :00:0b.0 eth1: tx done, slot 1
[ 4358.814961] 8139cp :00:0b.0 eth1: tx queued, slot 3, skblen 173
[ 4358.821291] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.824186] 8139cp :00:0b.0 eth1: tx done, slot 2
[ 4358.834352] 8139cp :00:0b.0 eth1: tx queued, slot 4, skblen 86
[ 4358.840579] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.844216] 8139cp :00:0b.0 eth1: tx done, slot 3
[ 4358.853615] 8139cp :00:0b.0 eth1: tx queued, slot 5, skblen 54
[ 4358.859822] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c 
cpcmd 002b
[ 4358.863497] 8139cp :00:0b.0 eth1: tx done, slot 4
[ 4358.873111] 8139cp :00:0b.0 eth1: tx queued, slot 6, skblen 66

-- 
-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Thu, 2015-09-17 at 22:44 +0200, Francois Romieu wrote:
> David Woodhouse  :
> > On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> > > 
> > > Thanks; I'll try that. In fact since updating to 4.2 the problem has
> > > got worse — now the whole machine dies:
> > 
> > There is something very strange going on here. I've found two ways to
> > make it stop crashing when cp_tx_timeout() hits the 'popf' when
> > unlocking the spinlock.
> 
> cp_tx_timeout takes lock, disables irq, calls cp_clean_rings, thus
> plain dev_kfree_skb if a skb is still referenced in one of the
> rx/tx ring. You may replace it with dev_kfree_skb_any.

Well spotted; I've made that change locally. Although I don't think it
explains the symptoms. Not that I'm sure what *could*.

I've also found that adding a call to __cp_set_rx_mode() seems to fix
the RX after reset, in some tests. Especially the simulated one via the
hack in cp_set_wol(). I think that's necessary, if not sufficient — at
least on real hardware. I didn't see the problem at all when running in
qemu.

Sometimes, though, it still dies in an interrupt storm after re
-enabling IRQs:

[  900.004214] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b0 
80ff
[  900.011725] will lock...
[  900.014273] Handling tx timeout, flags 200296
[  900.018774] Will wake queue...
[  900.021645] Will unlock... flags 200296
[  900.021645] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[  900.021645] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
... 
[  901.628439] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c 
cpcmd 002b
[  901.636291] 8139cp :00:0b.0 eth1: intr, status 0011 enable 80ff cmd 0c 
cpcmd 002b
...
[  901.966243] 8139cp :00:0b.0 eth1: intr, status 0011 enable 80ff cmd 0c 
cpcmd 002b
[  901.968353] 8139cp :00:0b.0 eth1: intr, status 0051 enable 80ff cmd 0c 
cpcmd 002b
... forever...

And of course, even if I fix the TX timeout handling, I'd still like to
know why it's happening in the first place...

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH 1/2] 8139cp: Use dev_kfree_skb_any() instead of dev_kfree_skb() in cp_clean_rings()

2015-09-17 Thread David Woodhouse
This can be called from cp_tx_timeout() with interrupts disabled.
Spotted by Francois Romieu 

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index d79e33b..52a5334 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1151,7 +1151,7 @@ static void cp_clean_rings (struct cp_private *cp)
desc = cp->rx_ring + i;
dma_unmap_single(&cp->pdev->dev,le64_to_cpu(desc->addr),
 cp->rx_buf_sz, PCI_DMA_FROMDEVICE);
-   dev_kfree_skb(cp->rx_skb[i]);
+   dev_kfree_skb_any(cp->rx_skb[i]);
}
}
 
@@ -1164,7 +1164,7 @@ static void cp_clean_rings (struct cp_private *cp)
 le32_to_cpu(desc->opts1) & 0x,
 PCI_DMA_TODEVICE);
if (le32_to_cpu(desc->opts1) & LastFrag)
-   dev_kfree_skb(skb);
+   dev_kfree_skb_any(skb);
cp->dev->stats.tx_dropped++;
        }
}
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH 2/2] 8139cp: Call __cp_set_rx_mode() from cp_tx_timeout()

2015-09-17 Thread David Woodhouse
Unless we reset the RX config, on real hardware I don't seem to receive
any packets after a TX timeout.

Signed-off-by: David Woodhouse 
---
Now it does actually recover from the TX timeout, lots of the time.
Sometimes it still hits that IRQ storm, which probably explains the
apparent lockup right after the 'popf'... although I thought we handled
it more gracefully than that these days.

That's probably a race with the RX handling code, or something. I'll
try harder to reproduce the TX timeout with the debugging enabled.
Which might shed some light on this, and also on the reason why it
happens in the first place. If we're lucky.

 drivers/net/ethernet/realtek/8139cp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 52a5334..ba3dab7 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1261,6 +1261,7 @@ static void cp_tx_timeout(struct net_device *dev)
cp_clean_rings(cp);
rc = cp_init_rings(cp);
cp_start_hw(cp);
+   __cp_set_rx_mode(dev);
cp_enable_irq(cp);
 
netif_wake_queue(dev);
-- 
2.4.3


-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> The TxDmaOkLowDesc register may tell if the Tx dma part is still 
> making any progress. I have added a TxPoll request. See below.

I've just added that into the original TX timeout handler, since that
doesn't seem to be crashing the box for me as long as I avoid the IRQ
storm.

Not sure what we learn from it ('desc 6550' printed as hex)... I've
also made it dump the TX descriptor ring (skb, addr, opts1, opts2):

[ 1733.027156] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b head 
25 tail 22 desc 65500 80ff
[ 1733.036819] TX ring 00:   (null) 1dd68d8c 3000cb67 0 
[ 1733.037040] TX ring 01:   (null) 1dd8774c 3000cb67 0 
[ 1733.037040] TX ring 02:   (null) 1dd86e7c 3a08 0 
[ 1733.037040] TX ring 03:   (null) 1dd865ac 3a08 0 
[ 1733.037040] TX ring 04:   (null) 1dd8540c 3a08 0 
[ 1733.037040] TX ring 05:   (null) 1dd85cdc 3a08 0 
[ 1733.037040] TX ring 06:   (null) 1dd84b3c 3a08 0 
[ 1733.037040] TX ring 07:   (null) 1dd8399c 3a08 0 
[ 1733.037040] TX ring 08:   (null) 1dd8426c 3a08 0 
[ 1733.037040] TX ring 09:   (null) 1dd830cc 3a08 0 
[ 1733.037040] TX ring 10:   (null) 1dd81f2c 3a08 0 
[ 1733.037040] TX ring 11:   (null) 1dd8165c 3a08 0 
[ 1733.037040] TX ring 12:   (null) 1dd827fc 3a08 0 
[ 1733.037040] TX ring 13:   (null) 1dd80d8c 3a08 0 
[ 1733.037040] TX ring 14:   (null) 1dd804bc 3a08 0 
[ 1733.037040] TX ring 15:   (null) 1dd6ee7c 3a08 0 
[ 1733.037040] TX ring 16:   (null) 1dd6f74c 3a08 0 
[ 1733.037040] TX ring 17:   (null) 1dd6e5ac 3a08 0 
[ 1733.037040] TX ring 18:   (null) 1dd6dcdc 3a08 0 
[ 1733.037040] TX ring 19:   (null) 1dd6cb3c 3000f804 0 
[ 1733.037040] TX ring 20:   (null) 1dd8de02 300083da 0 
[ 1733.037040] TX ring 21:   (null) 1dd8da02 3000 0 
[ 1733.037040] TX ring 22: defcc240 1dd6d40c b5ea 0 
[ 1733.037040] TX ring 23: decdb900 1dd6b99c b046 0 
[ 1733.037040] TX ring 24: ddd27c00 1dd6b0cc b5ea 0 
[ 1733.037040] TX ring 25:   (null) 1dd6e5ac 3000ca24 0 
[ 1733.037040] TX ring 26:   (null) 1dd6dcdc 3000ca24 0 
[ 1733.037040] TX ring 27:   (null) 1dd6d40c 3000ca24 0 
[ 1733.037040] TX ring 28:   (null) 1dd6cb3c 3000ca24 0 
[ 1733.037040] TX ring 29:   (null) 1dd6c26c 3000ca24 0 
[ 1733.037040] TX ring 30:   (null) 1dd6b99c 3000ca24 0 
[ 1733.037040] TX ring 31:   (null) 1dd6b0cc 3000ca24 0 
[ 1733.037040] TX ring 32:   (null) 1dd6a7fc 3000ca24 0 
[ 1733.037040] TX ring 33:   (null) 1dd69f2c 3000ca24 0 
[ 1733.037040] TX ring 34:   (null) 1dd68d8c 3000ca24 0 
[ 1733.037040] TX ring 35:   (null) 1dd6965c 3000cb67 0 
[ 1733.037040] TX ring 36:   (null) 1dd684bc 3000cb67 0 
[ 1733.037040] TX ring 37:   (null) 1dd86e7c 3000cb67 0 
[ 1733.037040] TX ring 38:   (null) 1dd8774c 3000cb67 0 
[ 1733.037040] TX ring 39:   (null) 1dd865ac 3000cb67 0 
[ 1733.037040] TX ring 40:   (null) 1dd85cdc 3000cb67 0 
[ 1733.037040] TX ring 41:   (null) 1dd8540c 3000cb67 0 
[ 1733.037040] TX ring 42:   (null) 1dd84b3c 3000cb67 0 
[ 1733.037040] TX ring 43:   (null) 1dd8426c 3000cb67 0 
[ 1733.037040] TX ring 44:   (null) 1dd8399c 3000cb67 0 
[ 1733.037040] TX ring 45:   (null) 1dd830cc 3000cb67 0 
[ 1733.037040] TX ring 46:   (null) 1dd81f2c 3000cb67 0 
[ 1733.037040] TX ring 47:   (null) 1dd827fc 3000cb67 0 
[ 1733.037040] TX ring 48:   (null) 1dd8165c 3000cb67 0 
[ 1733.037040] TX ring 49:   (null) 1dd80d8c 3000cb67 0 
[ 1733.037040] TX ring 50:   (null) 1dd804bc 3000cb67 0 
[ 1733.037040] TX ring 51:   (null) 1dd6f74c 3000cb67 0 
[ 1733.037040] TX ring 52:   (null) 1dd6ee7c 3000cb67 0 
[ 1733.037040] TX ring 53:   (null) 1dd6dcdc 3000cb67 0 
[ 1733.037040] TX ring 54:   (null) 1

Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-18 Thread David Woodhouse
On Fri, 2015-09-18 at 02:04 +0100, David Woodhouse wrote:
> On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> > The TxDmaOkLowDesc register may tell if the Tx dma part is still 
> > making any progress. I have added a TxPoll request. See below.
> 
> I've just added that into the original TX timeout handler, since that
> doesn't seem to be crashing the box for me as long as I avoid the IRQ
> storm.
> 
> Not sure what we learn from it ('desc 6550' printed as hex)... I've
> also made it dump the TX descriptor ring (skb, addr, opts1, opts2):

The TxDmaOkLowDesc values look sane to me; they always match the low
bits of the last descriptor that *should* have been sent, according to
the ring dumps.

I made it store and dump the original contents of the TX descriptor
ring too (before it gets overwritten by the hardware). So we can see
what *was* being transmitted.

It isn't any more enlightening. I also tried just prodding the hardware
by writing to the TxPoll register. That wasn't sufficient to restart
it.

[26589.024750] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b head 
52 tail 45 desc 96c00 80ff
[26589.034632] TX ring 00:   (null) 1de16e7c 30003130 0 (b5ea 0)
[26589.034632] TX ring 01:   (null) 1de165ac 30003130 0 (b5ea 0)
[26589.034632] TX ring 02:   (null) 1de1540c 30003130 0 (b5ea 0)
[26589.034632] TX ring 03:   (null) 1de15cdc 30003130 0 (b5ea 0)
[26589.034632] TX ring 04:   (null) 1de14b3c 30003130 0 (b5ea 0)
[26589.034632] TX ring 05:   (null) 1de1399c 30003130 0 (b5ea 0)
[26589.034632] TX ring 06:   (null) 1de130cc 30003130 0 (b5ea 0)
[26589.034632] TX ring 07:   (null) 1de1426c 30003130 0 (b5ea 0)
[26589.034632] TX ring 08:   (null) 1de127fc 30003130 0 (b5ea 0)
[26589.034632] TX ring 09:   (null) 1de11f2c 30003130 0 (b5ea 0)
[26589.034632] TX ring 10:   (null) 1de10d8c 30003130 0 (b056 0)
[26589.034632] TX ring 11:   (null) 1de1165c 30003130 0 (b5ea 0)
[26589.034632] TX ring 12:   (null) 1df4774c 30003130 0 (b5ea 0)
[26589.034632] TX ring 13:   (null) 1de104bc 30003130 0 (b5ea 0)
[26589.034632] TX ring 14:   (null) 1df46e7c 30003130 0 (b5ea 0)
[26589.034632] TX ring 15:   (null) 1df465ac 30003130 0 (b5ea 0)
[26589.034632] TX ring 16:   (null) 1df45cdc 30003130 0 (b5ea 0)
[26589.034632] TX ring 17:   (null) 1df44b3c 30003130 0 (b056 0)
[26589.034632] TX ring 18:   (null) 1df4426c 30003130 0 (b5ea 0)
[26589.034632] TX ring 19:   (null) 1df4540c 30003130 0 (b5ea 0)
[26589.034632] TX ring 20:   (null) 1df4399c 30004954 0 (b5ea 0)
[26589.034632] TX ring 21:   (null) 1df430cc 30004954 0 (b5ea 0)
[26589.034632] TX ring 22:   (null) 1df427fc 30004954 0 (b5ea 0)
[26589.034632] TX ring 23:   (null) 1df4165c 30004954 0 (b5ea 0)
[26589.034632] TX ring 24:   (null) 1df41f2c 30004954 0 (b5ea 0)
[26589.034632] TX ring 25:   (null) 1df40d8c 30004954 0 (b5ea 0)
[26589.034632] TX ring 26:   (null) 1dcc9602 3000796b 0 (b06a 0)
[26589.034632] TX ring 27:   (null) 1de1774c 3000 0 (b097 0)
[26589.034632] TX ring 28:   (null) 1de16e7c 3000 0 (b467 0)
[26589.034632] TX ring 29:   (null) 1df404bc 3000 0 (b557 0)
[26589.034632] TX ring 30:   (null) 1de165ac 3000 0 (b0e7 0)
[26589.034632] TX ring 31:   (null) 1de1540c 3000 0 (b0d7 0)
[26589.034632] TX ring 32:   (null) 1dcc9602 300087b2 0 (b062 0)
[26589.034632] TX ring 33:   (null) 1de15cdc 300087b2 0 (b046 0)
[26589.034632] TX ring 34:   (null) 1de14b3c 3540 0 (b08b 0)
[26589.034632] TX ring 35:   (null) 1de1426c 3000709e 0 (b097 0)
[26589.034632] TX ring 36:   (null) 1de1399c 3000709e 0 (b097 0)
[26589.034632] TX ring 37:   (null) 1de130cc 3000c169 0 (b084 0)
[26589.034632] TX ring 38:   (null) 1de127fc 3000a5eb 0 (b5ea 0)
[26589.034632] TX ring 39:   (null) 1de11f2c 3000d4a3 0 (b557 0)
[26589.034632] TX ring 40:   (null) 1de10d8c 3000b57e 0 (b046 0)
[26589.034632] TX ring 41:   (null) 1de104bc 3000 0 (b0a7 0)
[26589.034632] TX ring 42:   (null) 1dce774c 30004000 0 (b046 0)
[26589.034632] TX ring 43:   (null) 1dce6e7c 300076f6 0 (b046 0)
[26589.034632] TX ring 44:   (null) 1dce5cdc 30002034 0 (b096 0)
[26589.034632] TX ring 45: dde4c3c0 1dce540c b557 0 (b557 0)
[26589.034632] TX ring 46: dddc1900 1dce426c b097 0 (b0

[PATCH 3/2] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-18 Thread David Woodhouse
The TX timeout handling has been observed to trigger RX IRQ storms. And
since cp_interrupt() just keeps saying that it handled the interrupt,
the machine then dies. Fix the return value from cp_interrupt(), and
the offending IRQ gets disabled and the machine survives.

Signed-off-by: David Woodhouse 
---
... and I have to make fewer trips to the cupboard under the stairs.
There follows a fix to prevent the IRQ storm from happening at all.

 drivers/net/ethernet/realtek/8139cp.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index ba3dab7..f1054ad 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -371,7 +371,7 @@ struct cp_private {
 
 
 static void __cp_set_rx_mode (struct net_device *dev);
-static void cp_tx (struct cp_private *cp);
+static int cp_tx (struct cp_private *cp);
 static void cp_clean_rings (struct cp_private *cp);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cp_poll_controller(struct net_device *dev);
@@ -587,8 +587,6 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
if (!status || (status == 0x))
goto out_unlock;
 
-   handled = 1;
-
netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
  status, cpr8(Cmd), cpr16(CpCmd));
 
@@ -596,25 +594,30 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
 
/* close possible race's with dev_close */
if (unlikely(!netif_running(dev))) {
+   handled = 1;
cpw16(IntrMask, 0);
goto out_unlock;
}
 
-   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
+   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr)) {
if (napi_schedule_prep(&cp->napi)) {
+   handled = 1;
cpw16_f(IntrMask, cp_norx_intr_mask);
__napi_schedule(&cp->napi);
}
-
+   }
if (status & (TxOK | TxErr | TxEmpty | SWInt))
-   cp_tx(cp);
-   if (status & LinkChg)
-   mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+   handled |= cp_tx(cp);
 
+   if (status & LinkChg) {
+   handled = 1;
+   mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+   }
 
if (status & PciErr) {
u16 pci_status;
 
+   handled = 1;
pci_read_config_word(cp->pdev, PCI_STATUS, &pci_status);
pci_write_config_word(cp->pdev, PCI_STATUS, pci_status);
netdev_err(dev, "PCI bus error, status=%04x, PCI status=%04x\n",
@@ -645,11 +648,12 @@ static void cp_poll_controller(struct net_device *dev)
 }
 #endif
 
-static void cp_tx (struct cp_private *cp)
+static int cp_tx (struct cp_private *cp)
 {
unsigned tx_head = cp->tx_head;
unsigned tx_tail = cp->tx_tail;
unsigned bytes_compl = 0, pkts_compl = 0;
+   int handled = 0;
 
while (tx_tail != tx_head) {
struct cp_desc *txd = cp->tx_ring + tx_tail;
@@ -661,6 +665,7 @@ static void cp_tx (struct cp_private *cp)
if (status & DescOwn)
break;
 
+   handled = 1;
skb = cp->tx_skb[tx_tail];
BUG_ON(!skb);
 
@@ -704,6 +709,8 @@ static void cp_tx (struct cp_private *cp)
netdev_completed_queue(cp->dev, pkts_compl, bytes_compl);
if (TX_BUFFS_AVAIL(cp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(cp->dev);
+
+   return handled;
 }
 
 static inline u32 cp_tx_vlan_tag(struct sk_buff *skb)
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH 4/2] 8139cp: Do not re-enable RX interrupts in cp_tx_timeout()

2015-09-18 Thread David Woodhouse
If an RX interrupt was already received but NAPI has not yet run when
the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
already disabled. Blindly re-enabling them will cause an IRQ storm.

This is somewhat less painful than it was a few minutes ago before I
fixed the return value from cp_interrupt(), but still suboptimal.

Unconditionally leave RX interrupts disabled after the reset, and
schedule NAPI to check the receive ring and re-enable them.

Signed-off-by: David Woodhouse 
---
It might seem a little odd to deliberately schedule NAPI when we know
there's going to be nothing there since we just cleared the rings.

But given that we only get here if the hardware is playing silly
buggers, I don't much like the idea of preserving the existing IntrMask
that we read from the hardware. And there's no pretty way of
determining whether NAPI is already scheduled. So just ensuring that
it *is* scheduled seems simplest. It's not like we're going to
care about the performance implications of one fruitless poll when
we've just suffered a TX timeout and reset the hardware.

In future I think I might want to fix cp_tx_timeout() to *only* reset
the TX ring anyway. Especially as I'm entirely unsure about its locking
w.r.t. cp_rx_poll()... what prevents cp_rx_poll() from running while
we're in the middle of screwing around with the RX rings?

 drivers/net/ethernet/realtek/8139cp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c
b/drivers/net/ethernet/realtek/8139cp.c
index f1054ad..d12fc50 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1269,9 +1269,10 @@ static void cp_tx_timeout(struct net_device
*dev)
rc = cp_init_rings(cp);
cp_start_hw(cp);
__cp_set_rx_mode(dev);
-   cp_enable_irq(cp);
+   cpw16_f(IntrMask, cp_norx_intr_mask);
 
netif_wake_queue(dev);
+   napi_schedule_irqoff(&cp->napi);
 
spin_unlock_irqrestore(&cp->lock, flags);
 }
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-18 Thread David Woodhouse
On Fri, 2015-09-18 at 01:44 +0200, Francois Romieu wrote:
> The TxDmaOkLowDesc register may tell if the Tx dma part is still 
> making any progress. I have added a TxPoll request. See below.

It isn't making any progress. And TxPoll doesn't help. The only thing
I've found that restarts it is to clear and then set the TxOn bit in
the Cmd register, which resets the entire ring.

I briefly wondered if it was triggered by our constantly banging on the
TxPoll register even when the ring is already running, so I put in an
optimisation to avoid that except in the case where we've had a TxEmpty
(0x80) interrupt. That doesn't help either.

When the problem happens, we've put a new descriptor into the ring and
written to TxPoll to start the processing. We get a TxEmpty interrupt
*without* TxDone, while the descriptor is still marked as being owned
by the hardware.

At other times we do get a TxEmpty without TxDone, but usually the
TxDone happens shortly thereafter. In the problematic case, the TxDone
never happens — and the offending descriptor is never given back.

(Note that I also fixed the off-by-one in the 'tx queued' debugging
message)

[35322.861870] 8139cp :00:0b.0 eth1: tx queued, slot 13, skblen 1294
[35322.861870] 8139cp :00:0b.0 eth1: tx kicking tx_poll, head 14 tail 13 
desc 54c0 poll 00
[35322.861870] 8139cp :00:0b.0 eth1: intr, status 0484 cmd 0c cpcmd 002b
[35322.861870] 8139cp :00:0b.0 eth1: tx done, slot 13, status 0x30003a6e 
dec 54d0
[35322.861870] 8139cp :00:0b.0 eth1: irq not kicking tx_poll, head 14 tail 
14 desc 54d0 poll 00
[35322.875014] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35322.913285] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35322.917250] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35322.943075] 8139cp :00:0b.0 eth1: tx queued, slot 14, skblen 70
[35322.943103] 8139cp :00:0b.0 eth1: tx kicking tx_poll, head 15 tail 14 
desc 54d0 poll 00
[35322.943138] 8139cp :00:0b.0 eth1: intr, status 0484 cmd 0c cpcmd 002b
[35322.943164] 8139cp :00:0b.0 eth1: tx done, slot 14, status 0x30008001 
dec 54e0
[35322.943190] 8139cp :00:0b.0 eth1: irq not kicking tx_poll, head 15 tail 
15 desc 54e0 poll 00
[35322.947071] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35322.959487] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.001723] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.041541] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.041541] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.052386] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.113828] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.114824] 8139cp :00:0b.0 eth1: tx queued, slot 15, skblen 1294
[35323.114851] 8139cp :00:0b.0 eth1: tx kicking tx_poll, head 16 tail 15 
desc 54e0 poll 00
[35323.114887] 8139cp :00:0b.0 eth1: intr, status 0080 cmd 0c cpcmd 002b
[35323.114921] 8139cp :00:0b.0 eth1: Invalid TxEmpty, should have seen 15 
at ddea54f0 status  c   2b0 80ff desc 54e0 poll 00
[35323.124559] 8139cp :00:0b.0 eth1: irq kicking tx_poll, head 16 tail 15 
desc 54e0 poll 00
[35323.126775] 8139cp :00:0b.0 eth1: tx queued, slot 16, skblen 1294
[35323.126803] 8139cp :00:0b.0 eth1: tx not kicking tx_poll, head 17 tail 
15 desc 54e0 poll 00
[35323.127154] 8139cp :00:0b.0 eth1: tx queued, slot 17, skblen 1294
[35323.127181] 8139cp :00:0b.0 eth1: tx not kicking tx_poll, head 18 tail 
15 desc 54e0 poll 00
[35323.127776] 8139cp :00:0b.0 eth1: tx queued, slot 18, skblen 1294
[35323.127802] 8139cp :00:0b.0 eth1: tx not kicking tx_poll, head 19 tail 
15 desc 54e0 poll 00
[35323.147218] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.247288] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35323.314456] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
...
[35331.995477] 8139cp :00:0b.0 eth1: intr, status 0001 cmd 0c cpcmd 002b
[35332.024428] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b0 
80ff desc 54e0 poll 00
[35332.033424] TX ring 00 @ddea5400:   (null) 1ddd65ac 30006362 0 (b50e 0)
[35332.040399] TX ring 01 @ddea5410:   (null) 1ddd5cdc 30006362 0 (b50e 0)
[35332.043399] TX ring 02 @ddea5420:   (null) 1ed9165c 30006362 0 (b50e 0)
[35332.043399] TX ring 03 @ddea5430:   (null) 1ddd540c 30006362 0 (b50e 0)
[35332.043399] TX ring 04 @ddea5440:   (null) 1ddd4b3c 30006362 0 (b50e 0)
[35332.043399] TX ring 05 @ddea5450:   (null) 1ddd426c 30006362 0 (b50e 0)
[35332.043399] TX ring 06 @ddea5460:   (null) 1ddd399c 30006362 0 (b50e 0)
[35332.043399] TX ring 07 @ddea5470:   (null) 1ddd27fc 30006362 0 (b50e 0)
[35332.043399] TX ring 08 @ddea5480:   (null) 1ddd1f2c 30006362 0 (b50e 0)
[35332.043399] TX ring 09 @ddea5490:   (null) 1ddd165c 30006362 0 (b50e 0)
[35332.0433

[PATCH 2/7] 8139cp: Do not re-enable RX interrupts in cp_tx_timeout()

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

If an RX interrupt was already received but NAPI has not yet run when
the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
already disabled. Blindly re-enabling them will cause an IRQ storm.

This is somewhat less painful than it was a few minutes ago before I
fixed the return value from cp_interrupt(), but still suboptimal.

Unconditionally leave RX interrupts disabled after the reset, and
schedule NAPI to check the receive ring and re-enable them.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index f1054ad..d12fc50 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1269,9 +1269,10 @@ static void cp_tx_timeout(struct net_device *dev)
rc = cp_init_rings(cp);
cp_start_hw(cp);
__cp_set_rx_mode(dev);
-   cp_enable_irq(cp);
+   cpw16_f(IntrMask, cp_norx_intr_mask);
 
netif_wake_queue(dev);
+   napi_schedule_irqoff(&cp->napi);
 
spin_unlock_irqrestore(&cp->lock, flags);
 }
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

The TX timeout handling has been observed to trigger RX IRQ storms. And
since cp_interrupt() just keeps saying that it handled the interrupt,
the machine then dies. Fix the return value from cp_interrupt(), and
the offending IRQ gets disabled and the machine survives.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index ba3dab7..f1054ad 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -371,7 +371,7 @@ struct cp_private {
 
 
 static void __cp_set_rx_mode (struct net_device *dev);
-static void cp_tx (struct cp_private *cp);
+static int cp_tx (struct cp_private *cp);
 static void cp_clean_rings (struct cp_private *cp);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cp_poll_controller(struct net_device *dev);
@@ -587,8 +587,6 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
if (!status || (status == 0x))
goto out_unlock;
 
-   handled = 1;
-
netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
  status, cpr8(Cmd), cpr16(CpCmd));
 
@@ -596,25 +594,30 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
 
/* close possible race's with dev_close */
if (unlikely(!netif_running(dev))) {
+   handled = 1;
cpw16(IntrMask, 0);
goto out_unlock;
}
 
-   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
+   if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr)) {
if (napi_schedule_prep(&cp->napi)) {
+   handled = 1;
cpw16_f(IntrMask, cp_norx_intr_mask);
__napi_schedule(&cp->napi);
}
-
+   }
if (status & (TxOK | TxErr | TxEmpty | SWInt))
-   cp_tx(cp);
-   if (status & LinkChg)
-   mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+   handled |= cp_tx(cp);
 
+   if (status & LinkChg) {
+   handled = 1;
+   mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+   }
 
if (status & PciErr) {
u16 pci_status;
 
+   handled = 1;
pci_read_config_word(cp->pdev, PCI_STATUS, &pci_status);
pci_write_config_word(cp->pdev, PCI_STATUS, pci_status);
netdev_err(dev, "PCI bus error, status=%04x, PCI status=%04x\n",
@@ -645,11 +648,12 @@ static void cp_poll_controller(struct net_device *dev)
 }
 #endif
 
-static void cp_tx (struct cp_private *cp)
+static int cp_tx (struct cp_private *cp)
 {
unsigned tx_head = cp->tx_head;
unsigned tx_tail = cp->tx_tail;
unsigned bytes_compl = 0, pkts_compl = 0;
+   int handled = 0;
 
while (tx_tail != tx_head) {
struct cp_desc *txd = cp->tx_ring + tx_tail;
@@ -661,6 +665,7 @@ static void cp_tx (struct cp_private *cp)
if (status & DescOwn)
break;
 
+   handled = 1;
skb = cp->tx_skb[tx_tail];
BUG_ON(!skb);
 
@@ -704,6 +709,8 @@ static void cp_tx (struct cp_private *cp)
netdev_completed_queue(cp->dev, pkts_compl, bytes_compl);
if (TX_BUFFS_AVAIL(cp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(cp->dev);
+
+   return handled;
 }
 
 static inline u32 cp_tx_vlan_tag(struct sk_buff *skb)
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] 8139cp: Call __cp_set_rx_mode() from cp_tx_timeout()

2015-09-21 Thread David Woodhouse
On Sun, 2015-09-20 at 22:24 -0700, David Miller wrote:
> From: David Woodhouse 
> Date: Fri, 18 Sep 2015 00:21:54 +0100
> 
> > Unless we reset the RX config, on real hardware I don't seem to
> receive
> > any packets after a TX timeout.
> > 
> > Signed-off-by: David Woodhouse 
> 
> Applied.

Thanks. I'll send another batch, including the original patches 3/2 and
4/3 from this series, in reply to this message.

After which, I think we might be able to turn on TX checksumming by
default and I also have a way to implement early detection of the TX
stall I've been seeing.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


[PATCH 6/7] 8139cp: Dump contents of descriptor ring on TX timeout

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

We are seeing unexplained TX timeouts under heavy load. Let's try to get
a better idea of what's going on.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 036ad85..6feff9f 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -157,6 +157,7 @@ enum {
NWayAdvert  = 0x66, /* MII ADVERTISE */
NWayLPAR= 0x68, /* MII LPA */
NWayExpansion   = 0x6A, /* MII Expansion */
+   TxDmaOkLowDesc  = 0x82, /* Low 16 bit address of a Tx descriptor. */
Config5 = 0xD8, /* Config5 */
TxPoll  = 0xD9, /* Tell chip to check Tx descriptors for work */
RxMaxSize   = 0xDA, /* Max size of an Rx packet (8169 only) */
@@ -1263,7 +1264,7 @@ static void cp_tx_timeout(struct net_device *dev)
 {
struct cp_private *cp = netdev_priv(dev);
unsigned long flags;
-   int rc;
+   int rc, i;
 
netdev_warn(dev, "Transmit timeout, status %2x %4x %4x %4x\n",
cpr8(Cmd), cpr16(CpCmd),
@@ -1271,6 +1272,17 @@ static void cp_tx_timeout(struct net_device *dev)
 
spin_lock_irqsave(&cp->lock, flags);
 
+   netif_dbg(cp, tx_err, cp->dev, "TX ring head %d tail %d desc %x\n",
+ cp->tx_head, cp->tx_tail, cpr16(TxDmaOkLowDesc));
+   for (i = 0; i < CP_TX_RING_SIZE; i++) {
+   netif_dbg(cp, tx_err, cp->dev,
+ "TX slot %d @%p: %08x (%08x) %08x %llx %p\n",
+ i, &cp->tx_ring[i], le32_to_cpu(cp->tx_ring[i].opts1),
+ cp->tx_opts[i], le32_to_cpu(cp->tx_ring[i].opts2),
+ le64_to_cpu(cp->tx_ring[i].addr),
+ cp->tx_skb[i]);
+   }
+
cp_stop_hw(cp);
    cp_clean_rings(cp);
rc = cp_init_rings(cp);
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 5/7] 8139cp: Fix DMA unmapping of transmitted buffers

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

The low 16 bits of the 'opts1' field in the TX descriptor are supposed
to still contain the buffer length when the descriptor is handed back to
us. In practice, at least on my hardware, they don't. So stash the
original value of the opts1 field and get the length to unmap from
there.

There are other ways we could have worked out the length, but I actually
want a stash of the opts1 field anyway so that I can dump it alongside
the contents of the descriptor ring when we suffer a TX timeout.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 07621b5..036ad85 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -341,6 +341,7 @@ struct cp_private {
unsignedtx_tail;
struct cp_desc  *tx_ring;
struct sk_buff  *tx_skb[CP_TX_RING_SIZE];
+   u32 tx_opts[CP_TX_RING_SIZE];
 
unsignedrx_buf_sz;
unsignedwol_enabled : 1; /* Is Wake-on-LAN enabled? */
@@ -670,7 +671,7 @@ static int cp_tx (struct cp_private *cp)
BUG_ON(!skb);
 
dma_unmap_single(&cp->pdev->dev, le64_to_cpu(txd->addr),
-le32_to_cpu(txd->opts1) & 0x,
+cp->tx_opts[tx_tail] & 0x,
 PCI_DMA_TODEVICE);
 
if (status & LastFrag) {
@@ -793,6 +794,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
wmb();
 
cp->tx_skb[entry] = skb;
+   cp->tx_opts[entry] = flags;
netif_dbg(cp, tx_queued, cp->dev, "tx queued, slot %d, skblen 
%d\n",
  entry, skb->len);
} else {
@@ -856,6 +858,8 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
txd->opts1 = cpu_to_le32(ctrl);
wmb();
+
+   cp->tx_opts[entry] = ctrl;
cp->tx_skb[entry] = skb;
}
 
@@ -880,6 +884,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->opts1 = cpu_to_le32(ctrl);
wmb();
 
+   cp->tx_opts[first_entry] = ctrl;
netif_dbg(cp, tx_queued, cp->dev, "tx queued, slots %d-%d, 
skblen %d\n",
  first_entry, entry, skb->len);
}
@@ -1122,6 +1127,7 @@ static int cp_init_rings (struct cp_private *cp)
 {
memset(cp->tx_ring, 0, sizeof(struct cp_desc) * CP_TX_RING_SIZE);
cp->tx_ring[CP_TX_RING_SIZE - 1].opts1 = cpu_to_le32(RingEnd);
+   memset(cp->tx_opts, 0, sizeof(cp->tx_opts));
 
cp_init_rings_index(cp);
 
@@ -1179,6 +1185,7 @@ static void cp_clean_rings (struct cp_private *cp)
 
memset(cp->rx_ring, 0, sizeof(struct cp_desc) * CP_RX_RING_SIZE);
memset(cp->tx_ring, 0, sizeof(struct cp_desc) * CP_TX_RING_SIZE);
+   memset(cp->tx_opts, 0, sizeof(cp->tx_opts));
 
memset(cp->rx_skb, 0, sizeof(struct sk_buff *) * CP_RX_RING_SIZE);
memset(cp->tx_skb, 0, sizeof(struct sk_buff *) * CP_TX_RING_SIZE);
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 3/7] 8139cp: Fix tx_queued debug message to print correct slot numbers

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

After a certain amount of staring at the debug output of this driver, I
realised it was lying to me.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index d12fc50..058f835 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -793,7 +793,8 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
wmb();
 
cp->tx_skb[entry] = skb;
-   entry = NEXT_TX(entry);
+   netif_dbg(cp, tx_queued, cp->dev, "tx queued, slot %d, skblen 
%d\n",
+ entry, skb->len);
} else {
struct cp_desc *txd;
u32 first_len, first_eor;
@@ -812,7 +813,6 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
goto out_dma_error;
 
cp->tx_skb[entry] = skb;
-   entry = NEXT_TX(entry);
 
for (frag = 0; frag < skb_shinfo(skb)->nr_frags; frag++) {
const skb_frag_t *this_frag = 
&skb_shinfo(skb)->frags[frag];
@@ -820,6 +820,8 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
u32 ctrl;
dma_addr_t mapping;
 
+   entry = NEXT_TX(entry);
+
len = skb_frag_size(this_frag);
mapping = dma_map_single(&cp->pdev->dev,
 skb_frag_address(this_frag),
@@ -855,9 +857,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
txd->opts1 = cpu_to_le32(ctrl);
wmb();
-
cp->tx_skb[entry] = skb;
-   entry = NEXT_TX(entry);
}
 
txd = &cp->tx_ring[first_entry];
@@ -880,12 +880,13 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->opts1 = cpu_to_le32(first_eor | first_len |
 FirstFrag | DescOwn);
wmb();
+
+   netif_dbg(cp, tx_queued, cp->dev, "tx queued, slots %d-%d, 
skblen %d\n",
+ first_entry, entry, skb->len);
}
-   cp->tx_head = entry;
+   cp->tx_head = NEXT_TX(entry);
 
netdev_sent_queue(dev, skb->len);
-   netif_dbg(cp, tx_queued, cp->dev, "tx queued, slot %d, skblen %d\n",
- entry, skb->len);
if (TX_BUFFS_AVAIL(cp) <= (MAX_SKB_FRAGS + 1))
netif_stop_queue(dev);
 
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 4/7] 8139cp: Fix TSO/scatter-gather descriptor setup

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

When sending a TSO frame in multiple buffers, we were neglecting to set
the first descriptor up in TSO mode.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 058f835..07621b5 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -797,7 +797,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
  entry, skb->len);
} else {
struct cp_desc *txd;
-   u32 first_len, first_eor;
+   u32 first_len, first_eor, ctrl;
dma_addr_t first_mapping;
int frag, first_entry = entry;
const struct iphdr *ip = ip_hdr(skb);
@@ -817,7 +817,6 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
for (frag = 0; frag < skb_shinfo(skb)->nr_frags; frag++) {
const skb_frag_t *this_frag = 
&skb_shinfo(skb)->frags[frag];
u32 len;
-   u32 ctrl;
dma_addr_t mapping;
 
entry = NEXT_TX(entry);
@@ -865,20 +864,20 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(first_mapping);
wmb();
 
-   if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   ctrl = first_eor | first_len | FirstFrag | DescOwn;
+   if (mss)
+   ctrl |= LargeSend |
+   ((mss & MSSMask) << MSSShift);
+   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
if (ip->protocol == IPPROTO_TCP)
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn |
-IPCS | TCPCS);
+   ctrl |= IPCS | TCPCS;
else if (ip->protocol == IPPROTO_UDP)
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn |
-IPCS | UDPCS);
+   ctrl |= IPCS | UDPCS;
else
BUG();
-   } else
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn);
+   }
+
+   txd->opts1 = cpu_to_le32(ctrl);
wmb();
 
netif_dbg(cp, tx_queued, cp->dev, "tx queued, slots %d-%d, 
skblen %d\n",
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 7/7] 8139cp: Avoid gratuitous writes to TxPoll register when already running

2015-09-21 Thread David Woodhouse
From: David Woodhouse 

This is a minor optimisation, but as a side-effect it means we can know
precisely which descriptors were already in the ring when we last
prodded it to run.

This will give us a better chance to catch the case where we get a
TxEmpty interrupt when it hasn't actually finished the descriptors we
*know* it should have seen, before it ends up being a full-blown TX
timeout and reset.

Since QEMU's emulation doesn't give TxEmpty interrupts, *always* bash on
TxPoll until we see the first TxEmpty interrupt and cp->txempty_seen
gets set.

Signed-off-by: David Woodhouse 
---
I'm actually having second thoughts about this one since realising that
QEMU doesn't implement it correctly. The workaround isn't *that* horrid
but it's not clear it's enough of a performance win — or whether it's
entirely necessary for catching my TX stall.

 drivers/net/ethernet/realtek/8139cp.c | 37 +--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 6feff9f..67a4fcf 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -340,6 +340,9 @@ struct cp_private {
 
unsignedtx_head cacheline_aligned;
unsignedtx_tail;
+   unsignedtx_running;
+   unsignedtxempty_seen;
+
struct cp_desc  *tx_ring;
struct sk_buff  *tx_skb[CP_TX_RING_SIZE];
u32 tx_opts[CP_TX_RING_SIZE];
@@ -611,6 +614,22 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
if (status & (TxOK | TxErr | TxEmpty | SWInt))
handled |= cp_tx(cp);
 
+   if ((status & TxEmpty) && cp->tx_running) {
+   handled = 1;
+   /* Qemu's emulation doesn't give TxEmpty interrupts */
+   cp->txempty_seen = 1;
+   if (cp->tx_head == cp->tx_tail) {
+   /* Out of descriptors and we have nothing more for it.
+  Let it stop. */
+   cp->tx_running = 0;
+   } else {
+   /* The hardware raced with us adding a new descriptor,
+  and we didn't get the TxEmpty IRQ in time so we
+  didn't prod it. Prod it now to restart. */
+   cpw8(TxPoll, NormalTxPoll);
+   }
+   }
+
if (status & LinkChg) {
handled = 1;
mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
@@ -796,8 +815,9 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
cp->tx_skb[entry] = skb;
cp->tx_opts[entry] = flags;
-   netif_dbg(cp, tx_queued, cp->dev, "tx queued, slot %d, skblen 
%d\n",
- entry, skb->len);
+   netif_dbg(cp, tx_queued, cp->dev,
+ "tx queued, slot %d, skblen %d r %d\n",
+ entry, skb->len, cp->tx_running);
} else {
struct cp_desc *txd;
u32 first_len, first_eor, ctrl;
@@ -886,8 +906,9 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
wmb();
 
cp->tx_opts[first_entry] = ctrl;
-   netif_dbg(cp, tx_queued, cp->dev, "tx queued, slots %d-%d, 
skblen %d\n",
- first_entry, entry, skb->len);
+   netif_dbg(cp, tx_queued, cp->dev,
+ "tx queued, slots %d-%d, skblen %d r %d\n",
+ first_entry, entry, skb->len, cp->tx_running);
}
cp->tx_head = NEXT_TX(entry);
 
@@ -895,11 +916,13 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
if (TX_BUFFS_AVAIL(cp) <= (MAX_SKB_FRAGS + 1))
netif_stop_queue(dev);
 
+   if (!cp->tx_running || !cp->txempty_seen) {
+   cpw8(TxPoll, NormalTxPoll);
+   cp->tx_running = 1;
+   }
 out_unlock:
spin_unlock_irqrestore(&cp->lock, intr_flags);
 
-   cpw8(TxPoll, NormalTxPoll);
-
return NETDEV_TX_OK;
 out_dma_error:
dev_kfree_skb_any(skb);
@@ -989,6 +1012,7 @@ static void cp_stop_hw (struct cp_private *cp)
 
cp->rx_tail = 0;
cp->tx_head = cp->tx_tail = 0;
+   cp->tx_running = 0;
 
netdev_reset_queue(cp->dev);
 }
@@ -1041,6 +1065,7 @@ static inline void cp_start_hw (struct cp_private *cp)
 * This variant appears to work fine.
 */
cpw8(Cmd, RxOn | TxOn);
+   cp->tx_running = 0;
 
netdev_reset_queue(cp->dev);
 }
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 2/2] 8139cp: Call __cp_set_rx_mode() from cp_tx_timeout()

2015-09-21 Thread David Woodhouse
On Mon, 2015-09-21 at 14:59 +0100, David Woodhouse wrote:

> After which, I think we might be able to turn on TX checksumming by
> default and I also have a way to implement early detection of the TX
> stall I've been seeing.

This is a patch I've been playing with to catch the TX stall.

When it happens we get a TxEmpty interrupt *without* TxDone.

After loading the driver, we never get TxEmpty without TxDone until the
problem has happened. I've got a minimal cp_tx_timeout() which just
clears the TxOn bit in the Cmd register and turns it back on again,
after moving the descriptors all down to the start of the TX ring.

Strangely, *after* this has happened we do see a lot of instances of
TxEmpty without TxDone. But then the TxDone usually comes immediately
afterwards. Until the driver is reloaded, at which point the next
instance of TxEmpty without TxDone *is* the hardware stalling again.

I'm very confused about what's going on there. I think I possibly need
to set a timer when the TxEmpty/!TxDone interrupt happens, and do a
preemptive reset of the TX queue (as shown below) if the timer manages
to expire before the TxDone actually happens.

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 67a4fcf..64b44ec 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -592,8 +592,8 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
if (!status || (status == 0x))
goto out_unlock;
 
-   netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
- status, cpr8(Cmd), cpr16(CpCmd));
+   netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x desc 
%04x\n",
+ status, cpr8(Cmd), cpr16(CpCmd), cpr16(TxDmaOkLowDesc));
 
cpw16(IntrStatus, status & ~cp_rx_intr_mask);
 
@@ -623,6 +623,10 @@ static irqreturn_t cp_interrupt (int irq, void 
*dev_instance)
   Let it stop. */
cp->tx_running = 0;
} else {
+   if (!(status & (TxOK | TxErr)))
+   netdev_warn(dev, "TxEmpty without TxDone. h %d 
t %d d %04x\n",
+   cp->tx_head, cp->tx_tail, 
cpr16(TxDmaOkLowDesc));
+
/* The hardware raced with us adding a new descriptor,
   and we didn't get the TxEmpty IRQ in time so we
   didn't prod it. Prod it now to restart. */
@@ -1307,12 +1311,49 @@ static void cp_tx_timeout(struct net_device *dev)
  le64_to_cpu(cp->tx_ring[i].addr),
  cp->tx_skb[i]);
}
-
+#if 1
+   /* Stop the TX (which is already stopped), move the
+  descriptors down, and start it again */
+   cpw8_f(Cmd, RxOn);
+   if (cp->tx_tail == 0) {
+   /* Do nothing */
+   } else if (cp->tx_head > cp->tx_tail) {
+   for (i = 0; i < cp->tx_head - cp->tx_tail; i++) {
+   int from = i + cp->tx_tail;
+   cp->tx_ring[i].addr = cp->tx_ring[from].addr;
+   cp->tx_ring[i].opts2 = cp->tx_ring[from].opts2;
+   cp->tx_ring[i].opts1 = cp->tx_ring[from].opts1;
+   cp->tx_ring[from].opts1 &= ~cpu_to_le32(DescOwn);
+   cp->tx_opts[i] = cp->tx_opts[from];
+   cp->tx_skb[i] = cp->tx_skb[from];
+   cp->tx_skb[from] = NULL;
+   printk("Ring move %d->%d\n", from, i);
+   }
+   } else {
+   for (i = cp->tx_tail - cp->tx_head - 1; i >= 0; i--) {
+   int from = (i + cp->tx_tail) & (CP_TX_RING_SIZE - 1);
+   cp->tx_ring[i].addr = cp->tx_ring[from].addr;
+   cp->tx_ring[i].opts2 = cp->tx_ring[from].opts2;
+   cp->tx_ring[i].opts1 = cp->tx_ring[from].opts1;
+   cp->tx_ring[from].opts1 &= ~cpu_to_le32(DescOwn);
+   cp->tx_opts[i] = cp->tx_opts[from];
+   cp->tx_skb[i] = cp->tx_skb[from];
+   cp->tx_skb[from] = NULL;
+   printk("Ring move %d->%d\n", from, i);
+   }
+   }
+   cpw8_f(Cmd, RxOn|TxOn);
+   cp->tx_head = (cp->tx_head - cp->tx_tail) & (CP_TX_RING_SIZE - 1);
+   cp->tx_tail = 0;
+   printk("head %d tail %d\n", cp->tx_head, cp->tx_tail);
+   cpw8(TxPoll, NormalTxPoll);
+#else
cp_stop_hw(cp);
cp_clean_rings(cp);
rc = cp_init_rings(cp);
cp_start_hw(cp);
__cp_set_rx_mode(dev);
+#endif
cpw16_f(IntrMask, cp_norx_intr_mask);
 
netif_wake_queue(dev);

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH 1/3] Avoid making inappropriate requests of NETIF_F_V[46]_CSUM devices

2015-09-21 Thread David Woodhouse
On Wed, 2013-01-16 at 18:00 -0500, David Miller wrote:
> From: David Woodhouse 
> Date: Wed, 16 Jan 2013 22:34:18 +
> 
> > On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
> >> 
> >> My opinion on this is that the injectors of packets are responsible
> >> for ensuring checksum types are set on SKBs in an appropriate way.
> >> 
> >> So we ensure this in the local protocol stacks that generate packets,
> >> and if foreign alien entities can inject SKBs with these checksum
> >> settings (like the tun device can) the burdon of verification falls
> >> upon whatever layer allows that to happen.
> >> 
> >> So really, the fix is in the tun device and the virtio layer.
> > 
> > The virtio layer (and the tun device) expose the equivalent of the
> > NETIF_F_HW_CSUM capability to the guest. In the case where we have a
> > real device on the host which *also* has NETIF_F_HW_CSUM capability, are
> > you saying that the tun driver should do the checksum for non-UDP/TCP
> > packets in software *anyway*, just because the packet might end up going
> > out a device *without* that capability, and the check in
> > harmonize_features() isn't sophisticated enough to cope properly?
> 
> I'm saying that tun can't inject unchecked crap into our stack.

Did we ever resolve this? AFAICT from inspecting the code the
virtio_net device still advertises hardware csum capabilities to the
guest. And accepts packets which need checksumming, calling
skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
af_packet.

And that works fine — it's a nice performance win because it means that
VM guests (and other clients) can make full use of the HW csum
capabilities of the real network hardware. And when the outbound
netdevice *doesn't* have HW csum support, we generally do the right
thing and complete the csum in software in the host kernel before
transmitting it.

Perhaps I'm missing something, but I'm not sure why you refer to that
as 'injecting unchecked crap'. Surely it's using CHECKSUM_PARTIAL
precisely as it was designed, and allowing the checksum to be completed
either by hardware or software as appropriate?

The *only* problem is the false positive in harmonize_features(), which
was addressed by my patch which started this thread (in 2013). The
problem is that an IP packet that *isn't* TCP or UDP, being sent out a
device that has only NETIF_F_IP_CSUM capability, ends up being handed
to the device unchecksummed because harmonize_features() fails to clear
the HW csum flag as it (arguably) should.

Original thread at
http://comments.gmane.org/gmane.linux.network/254981

I'm only looking at it again because I'm pondering enabling HW csum in
8139cp (now that I've fixed TSO), and it reminded me of this...


-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-21 Thread David Woodhouse
On Mon, 2015-09-21 at 22:25 +0200, Francois Romieu wrote:
> David Woodhouse  :
> > From: David Woodhouse 
> > 
> > The TX timeout handling has been observed to trigger RX IRQ storms. And
> > since cp_interrupt() just keeps saying that it handled the interrupt,
> > the machine then dies. Fix the return value from cp_interrupt(), and
> > the offending IRQ gets disabled and the machine survives.
> 
> I am not fond of the way it dissociates the hardware status word and the
> software "handled" variable.

Oh, I like that part very much :)

The practice of returning a 'handled' status only when you've actually
*done* something you expect to mitigate the interrupt, is a useful way
of also protecting against both hardware misbehaviour and software
bugs.

> What you are describing - RX IRQ storms - looks like a problem between
> the irq and poll handlers. That's where I expect the problem to be solved.

I already fixed that, in the next patch in the series. But the failure
mode *should* have been 'IRQ disabled' and the device continuing to
work via polling. Not a complete death of the machine. That's the
difference this patch makes.

> Sprinkling "handled" operations does not make me terribly confortable,
> especially as I'd happily trade the old-style part irq, part napi
> processing for a plain napi processing (I can get over it though :o) ).

The existing cp_rx_poll() function mostly runs without taking cp->lock.
But cp_tx() *does* need cp->lock for the tx_head/tx_tail manipulations.
I'm guessing that's why it's still being called directly from the hard
IRQ handler? I suppose we could probably find a way to move that out.

But I was mostly just trying to fix stuff that was actually broken...
:)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 4/7] 8139cp: Fix TSO/scatter-gather descriptor setup

2015-09-21 Thread David Woodhouse
On Mon, 2015-09-21 at 23:01 +0200, Francois Romieu wrote:
> Can you pile a patch to replace BUG with WARN_ON_ONCE(1) ?

OK. I can probably contrive a userspace program using AF_PACKET and
PACKET_VNET_HDR to trigger it, too¹ :)

-- 
dwmw2


¹ http://comments.gmane.org/gmane.linux.network/254981


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 7/7] 8139cp: Avoid gratuitous writes to TxPoll register when already running

2015-09-21 Thread David Woodhouse
On Mon, 2015-09-21 at 22:54 +0200, Francois Romieu wrote:
> David Woodhouse  :
> [...]
> > I'm actually having second thoughts about this one since realising that
> > QEMU doesn't implement it correctly. The workaround isn't *that* horrid
> > but it's not clear it's enough of a performance win — or whether it's
> > entirely necessary for catching my TX stall.
> 
> It don't indulge in this kind of fetish but I'm fine with people who want
> to play with QEMU 8139cp emulation where they could use virtio. They should
> imvho realize that it's their job to fix QEMU 8139cp emulation, not the
> other way around.

Oh, I'll fix the QEMU emulation (but not this week; updating my router
has taken long enough already and I have Real Work™ to do).

But those fixes will take time to propagate to actual users. I'm not
sure it's so reasonable to knowingly break the 8139cp driver for all
currently-released versions of QEMU.

It wouldn't surprise me to find that QEMU probably accounts for the
*majority* of 8139cp use on modern Linux kernels. I've had to fix
regressions which *only* fail on real hardware, and were *only* tested
on QEMU :)

> Please keep things sane (*cough*) and avoid the qemu workaround.

I don't even know that I need it. As you saw in my last WIP patch for
catching the TX stall, I was just checking for TxEmpty without TxDone.
An earlier iteration had actually remembered the last slot that was
already present when we prodded the TxPoll, and would warn on getting a
TxEmpty interrupt while that slot was still owned by the hardware. But
that has the *same* false positives, only *after* the initial stall,
that my current version has.

So I'm just not sure I care about eliminating the gratuitous TxPoll
writes. It's not as if we even wait for them. It's an MMIO write which
can complete later.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 4/7] 8139cp: Fix TSO/scatter-gather descriptor setup

2015-09-21 Thread David Woodhouse
On Mon, 2015-09-21 at 23:01 +0200, Francois Romieu wrote:
> 
> Can you pile a patch to replace BUG with WARN_ON_ONCE(1) ?

Let's avoid having three copies of the same damn code, while we're at
it... http://git.infradead.org/users/dwmw2/linux-8139cp.git has this
and the appropriate minor fixes to subsequent patches in the series.

What do you think of finally enabling hw csum and TSO by default, btw?

Subject: [PATCH 4½/7] 8139cp: Reduce duplicate csum/tso code in cp_start_xmit()

We calculate the value of the opts1 descriptor field in three different
places. With two different behaviours when given an invalid packet to
be checksummed — none of them correct. Sort that out.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 61 ---
 1 file changed, 20 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 3b219aa..a2c471d 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -740,7 +740,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 {
struct cp_private *cp = netdev_priv(dev);
unsigned entry;
-   u32 eor, flags;
+   u32 eor, opts1;
unsigned long intr_flags;
__le32 opts2;
int mss = 0;
@@ -760,6 +760,21 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
mss = skb_shinfo(skb)->gso_size;
 
opts2 = cpu_to_le32(cp_tx_vlan_tag(skb));
+   opts1 = DescOwn;
+   if (mss)
+   opts1 |= LargeSend | ((mss & MSSMask) << MSSShift);
+   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   const struct iphdr *ip = ip_hdr(skb);
+   if (ip->protocol == IPPROTO_TCP)
+   opts1 |= IPCS | TCPCS;
+   else if (ip->protocol == IPPROTO_UDP)
+   opts1 |= IPCS | UDPCS;
+   else {
+   WARN_ONCE(1,
+ "Net bug: asked to checksum invalid Legacy IP 
packet\n");
+   goto out_dma_error;
+   }
+   }
 
if (skb_shinfo(skb)->nr_frags == 0) {
struct cp_desc *txd = &cp->tx_ring[entry];
@@ -775,21 +790,9 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(mapping);
wmb();
 
-   flags = eor | len | DescOwn | FirstFrag | LastFrag;
-
-   if (mss)
-   flags |= LargeSend | ((mss & MSSMask) << MSSShift);
-   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   const struct iphdr *ip = ip_hdr(skb);
-   if (ip->protocol == IPPROTO_TCP)
-   flags |= IPCS | TCPCS;
-   else if (ip->protocol == IPPROTO_UDP)
-   flags |= IPCS | UDPCS;
-   else
-   WARN_ON(1); /* we need a WARN() */
-   }
+   opts1 |= eor | len | FirstFrag | LastFrag;
 
-   txd->opts1 = cpu_to_le32(flags);
+   txd->opts1 = cpu_to_le32(opts1);
wmb();
 
cp->tx_skb[entry] = skb;
@@ -800,7 +803,6 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
u32 first_len, first_eor, ctrl;
dma_addr_t first_mapping;
int frag, first_entry = entry;
-   const struct iphdr *ip = ip_hdr(skb);
 
/* We must give this initial chunk to the device last.
 * Otherwise we could race with the device.
@@ -832,19 +834,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 
-   ctrl = eor | len | DescOwn;
-
-   if (mss)
-   ctrl |= LargeSend |
-   ((mss & MSSMask) << MSSShift);
-   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   if (ip->protocol == IPPROTO_TCP)
-   ctrl |= IPCS | TCPCS;
-   else if (ip->protocol == IPPROTO_UDP)
-   ctrl |= IPCS | UDPCS;
-   else
-   BUG();
-   }
+   ctrl = opts1 | eor | len;
 
if (frag == skb_shinfo(skb)->nr_frags - 1)
ctrl |= LastFrag;
@@ -864,18 +854,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(first_mapping);
wmb();
 
-   ctrl = first_eor | first_len | FirstFrag | DescOwn;
-   

Re: [PATCH 1/7] 8139cp: Improve accuracy of cp_interrupt() return, to survive IRQ storms

2015-09-23 Thread David Woodhouse
On Tue, 2015-09-22 at 16:45 -0700, David Miller wrote:
> And if we are getting Rx* interrupts with napi_schedule_prep()
> returning false, that's a serious problem.  It can mean that the TX
> timeout handler's resetting of the chip is either miscoded or is
> racing with either NAPI polling or this interrupt handler.

Such bugs happen. That's why we have IRQ-storm detection, to help
protect us.

> And if that's the case your patch is making the chip's IRQ line get
> disabled when this race triggers.
> 
> This change is even worse, in my opinion, if patch #2 indeed makes
> the problem go away.

Even worse than what?

When a bug like the one I've fixed in #2 happens, let's look at the
options:

  - With this patch, the IRQ storm is detected. We disable the IRQ 
line and the driver continues to work, albeit with a higher
latency.

  - Without this patch, the box just dies completely. First the 
hardware watchdog was triggering an apparently spontaneous reset. 
Then when I disabled the hardware watchdog, the machine locked up 
and doesn't even respond to serial sysrq. Eventually I got a
little bit of sense out of it using the NMI watchdog.

Trust me, the disabled IRQ you get with my patch really *isn't* worse
than what happened without it :)

If cp_interrupt() had already worked the way I suggest, I would
literally have two days of my life back right now. I spent a *lot* of
time working out that it was dying in an interrupt storm. And trying to
find the underlying problem while having to physically visit it under
the stairs and reset it all the time.

In fact, I've been seeing undebuggable spontaneous resets of this
router for a while now. If I'd had an 'IRQ disabled' backtrace and a
smoking gun, I might have been able to fix it a long time ago.

I even discounted the IRQ-storm hypothesis, early on, on the basis that
"Linux handles those quite gracefully now, and disables the interrupt".
Except of course *that* relies on the IRQ handler returning an
appropriate value. So it doesn't work with 8139cp as things stand.

That's why I'm really quite keen on fixing cp_interrupt() to be more
defensive in when it claims to have handled the interrupt.

Surely that's *why* we have the IRQ-storm detection in the first place
— to help us survive precisely this situation. IRQ storms are either
caused by software bugs like the one fixed in patch #2, or hardware
misbehaviour. And if IRQ handlers blindly return 'handled' just because
the hardware actually *did* assert an interrupt, regardless of whether
they've made any *progress*, then we don't get that protection in a
*lot* of the situations where we'd want it.

But sure, for now I'll drop this from the series and I can try to
convince you separately. In fact I think it only affects the last patch
in the series which reduces the banging on TxPoll. And I'm going to
drop that one too for now anyway. I'll repost shortly. And I think I'll
add a new one at the end, enabling TX checksums, SG and TSO by default.

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 3/7] 8139cp: Fix TSO/scatter-gather descriptor setup

2015-09-23 Thread David Woodhouse
From: David Woodhouse 

When sending a TSO frame in multiple buffers, we were neglecting to set
the first descriptor up in TSO mode.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index cbca0de..75a8cee 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -790,7 +790,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
  entry, skb->len);
} else {
struct cp_desc *txd;
-   u32 first_len, first_eor;
+   u32 first_len, first_eor, ctrl;
dma_addr_t first_mapping;
int frag, first_entry = entry;
const struct iphdr *ip = ip_hdr(skb);
@@ -810,7 +810,6 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
for (frag = 0; frag < skb_shinfo(skb)->nr_frags; frag++) {
const skb_frag_t *this_frag = 
&skb_shinfo(skb)->frags[frag];
u32 len;
-   u32 ctrl;
dma_addr_t mapping;
 
entry = NEXT_TX(entry);
@@ -858,20 +857,19 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(first_mapping);
wmb();
 
-   if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   ctrl = first_eor | first_len | FirstFrag | DescOwn;
+   if (mss)
+   ctrl |= LargeSend | ((mss & MSSMask) << MSSShift);
+   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
if (ip->protocol == IPPROTO_TCP)
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn |
-IPCS | TCPCS);
+   ctrl |= IPCS | TCPCS;
else if (ip->protocol == IPPROTO_UDP)
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn |
-IPCS | UDPCS);
+   ctrl |= IPCS | UDPCS;
else
BUG();
-   } else
-   txd->opts1 = cpu_to_le32(first_eor | first_len |
-FirstFrag | DescOwn);
+   }
+
+   txd->opts1 = cpu_to_le32(ctrl);
wmb();
 
netif_dbg(cp, tx_queued, cp->dev, "tx queued, slots %d-%d, 
skblen %d\n",
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 1/7] 8139cp: Do not re-enable RX interrupts in cp_tx_timeout()

2015-09-23 Thread David Woodhouse
From: David Woodhouse 

If an RX interrupt was already received but NAPI has not yet run when
the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
already disabled. Blindly re-enabling them will cause an IRQ storm.

(This is made particularly horrid by the fact that cp_interrupt() always
returns that it's handled the interrupt, even when it hasn't actually
done anything. If it didn't do that, the core IRQ code would have
detected the storm and handled it, I'd have had a clear smoking gun
backtrace instead of just a spontaneously resetting router, and I'd have
at *least* two days of my life back. Changing the return value of
cp_interrupt() will be argued about under separate cover.)

Unconditionally leave RX interrupts disabled after the reset, and
schedule NAPI to check the receive ring and re-enable them.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index ba3dab7..947932d 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1262,9 +1262,10 @@ static void cp_tx_timeout(struct net_device *dev)
rc = cp_init_rings(cp);
cp_start_hw(cp);
__cp_set_rx_mode(dev);
-   cp_enable_irq(cp);
+   cpw16_f(IntrMask, cp_norx_intr_mask);
 
netif_wake_queue(dev);
+   napi_schedule_irqoff(&cp->napi);
 
spin_unlock_irqrestore(&cp->lock, flags);
 }
-- 
2.4.3

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


[PATCH 4/7] 8139cp: Reduce duplicate csum/tso code in cp_start_xmit()

2015-09-23 Thread David Woodhouse
From: David Woodhouse 

We calculate the value of the opts1 descriptor field in three different
places. With two different behaviours when given an invalid packet to
be checksummed — none of them correct. Sort that out.

Signed-off-by: David Woodhouse 
---
 drivers/net/ethernet/realtek/8139cp.c | 61 ---
 1 file changed, 20 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 75a8cee..b3bd8b1 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -733,7 +733,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 {
struct cp_private *cp = netdev_priv(dev);
unsigned entry;
-   u32 eor, flags;
+   u32 eor, opts1;
unsigned long intr_flags;
__le32 opts2;
int mss = 0;
@@ -753,6 +753,21 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
mss = skb_shinfo(skb)->gso_size;
 
opts2 = cpu_to_le32(cp_tx_vlan_tag(skb));
+   opts1 = DescOwn;
+   if (mss)
+   opts1 |= LargeSend | ((mss & MSSMask) << MSSShift);
+   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   const struct iphdr *ip = ip_hdr(skb);
+   if (ip->protocol == IPPROTO_TCP)
+   opts1 |= IPCS | TCPCS;
+   else if (ip->protocol == IPPROTO_UDP)
+   opts1 |= IPCS | UDPCS;
+   else {
+   WARN_ONCE(1,
+ "Net bug: asked to checksum invalid Legacy IP 
packet\n");
+   goto out_dma_error;
+   }
+   }
 
if (skb_shinfo(skb)->nr_frags == 0) {
struct cp_desc *txd = &cp->tx_ring[entry];
@@ -768,21 +783,9 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(mapping);
wmb();
 
-   flags = eor | len | DescOwn | FirstFrag | LastFrag;
-
-   if (mss)
-   flags |= LargeSend | ((mss & MSSMask) << MSSShift);
-   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   const struct iphdr *ip = ip_hdr(skb);
-   if (ip->protocol == IPPROTO_TCP)
-   flags |= IPCS | TCPCS;
-   else if (ip->protocol == IPPROTO_UDP)
-   flags |= IPCS | UDPCS;
-   else
-   WARN_ON(1); /* we need a WARN() */
-   }
+   opts1 |= eor | len | FirstFrag | LastFrag;
 
-   txd->opts1 = cpu_to_le32(flags);
+   txd->opts1 = cpu_to_le32(opts1);
wmb();
 
cp->tx_skb[entry] = skb;
@@ -793,7 +796,6 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
u32 first_len, first_eor, ctrl;
dma_addr_t first_mapping;
int frag, first_entry = entry;
-   const struct iphdr *ip = ip_hdr(skb);
 
/* We must give this initial chunk to the device last.
 * Otherwise we could race with the device.
@@ -825,19 +827,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 
-   ctrl = eor | len | DescOwn;
-
-   if (mss)
-   ctrl |= LargeSend |
-   ((mss & MSSMask) << MSSShift);
-   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   if (ip->protocol == IPPROTO_TCP)
-   ctrl |= IPCS | TCPCS;
-   else if (ip->protocol == IPPROTO_UDP)
-   ctrl |= IPCS | UDPCS;
-   else
-   BUG();
-   }
+   ctrl = opts1 | eor | len;
 
if (frag == skb_shinfo(skb)->nr_frags - 1)
ctrl |= LastFrag;
@@ -857,18 +847,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
txd->addr = cpu_to_le64(first_mapping);
wmb();
 
-   ctrl = first_eor | first_len | FirstFrag | DescOwn;
-   if (mss)
-   ctrl |= LargeSend | ((mss & MSSMask) << MSSShift);
-   else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   if (ip->protocol == IPPROTO_TCP)
-   ctrl |= IPCS | TCPCS;
-   else if (ip->protocol == IPPROTO_UDP)
-   ctrl |= IPCS | UDPCS;
-   else
-   BUG();
-

  1   2   >