Re: [PATCH 4/6] myri10ge - First half of the driver
On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote: Francois Romieu wrote: + spin_lock(mgp-cmd_lock); + response-result = 0x; + mb(); + myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf)); + + /* wait up to 2 seconds */ You must not hold a spinlock for up to 2 seconds. We are working on reducing the delay to about 15ms. It only occurs when the driver is loaded or the link brought up. I think 15ms is quite a long time to hold a spinlock also - most spinlocks in the kernel are held for less than 500 microseconds. Can't you use a mutex? Lee - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
Lee Revell wrote: On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote: Francois Romieu wrote: + spin_lock(mgp-cmd_lock); + response-result = 0x; + mb(); + myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf)); + + /* wait up to 2 seconds */ You must not hold a spinlock for up to 2 seconds. We are working on reducing the delay to about 15ms. It only occurs when the driver is loaded or the link brought up. I think 15ms is quite a long time to hold a spinlock also - most spinlocks in the kernel are held for less than 500 microseconds. Can't you use a mutex? It looks like rtnl_lock protects us here. We are working on it. Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
Brice Goglin [EMAIL PROTECTED] : [...] Return in a middle of a spinlock-intensive function. :o( What do you mean ? It is preferred for maintenance purpose (hello Mr Morton) to organize the control flow with a single spin_{lock/unlock} pair: if there is a branch in the control flow, it ought to be joined again before returning. -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) wrote: Imho you will want to work directly with pages shortly. We had thought about doing this, but were a little nervous since we did not know of any other drivers that worked directly with pages. If this is an official direction to work directly with pages, we will. s2io does. e1000 does it with skb frags. If your hardware allows header split and driver can put headers into skb-data and real data into frag_list, that allows to create various interesting things like receiving zero-copy support and netchannels support. It is work in progress, not official direction currently, but this definitely will help your driver to support future high performance extensions. Brice -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
From: Evgeniy Polyakov [EMAIL PROTECTED] Date: Fri, 12 May 2006 10:47:11 +0400 On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) wrote: Imho you will want to work directly with pages shortly. We had thought about doing this, but were a little nervous since we did not know of any other drivers that worked directly with pages. If this is an official direction to work directly with pages, we will. s2io does. e1000 does it with skb frags. If your hardware allows header split and driver can put headers into skb-data and real data into frag_list, that allows to create various interesting things like receiving zero-copy support and netchannels support. It is work in progress, not official direction currently, but this definitely will help your driver to support future high performance extensions. The most important impact is not having to use order 1 pages for jumbo MTU frames, which are more likely to fail allocations thant order 0 pages under heavy load. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
Roland Dreier wrote: +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8) Why do you need this wrapper? Why not just call __iowrite64_copy() without the obfuscation? Anyone reading the code will just have to search back to this define and mentally translate the size back and forth all the time. Well, I know that abstraction layer is bad. But in this case I really think that a name like myri10ge_pio_copy(size) is way less obfuscating than __iowrite64_copy(size/8). Will fix it if it really matters. +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev) +{ + uint8_t cap_off; + int nbcap = 0; + + cap_off = PCI_CAPABILITY_LIST - 1; + /* go through all caps looking for a hypertransport msi mapping */ This looks like something that should be fixed up in the general PCI quirk handling rather than in every driver... +static int +myri10ge_use_msi(struct pci_dev *pdev) +{ + if (myri10ge_msi == 1 || myri10ge_msi == 0) + return myri10ge_msi; + + /* find root complex for our device */ + while (pdev-bus pdev-bus-self) { + pdev = pdev-bus-self; + } Similarly looks like generic PCI code (if it's needed at all). If I understand correctly you're trying to check if MSI has a chance at working on the system, but a network device driver has no business walking up the PCI hierarchy. Right, I will look at moving all this to the core PCI code. Thanks for all the comments. Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
Francois Romieu wrote: +spin_lock(mgp-cmd_lock); +response-result = 0x; +mb(); +myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf)); + +/* wait up to 2 seconds */ You must not hold a spinlock for up to 2 seconds. We are working on reducing the delay to about 15ms. It only occurs when the driver is loaded or the link brought up. +for (sleep_total = 0; sleep_total (2 * 1000); sleep_total += 10) { +mb(); +if (response-result != 0x) { +if (response-result == 0) { +data-data0 = ntohl(response-data); +spin_unlock(mgp-cmd_lock); +return 0; +} else { +dev_err(mgp-pdev-dev, +command %d failed, result = %d\n, + cmd, ntohl(response-result)); +spin_unlock(mgp-cmd_lock); +return -ENXIO; Return in a middle of a spinlock-intensive function. :o( What do you mean ? +{ +struct sk_buff *skb; +unsigned long data, roundup; + +skb = dev_alloc_skb(bytes + 4096 + MYRI10GE_MCP_ETHER_PAD); +if (skb == NULL) +return NULL; Imho you will want to work directly with pages shortly. We had thought about doing this, but were a little nervous since we did not know of any other drivers that worked directly with pages. If this is an official direction to work directly with pages, we will. But the existing approach is well tested through our beta cycle, and we would prefer to leave it as is and update to a pages based approach in the future. Thanks a lot for all the comments. Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
On Wed, 10 May 2006 14:40:22 -0700 (PDT) Brice Goglin [EMAIL PROTECTED] wrote: [PATCH 4/6] myri10ge - First half of the driver The first half of the myri10ge driver core. Splitting it in half, might help email restrictions, but it kills future users of 'git bisect' who expect to have every kernel buildable. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
+typedef struct { +mcp_kreq_ether_recv_t __iomem *lanai; /* lanai ptr for recv ring */ +volatile uint8_t __iomem *wc_fifo; /* w/c rx dma addr fifo address */ +mcp_kreq_ether_recv_t *shadow; /* host shadow of recv ring */ +struct myri10ge_rx_buffer_state *info; +int cnt; +int alloc_fail; +int mask; /* number of rx slots -1 */ +} myri10ge_rx_buf_t; Why is wc_fifo volatile? The only places you actually use it, you seem to cast away the volatile anyway. Also, again, no typedef of structures please. +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8) Why do you need this wrapper? Why not just call __iowrite64_copy() without the obfuscation? Anyone reading the code will just have to search back to this define and mentally translate the size back and forth all the time. +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev) +{ +uint8_t cap_off; +int nbcap = 0; + +cap_off = PCI_CAPABILITY_LIST - 1; +/* go through all caps looking for a hypertransport msi mapping */ This looks like something that should be fixed up in the general PCI quirk handling rather than in every driver... +static int +myri10ge_use_msi(struct pci_dev *pdev) +{ +if (myri10ge_msi == 1 || myri10ge_msi == 0) +return myri10ge_msi; + +/* find root complex for our device */ +while (pdev-bus pdev-bus-self) { +pdev = pdev-bus-self; +} Similarly looks like generic PCI code (if it's needed at all). If I understand correctly you're trying to check if MSI has a chance at working on the system, but a network device driver has no business walking up the PCI hierarchy. +buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7) ~7UL); ALIGN() from linux/kernel.h? - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] myri10ge - First half of the driver
Brice Goglin [EMAIL PROTECTED] : [PATCH 4/6] myri10ge - First half of the driver The first half of the myri10ge driver core. Signed-off-by: Brice Goglin [EMAIL PROTECTED] Signed-off-by: Andrew J. Gallatin [EMAIL PROTECTED] myri10ge.c | 1483 + 1 file changed, 1483 insertions(+) --- /dev/null 2006-05-09 19:43:19.324446250 +0200 +++ linux/drivers/net/myri10ge/myri10ge.c 2006-05-09 23:00:55.0 +0200 [...] +module_param(myri10ge_flow_control, int, S_IRUGO); +module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR); +module_param(myri10ge_force_firmware, int, S_IRUGO); +module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR); +module_param(myri10ge_initial_mtu, int, S_IRUGO); +module_param(myri10ge_napi, int, S_IRUGO); +module_param(myri10ge_napi_weight, int, S_IRUGO); +module_param(myri10ge_watchdog_timeout, int, S_IRUGO); +module_param(myri10ge_max_irq_loops, int, S_IRUGO); MODULE_PARM_DESC() would be nice. + +#define MYRI10GE_FW_OFFSET 1024*1024 +#define MYRI10GE_HIGHPART_TO_U32(X) \ +(sizeof (X) == 8) ? ((uint32_t)((uint64_t)(X) 32)) : (0) +#define MYRI10GE_LOWPART_TO_U32(X) ((uint32_t)(X)) + +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8) + +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev) static int ? [...] +static int +myri10ge_send_cmd(struct myri10ge_priv *mgp, uint32_t cmd, + myri10ge_cmd_t *data) +{ + mcp_cmd_t *buf; + char buf_bytes[sizeof(*buf) + 8]; + volatile mcp_cmd_response_t *response = mgp-cmd; + volatile char __iomem *cmd_addr = mgp-sram + MYRI10GE_MCP_CMD_OFFSET; + uint32_t dma_low, dma_high; + int sleep_total = 0; + + /* ensure buf is aligned to 8 bytes */ + buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7) ~7UL); + + buf-data0 = htonl(data-data0); + buf-data1 = htonl(data-data1); + buf-data2 = htonl(data-data2); + buf-cmd = htonl(cmd); + dma_low = MYRI10GE_LOWPART_TO_U32(mgp-cmd_bus); + dma_high = MYRI10GE_HIGHPART_TO_U32(mgp-cmd_bus); + + buf-response_addr.low = htonl(dma_low); + buf-response_addr.high = htonl(dma_high); + spin_lock(mgp-cmd_lock); + response-result = 0x; + mb(); + myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf)); + + /* wait up to 2 seconds */ You must not hold a spinlock for up to 2 seconds. + for (sleep_total = 0; sleep_total (2 * 1000); sleep_total += 10) { + mb(); + if (response-result != 0x) { + if (response-result == 0) { + data-data0 = ntohl(response-data); + spin_unlock(mgp-cmd_lock); + return 0; + } else { + dev_err(mgp-pdev-dev, + command %d failed, result = %d\n, +cmd, ntohl(response-result)); + spin_unlock(mgp-cmd_lock); + return -ENXIO; Return in a middle of a spinlock-intensive function. :o( + } + } + udelay(1000 * 10); + } + spin_unlock(mgp-cmd_lock); + dev_err(mgp-pdev-dev, command %d timed out, result = %d\n, +cmd, ntohl(response-result)); + return -EAGAIN; +} + + +/* + * The eeprom strings on the lanaiX have the format + * SN=x\0 + * MAC=x:x:x:x:x:x\0 + * PT:ddd mmm xx xx:xx:xx xx\0 + * PV:ddd mmm xx xx:xx:xx xx\0 + */ +int +myri10ge_read_mac_addr(struct myri10ge_priv *mgp) static int ? [...] +static void +myri10ge_dummy_rdma(struct myri10ge_priv *mgp, int enable) +{ + volatile uint32_t *confirm; + volatile char __iomem *submit; + uint32_t buf[16]; + uint32_t dma_low, dma_high; + int i; + + /* clear confirmation addr */ + confirm = (volatile uint32_t *) mgp-cmd; + *confirm = 0; + mb(); + + /* send a rdma command to the PCIe engine, and wait for the + * response in the confirmation address. The firmware should + * write a -1 there to indicate it is alive and well + */ + dma_low = MYRI10GE_LOWPART_TO_U32(mgp-cmd_bus); + dma_high = MYRI10GE_HIGHPART_TO_U32(mgp-cmd_bus); + + buf[0] = htonl(dma_high); /* confirm addr MSW */ + buf[1] = htonl(dma_low);/* confirm addr LSW */ + buf[2] = htonl(0x); /* confirm data */ + buf[3] = htonl(dma_high); /* dummy addr MSW */ + buf[4] = htonl(dma_low);/* dummy addr LSW */ + buf[5] = htonl(enable); /* enable? */ + + submit = mgp-sram + 0xfc01c0; + + myri10ge_pio_copy((void __iomem *) submit, buf, sizeof (buf)); + mb(); + udelay(1000); + mb(); + i = 0; + while (*confirm != 0x i 20) { +