Re: [PATCH 3/4] myri10ge - Driver core
Hi there, Benjamin Herrenschmidt wrote: On Wed, 2006-05-24 at 01:39 +1000, Anton Blanchard wrote: +#ifdef CONFIG_MTRR + mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span, + MTRR_TYPE_WRCOMB, 1); +#endif ... + mgp-sram = ioremap(mgp-iomem_base, mgp-board_span); Not sure how we are meant to specify write through in drivers. Any ideas Ben? No proper interface exposed, he'll have to do an #ifdef powerpc here or such and use __ioremap with explicit page attributes. I have a hack to do that automatically for memory covered by prefetchable PCI BARs when mmap'ing from userland but not for kernel ioremap. Stupid question: pci_iomap() is NOT what you are looking for, right? Implementation is at the end of lib/iomap.c Regards Ingo Oeser - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
No proper interface exposed, he'll have to do an #ifdef powerpc here or such and use __ioremap with explicit page attributes. I have a hack to do that automatically for memory covered by prefetchable PCI BARs when mmap'ing from userland but not for kernel ioremap. Stupid question: pci_iomap() is NOT what you are looking for, right? Implementation is at the end of lib/iomap.c No, there is no difference there, pci_iomap won't help for passing in platform specific page attributes for things like write combining. Ben. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Benjamin Herrenschmidt wrote: No proper interface exposed, he'll have to do an #ifdef powerpc here or such and use __ioremap with explicit page attributes. I have a hack to do that automatically for memory covered by prefetchable PCI BARs when mmap'ing from userland but not for kernel ioremap. Stupid question: pci_iomap() is NOT what you are looking for, right? Implementation is at the end of lib/iomap.c No, there is no difference there, pci_iomap won't help for passing in platform specific page attributes for things like write combining. Since we already have ioremap_nocache(), maybe adding ioremap_wcomb() is appropriate? Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
On Fri, 2006-05-26 at 06:30 -0400, Jeff Garzik wrote: Benjamin Herrenschmidt wrote: No proper interface exposed, he'll have to do an #ifdef powerpc here or such and use __ioremap with explicit page attributes. I have a hack to do that automatically for memory covered by prefetchable PCI BARs when mmap'ing from userland but not for kernel ioremap. Stupid question: pci_iomap() is NOT what you are looking for, right? Implementation is at the end of lib/iomap.c No, there is no difference there, pci_iomap won't help for passing in platform specific page attributes for things like write combining. Since we already have ioremap_nocache(), maybe adding ioremap_wcomb() is appropriate? Yes, but be careful there.. on ppc, not setting the page guarded bit will enable write combining on various CPUs but will also enable speculative reads, prefetch, etc... so it's not just WC, you have to know for sure what you are doing with that memory (basically, enable side-effects). Ben. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
On Wed, 2006-05-24 at 01:39 +1000, Anton Blanchard wrote: +#ifdef CONFIG_MTRR + mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span, +MTRR_TYPE_WRCOMB, 1); +#endif ... + mgp-sram = ioremap(mgp-iomem_base, mgp-board_span); Not sure how we are meant to specify write through in drivers. Any ideas Ben? No proper interface exposed, he'll have to do an #ifdef powerpc here or such and use __ioremap with explicit page attributes. I have a hack to do that automatically for memory covered by prefetchable PCI BARs when mmap'ing from userland but not for kernel ioremap. Ben. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
On Wed, 2006-05-24 at 10:04 +0200, Brice Goglin wrote: I am not sure what you mean. The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do write combining according to Apple. That is not 100% true I don't know what apple had in mind. It also depends in what slot you are. Do you have ways to measure the difference ? Try doing __ioremap(mgp-iomem_base, mgp-board_span, _PAGE_NO_CACHE); instead of using the normal ioremap for #ifdef powerpc and tell us if it makes a difference. Ben. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Benjamin Herrenschmidt wrote: On Wed, 2006-05-24 at 10:04 +0200, Brice Goglin wrote: I am not sure what you mean. The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do write combining according to Apple. That is not 100% true I don't know what apple had in mind. It also depends in what slot you are. Do you have ways to measure the difference ? No, we don't have any PPC with PCIe running Linux. The only G5 PCIe that we have is running MacOSX. Try doing __ioremap(mgp-iomem_base, mgp-board_span, _PAGE_NO_CACHE); instead of using the normal ioremap for #ifdef powerpc and tell us if it makes a difference. I'll try it as soon as we get our G5 PCIe to run Linux. thanks, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Anton Blanchard wrote: +/* + * Set of routunes to get a new receive buffer. Any buffer which + * crosses a 4KB boundary must start on a 4KB boundary due to PCIe + * wdma restrictions. We also try to align any smaller allocation to + * at least a 16 byte boundary for efficiency. We assume the linux + * memory allocator works by powers of 2, and will not return memory + * smaller than 2KB which crosses a 4KB boundary. If it does, we fall + * back to allocating 2x as much space as required. + * + * We intend to replace large (4KB) skb allocations by using + * pages directly and building a fraglist in the near future. + */ You go to a lot of trouble to align things. One thing on ppc64 is that we really want to start all DMA writes on a cacheline boundary. We enforce that in network drivers by making NET_IP_ALIGN = 0 and having the drivers do: skb_reserve(skb, NET_IP_ALIGN); It sounds like your small allocations will be only aligned to 16 bytes. We didn't get any ppc64 with PCI-E to run Linux so far. What performance drop should we expect with our current code ? Id suggest using the dma API instead of the pci API. We have seen machines in the field that have failed large pci_alloc_consistent calls because it always asks for GFP_ATOMIC memory (it presumes the worst). The dma API allows you to pass a GFP_ flag in which will have a much better chance of succeeding when you dont need GFP_ATOMIC memory. Good idea, thanks, +#ifdef CONFIG_MTRR +mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span, + MTRR_TYPE_WRCOMB, 1); +#endif ... +mgp-sram = ioremap(mgp-iomem_base, mgp-board_span); Not sure how we are meant to specify write through in drivers. Any ideas Ben? I am not sure what you mean. The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do write combining according to Apple. Thanks, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Hi, We didn't get any ppc64 with PCI-E to run Linux so far. What performance drop should we expect with our current code ? We have seen 20% improvement on ppc64 running some networking workloads when forcing 128 byte alignment (instead of 16 byte alignment). DMA writes have to get cacheline aligned (in power of 2 steps) on some IO chips. I am not sure what you mean. The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do write combining according to Apple. Im thinking more generally, MTRRs are x86 specific and it would be good to have a more generic way to enable write combining. Anton - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
On Friday 19 May 2006 12:00, Arnd Bergmann wrote: On Friday 19 May 2006 04:25, Brice Goglin wrote: dev_mc_upload() from net/core/dev_mcast.c does spin_lock_bh(dev-xmit_lock); __dev_mc_upload(dev); which calls dev-set_multicast_list(), which is myri10ge_set_multicast_list() which calls myri10ge_change_promisc which calls myri10ge_send_cmd Hmm, if that is the only path where you call it, it may be helpful to change myri10ge_change_promisc to call a special atomic version of myri10ge_send_cmd then, while all others use the regular version, which you can then convert to do msleep as well. Or just drop the xmit lock while you do that. As long as you handle races with -start_xmit yourself it's ok -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
dev_err? [...] Could probably use dev_printk. When the interface name is known, we prefer having the message look like myri10ge: eth2: something since it might be easier to read than myri10ge: 5:000e: something. The administrator usually knows the name of the network interface better than its bus ID. Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Andi Kleen wrote: On Friday 19 May 2006 12:00, Arnd Bergmann wrote: On Friday 19 May 2006 04:25, Brice Goglin wrote: dev_mc_upload() from net/core/dev_mcast.c does spin_lock_bh(dev-xmit_lock); __dev_mc_upload(dev); which calls dev-set_multicast_list(), which is myri10ge_set_multicast_list() which calls myri10ge_change_promisc which calls myri10ge_send_cmd Hmm, if that is the only path where you call it, it may be helpful to change myri10ge_change_promisc to call a special atomic version of myri10ge_send_cmd then, while all others use the regular version, which you can then convert to do msleep as well. Or just drop the xmit lock while you do that. As long as you handle races with -start_xmit yourself it's ok Looks like we can do that. Thanks, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Arnd Bergmann wrote: +static int myri10ge_open(struct net_device *dev) This function is too long to read easily. Ok we have split it a little bit. +/* allocate the host shadow rings */ + +bytes = 8 + (MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO + 4) +* sizeof(*mgp-tx.req_list); +mgp-tx.req_bytes = kmalloc(bytes, GFP_KERNEL); +if (mgp-tx.req_bytes == NULL) +goto abort_with_nothing; +memset(mgp-tx.req_bytes, 0, bytes); + +/* ensure req_list entries are aligned to 8 bytes */ +mgp-tx.req_list = (struct mcp_kreq_ether_send *) +ALIGN((unsigned long)mgp-tx.req_bytes, 8); + +bytes = rx_ring_entries * sizeof(*mgp-rx_small.shadow); +mgp-rx_small.shadow = kmalloc(bytes, GFP_KERNEL); +if (mgp-rx_small.shadow == NULL) +goto abort_with_tx_req_bytes; +memset(mgp-rx_small.shadow, 0, bytes); + +bytes = rx_ring_entries * sizeof(*mgp-rx_big.shadow); +mgp-rx_big.shadow = kmalloc(bytes, GFP_KERNEL); +if (mgp-rx_big.shadow == NULL) +goto abort_with_rx_small_shadow; +memset(mgp-rx_big.shadow, 0, bytes); + +/* allocate the host info rings */ + +bytes = tx_ring_entries * sizeof(*mgp-tx.info); +mgp-tx.info = kmalloc(bytes, GFP_KERNEL); +if (mgp-tx.info == NULL) +goto abort_with_rx_big_shadow; +memset(mgp-tx.info, 0, bytes); + +bytes = rx_ring_entries * sizeof(*mgp-rx_small.info); +mgp-rx_small.info = kmalloc(bytes, GFP_KERNEL); +if (mgp-rx_small.info == NULL) +goto abort_with_tx_info; +memset(mgp-rx_small.info, 0, bytes); + +bytes = rx_ring_entries * sizeof(*mgp-rx_big.info); +mgp-rx_big.info = kmalloc(bytes, GFP_KERNEL); +if (mgp-rx_big.info == NULL) +goto abort_with_rx_small_info; +memset(mgp-rx_big.info, 0, bytes); + Can you do all these allocations at once? Maybe you can even move them into the size passed to alloc_etherdev. They are different things conceptually, so we prefer to allocate them separately. If you need separate allocations, using kzalloc simplifies your code. Right, will do. +/* Break the SKB or Fragment up into pieces which + * do not cross mgp-tx.boundary */ +low = MYRI10GE_LOWPART_TO_U32(bus); +high_swapped = htonl(MYRI10GE_HIGHPART_TO_U32(bus)); +while (len) { +u8 flags_next; +int cum_len_next; + +if (unlikely(count == max_segments)) +goto abort_linearize; + +boundary = (low + tx-boundary) ~(tx-boundary - 1); +seglen = boundary - low; +if (seglen len) +seglen = len; +flags_next = flags ~MYRI10GE_MCP_ETHER_FLAGS_FIRST; +cum_len_next = cum_len + seglen; +#ifdef NETIF_F_TSO +if (mss) { /* TSO */ +(req - rdma_count)-rdma_count = rdma_count + 1; + +if (likely(cum_len = 0)) { /* payload */ +int next_is_first, chop; + +chop = (cum_len_next mss); +cum_len_next = cum_len_next % mss; +next_is_first = (cum_len_next == 0); +flags |= chop * +MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP; +flags_next |= next_is_first * +MYRI10GE_MCP_ETHER_FLAGS_FIRST; +rdma_count |= -(chop | next_is_first); +rdma_count += chop !next_is_first; +} else if (likely(cum_len_next = 0)) { /* header ends */ +int small; + +rdma_count = -1; +cum_len_next = 0; +seglen = -cum_len; +small = +(mss = + MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE); +flags_next = +MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD | +MYRI10GE_MCP_ETHER_FLAGS_FIRST | +(small * + MYRI10GE_MCP_ETHER_FLAGS_SMALL); +} +} 100 characters per line are too much, as are six levels of intentation, or the number of lines in this function. You should try to split into into smaller ones. We have tried :) But there is no nice way to split this
Re: [PATCH 3/4] myri10ge - Driver core
We have tried :) But there is no nice way to split this code. So I guess we will have to keep it like this. You could shorten the #define names a bit (s/MYRI10GE_MCP_ETHER_/MCP_/) Then everything will fit better. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Roland Dreier wrote: Still some suspicious uses of volatile here. For example: +struct myri10ge_priv { ... +volatile u8 __iomem *sram; as far as I can see this is always used with proper __iomem accessors, often with casts to strip the volatile anyway. So why is volatile needed? I would suggest an audit of all uses of volatile in the driver, since volatile in drivers really should be read there's probably a bug here, and if not something very tricky is going on. If there are any valid uses of volatile then a comment should explain why, so that future reviewers don't have to try and puzzle out which of the two possible translations of volatile is correct. You are right, we audited the code and it looks like we don't need any volatile. Thanks, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Arnd Bergmann wrote: +for (sleep_total = 0; + sleep_total (15 * 1000) response-result == 0x; + sleep_total += 10) { +udelay(10); +} udelay does not sleep. If you want to sleep, use msleep instead. This place is actually the only one where we don't want to use msleep. This function (myri10ge_send_cmd) might be called from various context (spinlocked or not) and pass orders to the NIC whose processing time depends a lot on the command. Of course, we don't have any place where a long operation is passed from a spinlocked context :) But, we need the tiny udelay granularity for the spinlocked case, and the long loop for operations that are long to process in the NIC. Concerning all the other places where you suggested to use msleep, you were right. The __iomem variable need not be volatile. As Roland pointed out, there was too many volatile in this code. We are reworking this together with the sparse annotations. +printk(myri10ge: %s: %s IRQ %d, tx bndry %d, fw %s, WC %s\n, + netdev-name, (mgp-msi_enabled ? MSI : xPIC), + pdev-irq, mgp-tx.boundary, mgp-fw_name, + (mgp-mtrr = 0 ? Enabled : Disabled)); + missing printk level (KERN_DEBUG?). Could probably use dev_printk. When are we supposed to call dev_printk or not in such a driver ? +#define MYRI10GE_PCI_VENDOR_MYRICOM 0x14c1 +#define MYRI10GE_PCI_DEVICE_Z8E 0x0008 Shouldn't the vendor ID go to pci_ids.h? That's what I thought but i was told that the fashion these days is to keep the IDs with the driver that uses them. I'll happy to move as long as everybody agrees :) Thanks a lot for all your comments, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Am Friday 19 May 2006 01:56 schrieb Brice Goglin: This place is actually the only one where we don't want to use msleep. This function (myri10ge_send_cmd) might be called from various context (spinlocked or not) and pass orders to the NIC whose processing time depends a lot on the command. Of course, we don't have any place where a long operation is passed from a spinlocked context :) But, we need the tiny udelay granularity for the spinlocked case, and the long loop for operations that are long to process in the NIC. I don't see any spinlocks in your code and the function does not seem to be called from the interrupt handler or the softirq either. Maybe I'm missed something, but where is this ever called in an atomic context? missing printk level (KERN_DEBUG?). Could probably use dev_printk. When are we supposed to call dev_printk or not in such a driver ? Whenever you have a device associated with the message, it makes sense to use the dev_printk family of functions. +#define MYRI10GE_PCI_VENDOR_MYRICOM 0x14c1 +#define MYRI10GE_PCI_DEVICE_Z8E 0x0008 Shouldn't the vendor ID go to pci_ids.h? That's what I thought but i was told that the fashion these days is to keep the IDs with the driver that uses them. I'll happy to move as long as everybody agrees :) My understanding is that vendor IDs should go to the common file because they are likely to be used by multiple drivers whereas device IDs only need to be present in the one device driver for that particular device. Arnd - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Arnd Bergmann wrote: Am Friday 19 May 2006 01:56 schrieb Brice Goglin: This place is actually the only one where we don't want to use msleep. This function (myri10ge_send_cmd) might be called from various context (spinlocked or not) and pass orders to the NIC whose processing time depends a lot on the command. Of course, we don't have any place where a long operation is passed from a spinlocked context :) But, we need the tiny udelay granularity for the spinlocked case, and the long loop for operations that are long to process in the NIC. I don't see any spinlocks in your code and the function does not seem to be called from the interrupt handler or the softirq either. Maybe I'm missed something, but where is this ever called in an atomic context? dev_mc_upload() from net/core/dev_mcast.c does spin_lock_bh(dev-xmit_lock); __dev_mc_upload(dev); which calls dev-set_multicast_list(), which is myri10ge_set_multicast_list() which calls myri10ge_change_promisc which calls myri10ge_send_cmd Whenever you have a device associated with the message, it makes sense to use the dev_printk family of functions. Ok, thanks. My understanding is that vendor IDs should go to the common file because they are likely to be used by multiple drivers whereas device IDs only need to be present in the one device driver for that particular device. Make sense. I will change it. Thanks again, Brice - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Still some suspicious uses of volatile here. For example: +struct myri10ge_priv { ... + volatile u8 __iomem *sram; as far as I can see this is always used with proper __iomem accessors, often with casts to strip the volatile anyway. So why is volatile needed? I would suggest an audit of all uses of volatile in the driver, since volatile in drivers really should be read there's probably a bug here, and if not something very tricky is going on. If there are any valid uses of volatile then a comment should explain why, so that future reviewers don't have to try and puzzle out which of the two possible translations of volatile is correct. - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] myri10ge - Driver core
Am Thursday 18 May 2006 00:06 schrieb Brice Goglin: +static char *myri10ge_fw_name = NULL; +static char *myri10ge_fw_unaligned = myri10ge_ethp_z8e.dat; +static char *myri10ge_fw_aligned = myri10ge_eth_z8e.dat; +static int myri10ge_ecrc_enable = 1; +static int myri10ge_max_intr_slots = 1024; +static int myri10ge_small_bytes = -1;/* -1 == auto */ +static int myri10ge_msi = 1; /* enable msi by default */ +static int myri10ge_intr_coal_delay = 25; +static int myri10ge_flow_control = 1; +static int myri10ge_deassert_wait = 1; +static int myri10ge_force_firmware = 0; +static int myri10ge_skb_cross_4k = 0; +static int myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN; +static int myri10ge_napi_weight = 64; +static int myri10ge_watchdog_timeout = 1; +static int myri10ge_max_irq_loops = 1048576; + +module_param(myri10ge_fw_name, charp, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(myri10ge_fw_name, Firmware image name\n); +module_param(myri10ge_max_intr_slots, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_max_intr_slots, Interrupt queue slots\n); +module_param(myri10ge_small_bytes, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(myri10ge_small_bytes, Threshold of small packets\n); +module_param(myri10ge_msi, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_msi, Enable Message Signalled Interrupts\n); +module_param(myri10ge_intr_coal_delay, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_intr_coal_delay, Interrupt coalescing delay\n); +module_param(myri10ge_ecrc_enable, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_ecrc_enable, Enable Extended CRC on PCI-E\n); +module_param(myri10ge_flow_control, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_flow_control, Pause parameter\n); +module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(myri10ge_deassert_wait, + Wait when deasserting legacy interrupts\n); +module_param(myri10ge_force_firmware, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_force_firmware, + Force firmware to assume aligned completions\n); +module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(myri10ge_skb_cross_4k, + Can a small skb cross a 4KB boundary?\n); +module_param(myri10ge_initial_mtu, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_initial_mtu, Initial MTU\n); +module_param(myri10ge_napi_weight, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_napi_weight, Set NAPI weight\n); +module_param(myri10ge_watchdog_timeout, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_watchdog_timeout, Set watchdog timeout\n); +module_param(myri10ge_max_irq_loops, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_max_irq_loops, + Set stuck legacy IRQ detection threshold\n); How about writing the module_param() and MODULE_PARM_DESC() calls directly after each declaration? That would make it clearer that they are all parameters. + response-result = 0x; 0x appears throughout your code as a return value. maybe use a named constant for it? + for (sleep_total = 0; + sleep_total (15 * 1000) response-result == 0x; + sleep_total += 10) { + udelay(10); + } udelay does not sleep. If you want to sleep, use msleep instead. + + myri10ge_pio_copy((void __iomem *)submit, buf, sizeof(buf)); + for (i = 0; *confirm != 0x i 20; i++) + udelay(1000); + if (*confirm != 0x) { + dev_err(mgp-pdev-dev, dummy rdma %s failed\n, + (enable ? enable : disable)); + } +} Can you use msleep here instead of udelay? +static int myri10ge_load_firmware(struct myri10ge_priv *mgp) +{ + volatile u32 *confirm; + volatile char __iomem *submit; The __iomem variable need not be volatile. + myri10ge_pio_copy((void __iomem *)submit, buf, sizeof(buf)); + mb(); + udelay(1000); + mb(); can't you use msleep(1) instead? +static inline void +myri10ge_submit_8rx(struct mcp_kreq_ether_recv __iomem * dst, + struct mcp_kreq_ether_recv *src) +{ + u32 low; + + low = src-addr_low; + src-addr_low = DMA_32BIT_MASK; + myri10ge_pio_copy(dst, src, 8 * sizeof(*src)); + mb(); + src-addr_low = low; + *(u32 __force *) dst-addr_low = src-addr_low; + mb(); +} The __force dereference seems fishy. + if (unlikely(((end 12) != (data 12)) (data 4095UL))) { + printk + (myri10ge_alloc_small: small skb crossed 4KB boundary\n); Printk level is missing. +static int myri10ge_open(struct net_device *dev) This function is too long to read easily. + /* allocate the host shadow rings */ + + bytes = 8 + (MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO + 4) + * sizeof(*mgp-tx.req_list); + mgp-tx.req_bytes = kmalloc(bytes, GFP_KERNEL); + if (mgp-tx.req_bytes == NULL) + goto abort_with_nothing; + memset(mgp-tx.req_bytes, 0, bytes); + + /* ensure req_list