Re: [PATCH 3/4] myri10ge - Driver core

2006-05-26 Thread Ingo Oeser
Hi there,

Benjamin Herrenschmidt wrote:
 On Wed, 2006-05-24 at 01:39 +1000, Anton Blanchard wrote:
 
   +#ifdef CONFIG_MTRR
   + mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span,
   +  MTRR_TYPE_WRCOMB, 1);
   +#endif
  ...
   + mgp-sram = ioremap(mgp-iomem_base, mgp-board_span);
  
  Not sure how we are meant to specify write through in drivers. Any ideas 
  Ben?
 
 No proper interface exposed, he'll have to do an #ifdef powerpc here or
 such and use __ioremap with explicit page attributes. I have a hack to
 do that automatically for memory covered by prefetchable PCI BARs when
 mmap'ing from userland but not for kernel ioremap.

Stupid question: pci_iomap() is NOT what you are looking for, right?

Implementation is at the end of lib/iomap.c


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-26 Thread Benjamin Herrenschmidt

  No proper interface exposed, he'll have to do an #ifdef powerpc here or
  such and use __ioremap with explicit page attributes. I have a hack to
  do that automatically for memory covered by prefetchable PCI BARs when
  mmap'ing from userland but not for kernel ioremap.
 
 Stupid question: pci_iomap() is NOT what you are looking for, right?
 
 Implementation is at the end of lib/iomap.c

No, there is no difference there, pci_iomap won't help for passing in
platform specific page attributes for things like write combining.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-26 Thread Jeff Garzik

Benjamin Herrenschmidt wrote:

No proper interface exposed, he'll have to do an #ifdef powerpc here or
such and use __ioremap with explicit page attributes. I have a hack to
do that automatically for memory covered by prefetchable PCI BARs when
mmap'ing from userland but not for kernel ioremap.

Stupid question: pci_iomap() is NOT what you are looking for, right?

Implementation is at the end of lib/iomap.c


No, there is no difference there, pci_iomap won't help for passing in
platform specific page attributes for things like write combining.


Since we already have ioremap_nocache(), maybe adding ioremap_wcomb() is 
appropriate?


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-26 Thread Benjamin Herrenschmidt
On Fri, 2006-05-26 at 06:30 -0400, Jeff Garzik wrote:
 Benjamin Herrenschmidt wrote:
  No proper interface exposed, he'll have to do an #ifdef powerpc here or
  such and use __ioremap with explicit page attributes. I have a hack to
  do that automatically for memory covered by prefetchable PCI BARs when
  mmap'ing from userland but not for kernel ioremap.
  Stupid question: pci_iomap() is NOT what you are looking for, right?
 
  Implementation is at the end of lib/iomap.c
  
  No, there is no difference there, pci_iomap won't help for passing in
  platform specific page attributes for things like write combining.
 
 Since we already have ioremap_nocache(), maybe adding ioremap_wcomb() is 
 appropriate?

Yes, but be careful there.. on ppc, not setting the page guarded bit
will enable write combining on various CPUs but will also enable
speculative reads, prefetch, etc... so it's not just WC, you have to
know for sure what you are doing with that memory (basically, enable
side-effects).

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-25 Thread Benjamin Herrenschmidt
On Wed, 2006-05-24 at 01:39 +1000, Anton Blanchard wrote:

  +#ifdef CONFIG_MTRR
  +   mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span,
  +MTRR_TYPE_WRCOMB, 1);
  +#endif
 ...
  +   mgp-sram = ioremap(mgp-iomem_base, mgp-board_span);
 
 Not sure how we are meant to specify write through in drivers. Any ideas Ben?

No proper interface exposed, he'll have to do an #ifdef powerpc here or
such and use __ioremap with explicit page attributes. I have a hack to
do that automatically for memory covered by prefetchable PCI BARs when
mmap'ing from userland but not for kernel ioremap.

Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-25 Thread Benjamin Herrenschmidt
On Wed, 2006-05-24 at 10:04 +0200, Brice Goglin wrote:

 I am not sure what you mean.
 The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do
 write combining according to Apple.

That is not 100% true I don't know what apple had in mind. It also
depends in what slot you are.

Do you have ways to measure the difference ?

Try doing __ioremap(mgp-iomem_base, mgp-board_span, _PAGE_NO_CACHE);
instead of using the normal ioremap for #ifdef powerpc and tell us if it
makes a difference.

Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-25 Thread Brice Goglin
Benjamin Herrenschmidt wrote:
 On Wed, 2006-05-24 at 10:04 +0200, Brice Goglin wrote:

   
 I am not sure what you mean.
 The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do
 write combining according to Apple.
 

 That is not 100% true I don't know what apple had in mind. It also
 depends in what slot you are.

 Do you have ways to measure the difference ?
   

No, we don't have any PPC with PCIe running Linux. The only G5 PCIe that
we have is running MacOSX.

 Try doing __ioremap(mgp-iomem_base, mgp-board_span, _PAGE_NO_CACHE);
 instead of using the normal ioremap for #ifdef powerpc and tell us if it
 makes a difference.
   

I'll try it as soon as we get our G5 PCIe to run Linux.

thanks,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-24 Thread Brice Goglin
Anton Blanchard wrote:
 +/*
 + * Set of routunes to get a new receive buffer.  Any buffer which
 + * crosses a 4KB boundary must start on a 4KB boundary due to PCIe
 + * wdma restrictions. We also try to align any smaller allocation to
 + * at least a 16 byte boundary for efficiency.  We assume the linux
 + * memory allocator works by powers of 2, and will not return memory
 + * smaller than 2KB which crosses a 4KB boundary.  If it does, we fall
 + * back to allocating 2x as much space as required.
 + *
 + * We intend to replace large (4KB) skb allocations by using
 + * pages directly and building a fraglist in the near future.
 + */
 

 You go to a lot of trouble to align things. One thing on ppc64 is that
 we really want to start all DMA writes on a cacheline boundary. We
 enforce that in network drivers by making NET_IP_ALIGN = 0 and having
 the drivers do:

 skb_reserve(skb, NET_IP_ALIGN);

 It sounds like your small allocations will be only aligned to 16 bytes.
   

We didn't get any ppc64 with PCI-E to run Linux so far. What performance
drop should we expect with our current code ?

 Id suggest using the dma API instead of the pci API. We have seen
 machines in the field that have failed large pci_alloc_consistent calls
 because it always asks for GFP_ATOMIC memory (it presumes the worst).
 The dma API allows you to pass a GFP_ flag in which will have a much
 better chance of succeeding when you dont need GFP_ATOMIC memory.
   

Good idea, thanks,

 +#ifdef CONFIG_MTRR
 +mgp-mtrr = mtrr_add(mgp-iomem_base, mgp-board_span,
 + MTRR_TYPE_WRCOMB, 1);
 +#endif
 
 ...
   
 +mgp-sram = ioremap(mgp-iomem_base, mgp-board_span);
 

 Not sure how we are meant to specify write through in drivers. Any ideas Ben?
   

I am not sure what you mean.
The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do
write combining according to Apple.

Thanks,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-24 Thread Anton Blanchard
 
Hi,

 We didn't get any ppc64 with PCI-E to run Linux so far. What performance
 drop should we expect with our current code ?

We have seen  20% improvement on ppc64 running some networking
workloads when forcing 128 byte alignment (instead of 16 byte
alignment). DMA writes have to get cacheline aligned (in power of 2
steps) on some IO chips.

 I am not sure what you mean.
 The only ppc64 with PCI-E that we have seen so far (a G5) couldn't do
 write combining according to Apple.

Im thinking more generally, MTRRs are x86 specific and it would be good
to have a more generic way to enable write combining.

Anton
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-19 Thread Andi Kleen
On Friday 19 May 2006 12:00, Arnd Bergmann wrote:
 On Friday 19 May 2006 04:25, Brice Goglin wrote:
  dev_mc_upload() from net/core/dev_mcast.c does
  
  spin_lock_bh(dev-xmit_lock);
  __dev_mc_upload(dev);
  
  which calls dev-set_multicast_list(), which is
  myri10ge_set_multicast_list()
  
  which calls myri10ge_change_promisc
  
  which calls myri10ge_send_cmd
 
 Hmm, if that is the only path where you call it, it may be
 helpful to change myri10ge_change_promisc to call a special
 atomic version of myri10ge_send_cmd then, while all others
 use the regular version, which you can then convert to
 do msleep as well.

Or just drop the xmit lock while you do that. As long as you
handle races with -start_xmit yourself it's ok

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-19 Thread Brice Goglin

 dev_err?
   
[...]

 Could probably use dev_printk.
   

When the interface name is known, we prefer having the message look like
myri10ge: eth2: something since it might be easier to read than
myri10ge: 5:000e: something. The administrator usually knows the
name of the network interface better than its bus ID.

Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-19 Thread Brice Goglin
Andi Kleen wrote:
 On Friday 19 May 2006 12:00, Arnd Bergmann wrote:
   
 On Friday 19 May 2006 04:25, Brice Goglin wrote:
 
 dev_mc_upload() from net/core/dev_mcast.c does

 spin_lock_bh(dev-xmit_lock);
 __dev_mc_upload(dev);

 which calls dev-set_multicast_list(), which is
 myri10ge_set_multicast_list()

 which calls myri10ge_change_promisc

 which calls myri10ge_send_cmd
   
 Hmm, if that is the only path where you call it, it may be
 helpful to change myri10ge_change_promisc to call a special
 atomic version of myri10ge_send_cmd then, while all others
 use the regular version, which you can then convert to
 do msleep as well.
 

 Or just drop the xmit lock while you do that. As long as you
 handle races with -start_xmit yourself it's ok
   

Looks like we can do that.
Thanks,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-19 Thread Brice Goglin
Arnd Bergmann wrote:
 +static int myri10ge_open(struct net_device *dev)
 

 This function is too long to read easily.
   

Ok we have split it a little bit.

 +/* allocate the host shadow rings */
 +
 +bytes = 8 + (MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO + 4)
 +* sizeof(*mgp-tx.req_list);
 +mgp-tx.req_bytes = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-tx.req_bytes == NULL)
 +goto abort_with_nothing;
 +memset(mgp-tx.req_bytes, 0, bytes);
 +
 +/* ensure req_list entries are aligned to 8 bytes */
 +mgp-tx.req_list = (struct mcp_kreq_ether_send *)
 +ALIGN((unsigned long)mgp-tx.req_bytes, 8);
 +
 +bytes = rx_ring_entries * sizeof(*mgp-rx_small.shadow);
 +mgp-rx_small.shadow = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-rx_small.shadow == NULL)
 +goto abort_with_tx_req_bytes;
 +memset(mgp-rx_small.shadow, 0, bytes);
 +
 +bytes = rx_ring_entries * sizeof(*mgp-rx_big.shadow);
 +mgp-rx_big.shadow = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-rx_big.shadow == NULL)
 +goto abort_with_rx_small_shadow;
 +memset(mgp-rx_big.shadow, 0, bytes);
 +
 +/* allocate the host info rings */
 +
 +bytes = tx_ring_entries * sizeof(*mgp-tx.info);
 +mgp-tx.info = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-tx.info == NULL)
 +goto abort_with_rx_big_shadow;
 +memset(mgp-tx.info, 0, bytes);
 +
 +bytes = rx_ring_entries * sizeof(*mgp-rx_small.info);
 +mgp-rx_small.info = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-rx_small.info == NULL)
 +goto abort_with_tx_info;
 +memset(mgp-rx_small.info, 0, bytes);
 +
 +bytes = rx_ring_entries * sizeof(*mgp-rx_big.info);
 +mgp-rx_big.info = kmalloc(bytes, GFP_KERNEL);
 +if (mgp-rx_big.info == NULL)
 +goto abort_with_rx_small_info;
 +memset(mgp-rx_big.info, 0, bytes);
 +
 

 Can you do all these allocations at once? Maybe you can even
 move them into the size passed to alloc_etherdev.
   

They are different things conceptually, so we prefer to allocate
them separately.

 If you need separate allocations, using kzalloc simplifies
 your code.
   

Right, will do.

 +/* Break the SKB or Fragment up into pieces which
 + * do not cross mgp-tx.boundary */
 +low = MYRI10GE_LOWPART_TO_U32(bus);
 +high_swapped = htonl(MYRI10GE_HIGHPART_TO_U32(bus));
 +while (len) {
 +u8 flags_next;
 +int cum_len_next;
 +
 +if (unlikely(count == max_segments))
 +goto abort_linearize;
 +
 +boundary = (low + tx-boundary)  ~(tx-boundary - 1);
 +seglen = boundary - low;
 +if (seglen  len)
 +seglen = len;
 +flags_next = flags  ~MYRI10GE_MCP_ETHER_FLAGS_FIRST;
 +cum_len_next = cum_len + seglen;
 +#ifdef NETIF_F_TSO
 +if (mss) {  /* TSO */
 +(req - rdma_count)-rdma_count = rdma_count + 1;
 +
 +if (likely(cum_len = 0)) { /* payload */
 +int next_is_first, chop;
 +
 +chop = (cum_len_next  mss);
 +cum_len_next = cum_len_next % mss;
 +next_is_first = (cum_len_next == 0);
 +flags |= chop *
 +MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP;
 +flags_next |= next_is_first *
 +MYRI10GE_MCP_ETHER_FLAGS_FIRST;
 +rdma_count |= -(chop | next_is_first);
 +rdma_count += chop  !next_is_first;
 +} else if (likely(cum_len_next = 0)) { /* 
 header ends */
 +int small;
 +
 +rdma_count = -1;
 +cum_len_next = 0;
 +seglen = -cum_len;
 +small =
 +(mss =
 + 
 MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE);
 +flags_next =
 +MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD |
 +MYRI10GE_MCP_ETHER_FLAGS_FIRST |
 +(small *
 + MYRI10GE_MCP_ETHER_FLAGS_SMALL);
 +}
 +}
 

 100 characters per line are too much, as are six levels of intentation,
 or the number of lines in this function.
 You should try to split into into smaller ones.
   

We have tried :) But there is no nice way to split this 

Re: [PATCH 3/4] myri10ge - Driver core

2006-05-19 Thread Andi Kleen

 We have tried :) But there is no nice way to split this code. So I guess
 we will have to keep it like this.

You could shorten the #define names a bit (s/MYRI10GE_MCP_ETHER_/MCP_/)  Then 
everything 
will fit better.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-18 Thread Brice Goglin
Roland Dreier wrote:
 Still some suspicious uses of volatile here.

 For example:

   
 +struct myri10ge_priv {
 
  ...
   
 +volatile u8 __iomem *sram;
 

 as far as I can see this is always used with proper __iomem accessors,
 often with casts to strip the volatile anyway.  So why is volatile needed?

 I would suggest an audit of all uses of volatile in the driver, since
 volatile in drivers really should be read there's probably a bug
 here, and if not something very tricky is going on.  If there are any
 valid uses of volatile then a comment should explain why, so that
 future reviewers don't have to try and puzzle out which of the
 two possible translations of volatile is correct.
   

You are right, we audited the code and it looks like we don't need any
volatile.

Thanks,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-18 Thread Brice Goglin
Arnd Bergmann wrote:
 +for (sleep_total = 0;
 + sleep_total  (15 * 1000)  response-result == 0x;
 + sleep_total += 10) {
 +udelay(10);
 +}
 

 udelay does not sleep. If you want to sleep, use msleep instead.
   

This place is actually the only one where we don't want to use msleep.
This function (myri10ge_send_cmd) might be called from various context
(spinlocked or not) and pass orders to the NIC whose processing time
depends a lot on the command. Of course, we don't have any place where a
long operation is passed from a spinlocked context :) But, we need the
tiny udelay granularity for the spinlocked case, and the long loop for
operations that are long to process in the NIC.

Concerning all the other places where you suggested to use msleep, you
were right.

 The __iomem variable need not be volatile.

As Roland pointed out, there was too many volatile in this code. We are
reworking this together with the sparse annotations.

 +printk(myri10ge: %s: %s IRQ %d, tx bndry %d, fw %s, WC %s\n,
 +   netdev-name, (mgp-msi_enabled ? MSI : xPIC),
 +   pdev-irq, mgp-tx.boundary, mgp-fw_name,
 +   (mgp-mtrr = 0 ? Enabled : Disabled));
 +
 

 missing printk level (KERN_DEBUG?). Could probably use dev_printk.
   

When are we supposed to call dev_printk or not in such a driver ?

 +#define MYRI10GE_PCI_VENDOR_MYRICOM 0x14c1
 +#define MYRI10GE_PCI_DEVICE_Z8E 0x0008
 

 Shouldn't the vendor ID go to pci_ids.h?
   

That's what I thought but i was told that the fashion these days is to
keep the IDs with the driver that uses them. I'll happy to move as long
as everybody agrees :)


Thanks a lot for all your comments,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-18 Thread Arnd Bergmann
Am Friday 19 May 2006 01:56 schrieb Brice Goglin:
 This place is actually the only one where we don't want to use msleep.
 This function (myri10ge_send_cmd) might be called from various context
 (spinlocked or not) and pass orders to the NIC whose processing time
 depends a lot on the command. Of course, we don't have any place where a
 long operation is passed from a spinlocked context :) But, we need the
 tiny udelay granularity for the spinlocked case, and the long loop for
 operations that are long to process in the NIC.

I don't see any spinlocks in your code and the function does not
seem to be called from the interrupt handler or the softirq either.
Maybe I'm missed something, but where is this ever called in an
atomic context?

  missing printk level (KERN_DEBUG?). Could probably use dev_printk.

 When are we supposed to call dev_printk or not in such a driver ?

Whenever you have a device associated with the message, it makes
sense to use the dev_printk family of functions.

  +#define MYRI10GE_PCI_VENDOR_MYRICOM   0x14c1
  +#define MYRI10GE_PCI_DEVICE_Z8E   0x0008
 
  Shouldn't the vendor ID go to pci_ids.h?

 That's what I thought but i was told that the fashion these days is to
 keep the IDs with the driver that uses them. I'll happy to move as long
 as everybody agrees :)

My understanding is that vendor IDs should go to the common file
because they are likely to be used by multiple drivers whereas
device IDs only need to be present in the one device driver for
that particular device.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-18 Thread Brice Goglin
Arnd Bergmann wrote:
 Am Friday 19 May 2006 01:56 schrieb Brice Goglin:
   
 This place is actually the only one where we don't want to use msleep.
 This function (myri10ge_send_cmd) might be called from various context
 (spinlocked or not) and pass orders to the NIC whose processing time
 depends a lot on the command. Of course, we don't have any place where a
 long operation is passed from a spinlocked context :) But, we need the
 tiny udelay granularity for the spinlocked case, and the long loop for
 operations that are long to process in the NIC.
 

 I don't see any spinlocks in your code and the function does not
 seem to be called from the interrupt handler or the softirq either.
 Maybe I'm missed something, but where is this ever called in an
 atomic context?
   

dev_mc_upload() from net/core/dev_mcast.c does

spin_lock_bh(dev-xmit_lock);
__dev_mc_upload(dev);

which calls dev-set_multicast_list(), which is
myri10ge_set_multicast_list()

which calls myri10ge_change_promisc

which calls myri10ge_send_cmd



 Whenever you have a device associated with the message, it makes
 sense to use the dev_printk family of functions.
   

Ok, thanks.

 My understanding is that vendor IDs should go to the common file
 because they are likely to be used by multiple drivers whereas
 device IDs only need to be present in the one device driver for
 that particular device.
   

Make sense. I will change it.

Thanks again,
Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-17 Thread Roland Dreier
Still some suspicious uses of volatile here.

For example:

 +struct myri10ge_priv {
 ...
 + volatile u8 __iomem *sram;

as far as I can see this is always used with proper __iomem accessors,
often with casts to strip the volatile anyway.  So why is volatile needed?

I would suggest an audit of all uses of volatile in the driver, since
volatile in drivers really should be read there's probably a bug
here, and if not something very tricky is going on.  If there are any
valid uses of volatile then a comment should explain why, so that
future reviewers don't have to try and puzzle out which of the
two possible translations of volatile is correct.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] myri10ge - Driver core

2006-05-17 Thread Arnd Bergmann
Am Thursday 18 May 2006 00:06 schrieb Brice Goglin:

 +static char *myri10ge_fw_name = NULL;
 +static char *myri10ge_fw_unaligned = myri10ge_ethp_z8e.dat;
 +static char *myri10ge_fw_aligned = myri10ge_eth_z8e.dat;
 +static int myri10ge_ecrc_enable = 1;
 +static int myri10ge_max_intr_slots = 1024;
 +static int myri10ge_small_bytes = -1;/* -1 == auto */
 +static int myri10ge_msi = 1; /* enable msi by default */
 +static int myri10ge_intr_coal_delay = 25;
 +static int myri10ge_flow_control = 1;
 +static int myri10ge_deassert_wait = 1;
 +static int myri10ge_force_firmware = 0;
 +static int myri10ge_skb_cross_4k = 0;
 +static int myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
 +static int myri10ge_napi_weight = 64;
 +static int myri10ge_watchdog_timeout = 1;
 +static int myri10ge_max_irq_loops = 1048576;
 +
 +module_param(myri10ge_fw_name, charp, S_IRUGO | S_IWUSR);
 +MODULE_PARM_DESC(myri10ge_fw_name, Firmware image name\n);
 +module_param(myri10ge_max_intr_slots, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_max_intr_slots, Interrupt queue slots\n);
 +module_param(myri10ge_small_bytes, int, S_IRUGO | S_IWUSR);
 +MODULE_PARM_DESC(myri10ge_small_bytes, Threshold of small packets\n);
 +module_param(myri10ge_msi, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_msi, Enable Message Signalled Interrupts\n);
 +module_param(myri10ge_intr_coal_delay, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_intr_coal_delay, Interrupt coalescing
 delay\n); +module_param(myri10ge_ecrc_enable, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_ecrc_enable, Enable Extended CRC on PCI-E\n);
 +module_param(myri10ge_flow_control, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_flow_control, Pause parameter\n);
 +module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR);
 +MODULE_PARM_DESC(myri10ge_deassert_wait,
 +  Wait when deasserting legacy interrupts\n);
 +module_param(myri10ge_force_firmware, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_force_firmware,
 +  Force firmware to assume aligned completions\n);
 +module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR);
 +MODULE_PARM_DESC(myri10ge_skb_cross_4k,
 +  Can a small skb cross a 4KB boundary?\n);
 +module_param(myri10ge_initial_mtu, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_initial_mtu, Initial MTU\n);
 +module_param(myri10ge_napi_weight, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_napi_weight, Set NAPI weight\n);
 +module_param(myri10ge_watchdog_timeout, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_watchdog_timeout, Set watchdog timeout\n);
 +module_param(myri10ge_max_irq_loops, int, S_IRUGO);
 +MODULE_PARM_DESC(myri10ge_max_irq_loops,
 +  Set stuck legacy IRQ detection threshold\n);

How about writing the module_param() and MODULE_PARM_DESC() calls
directly after each declaration? That would make it clearer
that they are all parameters.

 + response-result = 0x;

0x appears throughout your code as a return value. maybe
use a named constant for it?

 + for (sleep_total = 0;
 +  sleep_total  (15 * 1000)  response-result == 0x;
 +  sleep_total += 10) {
 + udelay(10);
 + }

udelay does not sleep. If you want to sleep, use msleep instead.

 +
 + myri10ge_pio_copy((void __iomem *)submit, buf, sizeof(buf));
 + for (i = 0; *confirm != 0x  i  20; i++)
 + udelay(1000);
 + if (*confirm != 0x) {
 + dev_err(mgp-pdev-dev, dummy rdma %s failed\n,
 + (enable ? enable : disable));
 + }
 +}

Can you use msleep here instead of udelay?

 +static int myri10ge_load_firmware(struct myri10ge_priv *mgp)
 +{
 + volatile u32 *confirm;
 + volatile char __iomem *submit;

The __iomem variable need not be volatile.

 + myri10ge_pio_copy((void __iomem *)submit, buf, sizeof(buf));
 + mb();
 + udelay(1000);
 + mb();

can't you use msleep(1) instead? 

 +static inline void
 +myri10ge_submit_8rx(struct mcp_kreq_ether_recv __iomem * dst,
 + struct mcp_kreq_ether_recv *src)
 +{
 + u32 low;
 +
 + low = src-addr_low;
 + src-addr_low = DMA_32BIT_MASK;
 + myri10ge_pio_copy(dst, src, 8 * sizeof(*src));
 + mb();
 + src-addr_low = low;
 + *(u32 __force *)  dst-addr_low = src-addr_low;
 + mb();
 +}

The __force dereference seems fishy.

 + if (unlikely(((end  12) != (data  12))  (data  4095UL))) {
 + printk
 + (myri10ge_alloc_small: small skb crossed 4KB boundary\n);

Printk level is missing.


 +static int myri10ge_open(struct net_device *dev)

This function is too long to read easily.

 + /* allocate the host shadow rings */
 +
 + bytes = 8 + (MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO + 4)
 + * sizeof(*mgp-tx.req_list);
 + mgp-tx.req_bytes = kmalloc(bytes, GFP_KERNEL);
 + if (mgp-tx.req_bytes == NULL)
 + goto abort_with_nothing;
 + memset(mgp-tx.req_bytes, 0, bytes);
 +
 + /* ensure req_list