Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-15 Thread Lee Revell
On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote:
 Francois Romieu wrote:
 
  +  spin_lock(mgp-cmd_lock);
  +  response-result = 0x;
  +  mb();
  +  myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
  +
  +  /* wait up to 2 seconds */
  
 
  You must not hold a spinlock for up to 2 seconds.

 
 We are working on reducing the delay to about 15ms. It only occurs when
 the driver is loaded or the link brought up.

I think 15ms is quite a long time to hold a spinlock also - most
spinlocks in the kernel are held for less than 500 microseconds.

Can't you use a mutex?

Lee

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-15 Thread Brice Goglin
Lee Revell wrote:
 On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote:
   
 Francois Romieu wrote:
 
 +  spin_lock(mgp-cmd_lock);
 +  response-result = 0x;
 +  mb();
 +  myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
 +
 +  /* wait up to 2 seconds */
 
 
 You must not hold a spinlock for up to 2 seconds.
   
   
 We are working on reducing the delay to about 15ms. It only occurs when
 the driver is loaded or the link brought up.
 

 I think 15ms is quite a long time to hold a spinlock also - most
 spinlocks in the kernel are held for less than 500 microseconds.

 Can't you use a mutex?
   

It looks like rtnl_lock protects us here. We are working on it.

Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-13 Thread Francois Romieu
Brice Goglin [EMAIL PROTECTED] :
[...]
  Return in a middle of a spinlock-intensive function. :o(

 
 What do you mean ?

It is preferred for maintenance purpose (hello Mr Morton) to organize
the control flow with a single spin_{lock/unlock} pair: if there is a
branch in the control flow, it ought to be joined again before returning.

-- 
Ueimor
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-12 Thread Evgeniy Polyakov
On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) 
wrote:
  Imho you will want to work directly with pages shortly.

 
 We had thought about doing this, but were a little nervous since we did
 not know of any other drivers that worked directly with pages.  If this
 is an official direction to work directly with pages, we will. 

s2io does. e1000 does it with skb frags.
If your hardware allows header split and driver can put headers into
skb-data and real data into frag_list, that allows to create various
interesting things like receiving zero-copy support and netchannels
support. It is work in progress, not official direction currently,
but this definitely will help your driver to support future high 
performance extensions.

 Brice

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-12 Thread David S. Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 12 May 2006 10:47:11 +0400

 On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) 
 wrote:
   Imho you will want to work directly with pages shortly.
 
  
  We had thought about doing this, but were a little nervous since we did
  not know of any other drivers that worked directly with pages.  If this
  is an official direction to work directly with pages, we will. 
 
 s2io does. e1000 does it with skb frags.
 If your hardware allows header split and driver can put headers into
 skb-data and real data into frag_list, that allows to create various
 interesting things like receiving zero-copy support and netchannels
 support. It is work in progress, not official direction currently,
 but this definitely will help your driver to support future high 
 performance extensions.

The most important impact is not having to use order 1 pages
for jumbo MTU frames, which are more likely to fail allocations
thant order 0 pages under heavy load.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-11 Thread Brice Goglin
Roland Dreier wrote:
   +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)

 Why do you need this wrapper?  Why not just call __iowrite64_copy()
 without the obfuscation?  Anyone reading the code will just have to
 search back to this define and mentally translate the size back and
 forth all the time.
   

Well, I know that abstraction layer is bad. But in this case I really
think that a name like myri10ge_pio_copy(size) is way less obfuscating
than __iowrite64_copy(size/8).
Will fix it if it really matters.


   +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)
   +{
   +  uint8_t cap_off;
   +  int nbcap = 0;
   +
   +  cap_off = PCI_CAPABILITY_LIST - 1;
   +  /* go through all caps looking for a hypertransport msi mapping */

 This looks like something that should be fixed up in the general PCI
 quirk handling rather than in every driver...

   +static int
   +myri10ge_use_msi(struct pci_dev *pdev)
   +{
   +  if (myri10ge_msi == 1 || myri10ge_msi == 0)
   +  return myri10ge_msi;
   +
   +  /*  find root complex for our device */
   +  while (pdev-bus  pdev-bus-self) {
   +  pdev = pdev-bus-self;
   +  }

 Similarly looks like generic PCI code (if it's needed at all).  If I
 understand correctly you're trying to check if MSI has a chance at
 working on the system, but a network device driver has no business
 walking up the PCI hierarchy.
   

Right, I will look at moving all this to the core PCI code.


Thanks for all the comments.

Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-11 Thread Brice Goglin
Francois Romieu wrote:

 +spin_lock(mgp-cmd_lock);
 +response-result = 0x;
 +mb();
 +myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
 +
 +/* wait up to 2 seconds */
 

 You must not hold a spinlock for up to 2 seconds.
   

We are working on reducing the delay to about 15ms. It only occurs when
the driver is loaded or the link brought up.

 +for (sleep_total = 0; sleep_total  (2 * 1000); sleep_total += 10) {
 +mb();
 +if (response-result != 0x) {
 +if (response-result == 0) {
 +data-data0 = ntohl(response-data);
 +spin_unlock(mgp-cmd_lock);
 +return 0;
 +} else {
 +dev_err(mgp-pdev-dev,
 +command %d failed, result = %d\n,
 +   cmd, ntohl(response-result));
 +spin_unlock(mgp-cmd_lock);
 +return -ENXIO;
 

 Return in a middle of a spinlock-intensive function. :o(
   

What do you mean ?

   
 +{
 +struct sk_buff *skb;
 +unsigned long data, roundup;
 +
 +skb = dev_alloc_skb(bytes + 4096 + MYRI10GE_MCP_ETHER_PAD);
 +if (skb == NULL)
 +return NULL;
 

 Imho you will want to work directly with pages shortly.
   

We had thought about doing this, but were a little nervous since we did
not know of any other drivers that worked directly with pages.  If this
is an official direction to work directly with pages, we will. But the
existing approach is well tested through our beta cycle, and we would
prefer to leave it as is and update to a pages based approach in the
future.


Thanks a lot for all the comments.

Brice

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-10 Thread Stephen Hemminger
On Wed, 10 May 2006 14:40:22 -0700 (PDT)
Brice Goglin [EMAIL PROTECTED] wrote:

 [PATCH 4/6] myri10ge - First half of the driver
 
 The first half of the myri10ge driver core.
 

Splitting it in half, might help email restrictions, but it kills
future users of 'git bisect' who expect to have every kernel buildable.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-10 Thread Roland Dreier
  +typedef struct {
  +mcp_kreq_ether_recv_t __iomem *lanai;   /* lanai ptr for recv ring */
  +volatile uint8_t __iomem *wc_fifo;  /* w/c rx dma addr fifo address 
  */
  +mcp_kreq_ether_recv_t *shadow;  /* host shadow of recv ring */
  +struct myri10ge_rx_buffer_state *info;
  +int cnt;
  +int alloc_fail;
  +int mask;   /* number of rx slots -1 */
  +} myri10ge_rx_buf_t;

Why is wc_fifo volatile?  The only places you actually use it, you
seem to cast away the volatile anyway.

Also, again, no typedef of structures please.

  +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)

Why do you need this wrapper?  Why not just call __iowrite64_copy()
without the obfuscation?  Anyone reading the code will just have to
search back to this define and mentally translate the size back and
forth all the time.

  +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)
  +{
  +uint8_t cap_off;
  +int nbcap = 0;
  +
  +cap_off = PCI_CAPABILITY_LIST - 1;
  +/* go through all caps looking for a hypertransport msi mapping */

This looks like something that should be fixed up in the general PCI
quirk handling rather than in every driver...

  +static int
  +myri10ge_use_msi(struct pci_dev *pdev)
  +{
  +if (myri10ge_msi == 1 || myri10ge_msi == 0)
  +return myri10ge_msi;
  +
  +/*  find root complex for our device */
  +while (pdev-bus  pdev-bus-self) {
  +pdev = pdev-bus-self;
  +}

Similarly looks like generic PCI code (if it's needed at all).  If I
understand correctly you're trying to check if MSI has a chance at
working on the system, but a network device driver has no business
walking up the PCI hierarchy.

  +buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7)  ~7UL);

ALIGN() from linux/kernel.h?

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-10 Thread Francois Romieu
Brice Goglin [EMAIL PROTECTED] :
 [PATCH 4/6] myri10ge - First half of the driver
 
 The first half of the myri10ge driver core.
 
 Signed-off-by: Brice Goglin [EMAIL PROTECTED]
 Signed-off-by: Andrew J. Gallatin [EMAIL PROTECTED]
 
  myri10ge.c | 1483 
 +
  1 file changed, 1483 insertions(+)
 
 --- /dev/null 2006-05-09 19:43:19.324446250 +0200
 +++ linux/drivers/net/myri10ge/myri10ge.c 2006-05-09 23:00:55.0 
 +0200
[...]
 +module_param(myri10ge_flow_control, int, S_IRUGO);
 +module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR);
 +module_param(myri10ge_force_firmware, int, S_IRUGO);
 +module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR);
 +module_param(myri10ge_initial_mtu, int, S_IRUGO);
 +module_param(myri10ge_napi, int, S_IRUGO);
 +module_param(myri10ge_napi_weight, int, S_IRUGO);
 +module_param(myri10ge_watchdog_timeout, int, S_IRUGO);
 +module_param(myri10ge_max_irq_loops, int, S_IRUGO);

MODULE_PARM_DESC() would be nice.

 +
 +#define MYRI10GE_FW_OFFSET 1024*1024
 +#define MYRI10GE_HIGHPART_TO_U32(X) \
 +(sizeof (X) == 8) ? ((uint32_t)((uint64_t)(X)  32)) : (0)
 +#define MYRI10GE_LOWPART_TO_U32(X) ((uint32_t)(X))
 +
 +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)
 +
 +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)

static int ?

[...]
 +static int
 +myri10ge_send_cmd(struct myri10ge_priv *mgp, uint32_t cmd,
 +   myri10ge_cmd_t *data)
 +{
 + mcp_cmd_t *buf;
 + char buf_bytes[sizeof(*buf) + 8];
 + volatile mcp_cmd_response_t *response = mgp-cmd;
 + volatile char __iomem *cmd_addr = mgp-sram + MYRI10GE_MCP_CMD_OFFSET;
 + uint32_t dma_low, dma_high;
 + int sleep_total = 0;
 +
 + /* ensure buf is aligned to 8 bytes */
 + buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7)  ~7UL);
 +
 + buf-data0 = htonl(data-data0);
 + buf-data1 = htonl(data-data1);
 + buf-data2 = htonl(data-data2);
 + buf-cmd = htonl(cmd);
 + dma_low = MYRI10GE_LOWPART_TO_U32(mgp-cmd_bus);
 + dma_high = MYRI10GE_HIGHPART_TO_U32(mgp-cmd_bus);
 +
 + buf-response_addr.low = htonl(dma_low);
 + buf-response_addr.high = htonl(dma_high);
 + spin_lock(mgp-cmd_lock);
 + response-result = 0x;
 + mb();
 + myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
 +
 + /* wait up to 2 seconds */

You must not hold a spinlock for up to 2 seconds.

 + for (sleep_total = 0; sleep_total  (2 * 1000); sleep_total += 10) {
 + mb();
 + if (response-result != 0x) {
 + if (response-result == 0) {
 + data-data0 = ntohl(response-data);
 + spin_unlock(mgp-cmd_lock);
 + return 0;
 + } else {
 + dev_err(mgp-pdev-dev,
 + command %d failed, result = %d\n,
 +cmd, ntohl(response-result));
 + spin_unlock(mgp-cmd_lock);
 + return -ENXIO;

Return in a middle of a spinlock-intensive function. :o(

 + }
 + }
 + udelay(1000 * 10);
 + }
 + spin_unlock(mgp-cmd_lock);
 + dev_err(mgp-pdev-dev, command %d timed out, result = %d\n,
 +cmd, ntohl(response-result));
 + return -EAGAIN;
 +}
 +
 +
 +/*
 + * The eeprom strings on the lanaiX have the format
 + * SN=x\0
 + * MAC=x:x:x:x:x:x\0
 + * PT:ddd mmm xx xx:xx:xx xx\0
 + * PV:ddd mmm xx xx:xx:xx xx\0
 + */
 +int
 +myri10ge_read_mac_addr(struct myri10ge_priv *mgp)

static int ?

[...]
 +static void
 +myri10ge_dummy_rdma(struct myri10ge_priv *mgp, int enable)
 +{
 + volatile uint32_t *confirm;
 + volatile char __iomem *submit;
 + uint32_t buf[16];
 + uint32_t dma_low, dma_high;
 + int i;
 +
 + /* clear confirmation addr */
 + confirm = (volatile uint32_t *) mgp-cmd;
 + *confirm = 0;
 + mb();
 +
 + /* send a rdma command to the PCIe engine, and wait for the
 +  * response in the confirmation address.  The firmware should
 +  * write a -1 there to indicate it is alive and well
 +  */
 + dma_low = MYRI10GE_LOWPART_TO_U32(mgp-cmd_bus);
 + dma_high = MYRI10GE_HIGHPART_TO_U32(mgp-cmd_bus);
 +
 + buf[0] = htonl(dma_high);   /* confirm addr MSW */
 + buf[1] = htonl(dma_low);/* confirm addr LSW */
 + buf[2] = htonl(0x); /* confirm data */
 + buf[3] = htonl(dma_high);   /* dummy addr MSW */
 + buf[4] = htonl(dma_low);/* dummy addr LSW */
 + buf[5] = htonl(enable); /* enable? */
 +
 + submit = mgp-sram + 0xfc01c0;
 +
 + myri10ge_pio_copy((void __iomem *) submit, buf, sizeof (buf));
 + mb();
 + udelay(1000);
 + mb();
 + i = 0;
 + while (*confirm != 0x  i  20) {
 +