Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-31 Thread Li Yang
On Tue, Mar 31, 2009 at 1:22 AM, Scott Wood scottw...@freescale.com wrote:
 Joakim Tjernlund wrote:

 gianfar does not seem to use in_/out_ functions for the BDs. It seems to
 work just fine that way too.

 It does now that it has explicit barriers in a few places.  Before they were
 added, it would sometimes fail under load.  That was due to a compiler
 reordering, but CPU reordering was possible as well.

I noticed that in gianfar these memory accesses are not protected by
volatile.  Can this be the reason why the compiler did some unwanted
optimization?

- Leo
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-31 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 30/03/2009 21:32:23:
 
 Joakim Tjernlund wrote:
  different since descriptors are in MURAM which is ioremap()ed -- though
  switching to a cacheable mapping with barriers should be a performance
  improvement.
  
  I always thought that MURAM was very fast. The whole reason to have BDs in
  MURAM is that it is faster than normal RAM, at least that is what I thought.
 
 Yeah, on second thought it probably wouldn't be worth it.  There's also 
 the question of under what circumstances the QE's MURAM accesses will be 
 cache-coherent.

I am a bit confused: what isn't worth it? Currently MURAM isn't used by 
ucc_geth, but it is easy to change: swap MEM_PART_SYSTEM for MEM_PART_MURAM. 
However, I just tried that and the driver stopped working. I knew this worked 
earlier because I tried it, and I even think I sent a patch to Leo.

What choices do we have? I see three:

1) MEM_PART_SYSTEM, as today.
2) MEM_PART_MURAM. I guess this should be uncacheable memory?
3) as gianfar, dma_alloc_coherent(). I presume this is uncacheable memory?

My guess would be 2 or 3. Do they have the same synchronization
semantics?

 
 As for the CPU not reordering guarded+cache inhibited accesses, that 
 initially seemed to be true for the new arch stuff (book3e/book3s, but 
 not really, see below), but the classic arch documentation only 
 guarantees stores to such regions to be in-order (and the 
 explicitly-specified operation of eieio on I+G accesses wouldn't make 
 much sense if they were already guaranteed to be in-order).
 
 Then I looked at the EREF to see what older Book E documents had to say 
 on the issue, and it suggests that when the architecture document says 
 "out of order", it really means "speculative" (and reading the arch 
 doc's definition of "out of order" seems to confirm this -- redefining 
 terms is bad, m'kay?).  So it seems that the simple answer is no, 
 guarded storage is not guaranteed to be in order, unless the only thing 
 that can cause an out-of-order access is speculative execution.

Very informative, thanks.



Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-31 Thread Li Yang
On Tue, Mar 31, 2009 at 5:07 PM, Joakim Tjernlund
joakim.tjernl...@transmode.se wrote:
 Scott Wood scottw...@freescale.com wrote on 30/03/2009 21:32:23:

 Joakim Tjernlund wrote:
  different since descriptors are in MURAM which is ioremap()ed -- though
  switching to a cacheable mapping with barriers should be a performance
  improvement.
 
  I always thought that MURAM was very fast. The whole reason to have BDs in
  MURAM is that it is faster than normal RAM, at least that is what I thought.

 Yeah, on second thought it probably wouldn't be worth it.  There's also
 the question of under what circumstances the QE's MURAM accesses will be
 cache-coherent.

 I am a bit confused: what isn't worth it? Currently MURAM isn't used by
 ucc_geth, but it is easy to change: swap MEM_PART_SYSTEM for MEM_PART_MURAM.
 However, I just tried that and the driver stopped working. I knew this worked
 earlier because I tried it, and I even think I sent a patch to Leo.

 What choices do we have, I see three:

 1) MEM_PART_SYSTEM, as today.
 2) MEM_PART_MURAM. I guess this should be uncacheable memory?
 3) as gianfar, dma_alloc_coherent(). I presume this is uncacheable memory?


1 and 3 are the same.  All of them use cacheable memory, as we have a
hardware coherency module to take care of the cache coherency problem.
However, it might be better to use dma_alloc_coherent() for the code
to be more readable.  Thanks.

- Leo

Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-31 Thread Scott Wood

Joakim Tjernlund wrote:

I am a bit confused, what isn't worth it?


Enabling caching on MURAM, at least when used for buffer descriptors. 
The cache line ping-pong would probably outweigh the cost of the 
uncached accesses.



Currently MURAM isn't used by ucc_geth,


Hmm.  I looked in the driver and saw numerous muram allocations, but I 
didn't try to follow the driver enough to ensure that they were for the 
ring.  I'd assumed it was similar to the CPM1/CPM2 driver.



3) as gianfar, dma_alloc_coherent(). I presume this is uncacheable memory?


It would be uncacheable on systems without coherent DMA, but I don't 
think there are any such systems that use gianfar.



My guess would be 2 or 3. Do they have the same synchronization
semantics?


No, unfortunately.  PowerPC sync instructions are a bit complicated. 
For example, you can use eieio to sync between reading the interrupt 
status register and checking the ring buffer, if they're both mapped 
I+G, but not if the former is I+G and the latter is cacheable (you need 
a full sync in that case).


-Scott


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 27/03/2009 14:26:00:
 
 Joakim Tjernlund wrote:
  This is a bit better than the current type casts. Moving over to qe_bd
  accesses is a bigger cleanup that will have to be a separate patch.
  I am not sure it is a clear win, as you probably lose the ability
  to read both status and len in one access.
 
 Not if you define the struct to have one status_len field (or a union of 

 that with separate fields), as gianfar does.

gianfar does not seem to use in_/out_ functions for the BDs. It seems to work
just fine that way too. I always felt that the in_/out_ functions were extra
baggage that the Freescale CPUs don't need. There is something in between that
is missing.

 Jocke


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Scott Wood

Joakim Tjernlund wrote:

gianfar does not seem to use in_/out_ functions for the BDs. It seems to work
just fine that way too.


It does now that it has explicit barriers in a few places.  Before they 
were added, it would sometimes fail under load.  That was due to a 
compiler reordering, but CPU reordering was possible as well.


-Scott


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 30/03/2009 19:22:03:
 
 Joakim Tjernlund wrote:
  gianfar does not seem to use in_/out_ functions for the BDs. It seems to
  work just fine that way too.
 
 It does now that it has explicit barriers in a few places.  Before they

In 2.6.29 or later?
 
 were added, it would sometimes fail under load.  That was due to a 
 compiler reordering, but CPU reordering was possible as well.

Does not the CPU skip reordering if the guarded bit is set?

Could we not use the same in ucc_geth as well? Then people would
not need to worry about performance issues.

 Jocke


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Scott Wood

Joakim Tjernlund wrote:

Scott Wood scottw...@freescale.com wrote on 30/03/2009 19:22:03:

Joakim Tjernlund wrote:
gianfar does not seem to use in_/out_ functions for the BDs. It seems to work
just fine that way too.

It does now that it has explicit barriers in a few places.  Before they


In 2.6.29 or later?


No, it was earlier.

were added, it would sometimes fail under load.  That was due to a 
compiler reordering, but CPU reordering was possible as well.


Does not the CPU skip reordering if the guarded bit is set?


The guarded bit is typically not set for DMA buffers.  ucc_geth is a bit 
different since descriptors are in MURAM which is ioremap()ed -- though 
switching to a cacheable mapping with barriers should be a performance 
improvement.


-Scott


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 30/03/2009 19:45:17:
 
 Joakim Tjernlund wrote:
  Scott Wood scottw...@freescale.com wrote on 30/03/2009 19:22:03:
  Joakim Tjernlund wrote:
  gianfar does not seem to use in_/out_ functions for the BDs. It seems to
  work just fine that way too.
  It does now that it has explicit barriers in a few places.  Before they
  
  In 2.6.29 or later?
 
 No, it was earlier.

Ah, I see now. The eieio() stuff.

 
  were added, it would sometimes fail under load.  That was due to a 
  compiler reordering, but CPU reordering was possible as well.
  
  Does not the CPU skip reordering if the guarded bit is set?
 
 The guarded bit is typically not set for DMA buffers.  ucc_geth is a bit
 different since descriptors are in MURAM which is ioremap()ed -- though
 switching to a cacheable mapping with barriers should be a performance
 improvement.

I always thought that MURAM was very fast. The whole reason to have BDs in
MURAM is that it is faster than normal RAM, at least that is what I thought.

 Jocke



Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-30 Thread Scott Wood

Joakim Tjernlund wrote:
different since descriptors are in MURAM which is ioremap()ed -- though 
switching to a cacheable mapping with barriers should be a performance 
improvement.


I always thought that MURAM was very fast. The whole reason to have BDs in
MURAM is that it is faster than normal RAM, at least that is what I thought.


Yeah, on second thought it probably wouldn't be worth it.  There's also 
the question of under what circumstances the QE's MURAM accesses will be 
cache-coherent.


As for the CPU not reordering guarded+cache inhibited accesses, that 
initially seemed to be true for the new arch stuff (book3e/book3s, but 
not really, see below), but the classic arch documentation only 
guarantees stores to such regions to be in-order (and the 
explicitly-specified operation of eieio on I+G accesses wouldn't make 
much sense if they were already guaranteed to be in-order).


Then I looked at the EREF to see what older Book E documents had to say 
on the issue, and it suggests that when the architecture document says 
"out of order", it really means "speculative" (and reading the arch 
doc's definition of "out of order" seems to confirm this -- redefining 
terms is bad, m'kay?).  So it seems that the simple answer is no, 
guarded storage is not guaranteed to be in order, unless the only thing 
that can cause an out-of-order access is speculative execution.


-Scott


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-27 Thread Li Yang
On Fri, Mar 27, 2009 at 1:44 AM, Joakim Tjernlund
joakim.tjernl...@transmode.se wrote:
 The line:
  if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
       break;
 in ucc_geth_tx() did not make sense to me. Rework & clean up
 this logic to something understandable.
 ---

 Reworked the patch according to Antons comments.

  drivers/net/ucc_geth.c |   66 +++
  1 files changed, 38 insertions(+), 28 deletions(-)

 diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
 index 7fc91aa..465de3a 100644
 --- a/drivers/net/ucc_geth.c
 +++ b/drivers/net/ucc_geth.c
 @@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info = {

  static struct ucc_geth_info ugeth_info[8];

 +
 +static inline u32 __iomem *bd2buf(u8 __iomem *bd)
 +{
 +       return (u32 __iomem *)(bd+4);
 +}
 +
 +static inline u32 __iomem *bd2status(u8 __iomem *bd)
 +{
 +       return (u32 __iomem *)bd;
 +}


When the driver was reviewed for upstream merge, I was told to remove
this kind of thing in favor of direct struct qe_bd access.  So do we
really have a guideline on this?  Or it's just a matter of personal
preference?

 +
  #ifdef DEBUG
  static void mem_disp(u8 *addr, int size)
  {
 @@ -3048,6 +3059,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, 
 struct net_device *dev)
        u8 __iomem *bd;                 /* BD pointer */
        u32 bd_status;
        u8 txQ = 0;
 +       int tx_ind;

        ugeth_vdbg("%s: IN", __func__);

 @@ -3057,21 +3069,19 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, 
 struct net_device *dev)

        /* Start from the next BD that should be filled */
        bd = ugeth->txBd[txQ];
 -       bd_status = in_be32((u32 __iomem *)bd);
 +       bd_status = in_be32(bd2status(bd));
        /* Save the skb pointer so we can free it later */
 -       ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
 +       tx_ind = ugeth->skb_curtx[txQ];
 +       ugeth->tx_skbuff[txQ][tx_ind] = skb;

        /* Update the current skb pointer (wrapping if this was the last) */
 -       ugeth->skb_curtx[txQ] =
 -           (ugeth->skb_curtx[txQ] +
 -            1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
 +       tx_ind = (tx_ind + 1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
 +       ugeth->skb_curtx[txQ] = tx_ind;

        /* set up the buffer descriptor */
 -       out_be32(&((struct qe_bd __iomem *)bd)->buf,
 -                     dma_map_single(ugeth->dev->dev, skb->data,
 -                             skb->len, DMA_TO_DEVICE));
 -
 -       /* printk(KERN_DEBUG "skb->data is 0x%x\n", skb->data); */
 +       out_be32(bd2buf(bd),
 +                dma_map_single(ugeth->dev->dev, skb->data,
 +                               skb->len, DMA_TO_DEVICE));

        bd_status = (bd_status & T_W) | T_R | T_I | T_L | skb->len;

 @@ -3088,9 +3098,9 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, 
 struct net_device *dev)

        /* If the next BD still needs to be cleaned up, then the bds
           are full.  We need to tell the kernel to stop sending us stuff. */
 -       if (bd == ugeth->confBd[txQ]) {
 -               if (!netif_queue_stopped(dev))
 -                       netif_stop_queue(dev);
 +
 +       if (!in_be32(bd2buf(bd))) {
 +               netif_stop_queue(dev);

Why does this make more sense?  The BD with a NULL buffer means the
ring is full?  It's not the normal case.

        }

        ugeth->txBd[txQ] = bd;
 @@ -3198,41 +3208,41 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
        struct ucc_geth_private *ugeth = netdev_priv(dev);
        u8 __iomem *bd;         /* BD pointer */
        u32 bd_status;
 +       int tx_ind, num_freed;

        bd = ugeth->confBd[txQ];
 -       bd_status = in_be32((u32 __iomem *)bd);
 -
 +       bd_status = in_be32(bd2status(bd));
 +       tx_ind = ugeth->skb_dirtytx[txQ];
 +       num_freed = 0;
        /* Normal processing. */
        while ((bd_status & T_R) == 0) {
                /* BD contains already transmitted buffer.   */
                /* Handle the transmitted buffer and release */
                /* the BD to be used with the current frame  */

 -               if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
 -                       break;
 +               if (bd == ugeth->txBd[txQ])
 +                       break; /* Queue is empty */

                dev->stats.tx_packets++;

                /* Free the sk buffer associated with this TxBD */
 -               dev_kfree_skb(ugeth->
 -                                 tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]]);
 -               ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]] = NULL;
 -               ugeth->skb_dirtytx[txQ] =
 -                   (ugeth->skb_dirtytx[txQ] +
 -                    1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
 -
 -               /* We freed a buffer, so now we can restart transmission */
 -               if (netif_queue_stopped(dev))
 -                       netif_wake_queue(dev);
 +             

Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-27 Thread Joakim Tjernlund
pku@gmail.com wrote on 27/03/2009 10:45:12:
 On Fri, Mar 27, 2009 at 1:44 AM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
  The line:
   if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
break;
  in ucc_geth_tx() did not make sense to me. Rework & clean up
  this logic to something understandable.
  ---
 
  Reworked the patch according to Antons comments.
 
   drivers/net/ucc_geth.c |   66 
+++
   1 files changed, 38 insertions(+), 28 deletions(-)
 
  diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
  index 7fc91aa..465de3a 100644
  --- a/drivers/net/ucc_geth.c
  +++ b/drivers/net/ucc_geth.c
  @@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info = 
{
 
   static struct ucc_geth_info ugeth_info[8];
 
  +
  +static inline u32 __iomem *bd2buf(u8 __iomem *bd)
  +{
  +   return (u32 __iomem *)(bd+4);
  +}
  +
  +static inline u32 __iomem *bd2status(u8 __iomem *bd)
  +{
  +   return (u32 __iomem *)bd;
  +}
 
 
 When the driver was reviewed for upstream merge, I was told to remove
 this kind of thing in favor of direct struct qe_bd access.  So do we
 really have a guideline on this?  Or it's just a matter of personal
 preference?

This is a bit better than the current type casts. Moving over to qe_bd
accesses is a bigger cleanup that will have to be a separate patch.
I am not sure it is a clear win, as you probably lose the ability
to read both status and len in one access.

 
  +
   #ifdef DEBUG
   static void mem_disp(u8 *addr, int size)
   {
  @@ -3048,6 +3059,7 @@ static int ucc_geth_start_xmit(struct sk_buff 
*skb, struct net_device *dev)
 u8 __iomem *bd; /* BD pointer */
 u32 bd_status;
 u8 txQ = 0;
  +   int tx_ind;
 
  ugeth_vdbg("%s: IN", __func__);
 
  @@ -3057,21 +3069,19 @@ static int ucc_geth_start_xmit(struct sk_buff 
*skb, struct net_device *dev)
 
 /* Start from the next BD that should be filled */
 bd = ugeth->txBd[txQ];
  -   bd_status = in_be32((u32 __iomem *)bd);
  +   bd_status = in_be32(bd2status(bd));
 /* Save the skb pointer so we can free it later */
  -   ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
  +   tx_ind = ugeth->skb_curtx[txQ];
  +   ugeth->tx_skbuff[txQ][tx_ind] = skb;
 
 /* Update the current skb pointer (wrapping if this was the last) */
  -   ugeth->skb_curtx[txQ] =
  -       (ugeth->skb_curtx[txQ] +
  -        1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
  +   tx_ind = (tx_ind + 1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
  +   ugeth->skb_curtx[txQ] = tx_ind;
 
 /* set up the buffer descriptor */
  -   out_be32(&((struct qe_bd __iomem *)bd)->buf,
  -            dma_map_single(ugeth->dev->dev, skb->data,
  -                           skb->len, DMA_TO_DEVICE));
  -
  -   /* printk(KERN_DEBUG "skb->data is 0x%x\n", skb->data); */
  +   out_be32(bd2buf(bd),
  +            dma_map_single(ugeth->dev->dev, skb->data,
  +                           skb->len, DMA_TO_DEVICE));
 
 bd_status = (bd_status & T_W) | T_R | T_I | T_L | skb->len;
 
  @@ -3088,9 +3098,9 @@ static int ucc_geth_start_xmit(struct sk_buff 
*skb, struct net_device *dev)
 
 /* If the next BD still needs to be cleaned up, then the bds
are full.  We need to tell the kernel to stop sending us 
stuff. */
  -   if (bd == ugeth->confBd[txQ]) {
  -   if (!netif_queue_stopped(dev))
  -   netif_stop_queue(dev);
  +
  +   if (!in_be32(bd2buf(bd))) {
  +   netif_stop_queue(dev);
 
 Why does this make more sense?  The BD with a NULL buffer means the
 ring is full?  It's not the normal case.
 
 }
 
 ugeth->txBd[txQ] = bd;
  @@ -3198,41 +3208,41 @@ static int ucc_geth_tx(struct net_device *dev, 
u8 txQ)
 struct ucc_geth_private *ugeth = netdev_priv(dev);
 u8 __iomem *bd; /* BD pointer */
 u32 bd_status;
  +   int tx_ind, num_freed;
 
 bd = ugeth->confBd[txQ];
  -   bd_status = in_be32((u32 __iomem *)bd);
  -
  +   bd_status = in_be32(bd2status(bd));
  +   tx_ind = ugeth->skb_dirtytx[txQ];
  +   num_freed = 0;
 /* Normal processing. */
 while ((bd_status & T_R) == 0) {
         /* BD contains already transmitted buffer.   */
         /* Handle the transmitted buffer and release */
         /* the BD to be used with the current frame  */
 
  -       if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
  -           break;
  +       if (bd == ugeth->txBd[txQ])
  +           break; /* Queue is empty */
 
         dev->stats.tx_packets++;
 
         /* Free the sk buffer associated with this TxBD */
  -       dev_kfree_skb(ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]]);
 

Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-27 Thread Li Yang
On Fri, Mar 27, 2009 at 6:23 PM, Joakim Tjernlund
joakim.tjernl...@transmode.se wrote:
 pku@gmail.com wrote on 27/03/2009 10:45:12:
 On Fri, Mar 27, 2009 at 1:44 AM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
  The line:
   if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
        break;
  in ucc_geth_tx() did not make sense to me. Rework & clean up
  this logic to something understandable.
  ---
 
  Reworked the patch according to Antons comments.
 
   drivers/net/ucc_geth.c |   66
 +++
   1 files changed, 38 insertions(+), 28 deletions(-)
 
  diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
  index 7fc91aa..465de3a 100644
  --- a/drivers/net/ucc_geth.c
  +++ b/drivers/net/ucc_geth.c
  @@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info =
 {
 
   static struct ucc_geth_info ugeth_info[8];
 
  +
  +static inline u32 __iomem *bd2buf(u8 __iomem *bd)
  +{
  +       return (u32 __iomem *)(bd+4);
  +}
  +
  +static inline u32 __iomem *bd2status(u8 __iomem *bd)
  +{
  +       return (u32 __iomem *)bd;
  +}


 When the driver was reviewed for upstream merge, I was told to remove
 this kind of thing in favor of direct struct qe_bd access.  So do we
 really have a guideline on this?  Or it's just a matter of personal
 preference?

 This is a bit better than the current type casts. Moving over to qe_bd
 accesses is a bigger cleanup that will have to be a separate patch.
 I am not sure it is a clear win, as you probably lose the ability
 to read both status and len in one access.


  +
   #ifdef DEBUG
   static void mem_disp(u8 *addr, int size)
   {
  @@ -3048,6 +3059,7 @@ static int ucc_geth_start_xmit(struct sk_buff
 *skb, struct net_device *dev)
         u8 __iomem *bd;                 /* BD pointer */
         u32 bd_status;
         u8 txQ = 0;
  +       int tx_ind;
 
         ugeth_vdbg("%s: IN", __func__);
 
  @@ -3057,21 +3069,19 @@ static int ucc_geth_start_xmit(struct sk_buff
 *skb, struct net_device *dev)
 
         /* Start from the next BD that should be filled */
         bd = ugeth->txBd[txQ];
  -       bd_status = in_be32((u32 __iomem *)bd);
  +       bd_status = in_be32(bd2status(bd));
         /* Save the skb pointer so we can free it later */
  -       ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
  +       tx_ind = ugeth->skb_curtx[txQ];
  +       ugeth->tx_skbuff[txQ][tx_ind] = skb;
 
         /* Update the current skb pointer (wrapping if this was the last) */
  -       ugeth->skb_curtx[txQ] =
  -           (ugeth->skb_curtx[txQ] +
  -            1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
  +       tx_ind = (tx_ind + 1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
  +       ugeth->skb_curtx[txQ] = tx_ind;
 
         /* set up the buffer descriptor */
  -       out_be32(&((struct qe_bd __iomem *)bd)->buf,
  -                     dma_map_single(ugeth->dev->dev, skb->data,
  -                             skb->len, DMA_TO_DEVICE));
  -
  -       /* printk(KERN_DEBUG "skb->data is 0x%x\n", skb->data); */
  +       out_be32(bd2buf(bd),
  +                dma_map_single(ugeth->dev->dev, skb->data,
  +                               skb->len, DMA_TO_DEVICE));
 
         bd_status = (bd_status & T_W) | T_R | T_I | T_L | skb->len;
 
  @@ -3088,9 +3098,9 @@ static int ucc_geth_start_xmit(struct sk_buff
 *skb, struct net_device *dev)
 
         /* If the next BD still needs to be cleaned up, then the bds
            are full.  We need to tell the kernel to stop sending us
 stuff. */
  -       if (bd == ugeth->confBd[txQ]) {
  -               if (!netif_queue_stopped(dev))
  -                       netif_stop_queue(dev);
  +
  +       if (!in_be32(bd2buf(bd))) {
  +               netif_stop_queue(dev);

 Why does this make more sense?  The BD with a NULL buffer means the
 ring is full?  It's not the normal case.

         }
 
         ugeth->txBd[txQ] = bd;
  @@ -3198,41 +3208,41 @@ static int ucc_geth_tx(struct net_device *dev,
 u8 txQ)
         struct ucc_geth_private *ugeth = netdev_priv(dev);
         u8 __iomem *bd;         /* BD pointer */
         u32 bd_status;
  +       int tx_ind, num_freed;
 
         bd = ugeth->confBd[txQ];
  -       bd_status = in_be32((u32 __iomem *)bd);
  -
  +       bd_status = in_be32(bd2status(bd));
  +       tx_ind = ugeth->skb_dirtytx[txQ];
  +       num_freed = 0;
         /* Normal processing. */
         while ((bd_status & T_R) == 0) {
                 /* BD contains already transmitted buffer.   */
                 /* Handle the transmitted buffer and release */
                 /* the BD to be used with the current frame  */
 
  -               if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
  -                       break;
  +               if (bd == ugeth->txBd[txQ])
  +                       break; /* Queue is empty */
 
                 dev->stats.tx_packets++;
 
                 /* Free the sk buffer associated with this TxBD */
  -               

Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-27 Thread Joakim Tjernlund
pku@gmail.com wrote on 27/03/2009 11:39:07:
 On Fri, Mar 27, 2009 at 6:23 PM, Joakim Tjernlund
   -   /* We freed a buffer, so now we can restart transmission */
   -   if (netif_queue_stopped(dev))
   -           netif_wake_queue(dev);
   +   dev_kfree_skb(ugeth->tx_skbuff[txQ][tx_ind]);
   +   ugeth->tx_skbuff[txQ][tx_ind] = NULL;
   +   out_be32(bd2buf(bd), 0); /* Mark it free */
 
  Now I see why you depend on the buffer to judge if the BD ring is
  full.  If you want to do this, make sure it's well documented in 
code,
  or others will be confused.  And as Anton already commented, in/out
  access is slow.  I think the original way can be fixed if you think
  it's broken, and will have better performance.
 
  The code could use a better comment, but I really think this method
  is superior and more robust. The problem is that the in/out functions are
  much slower than they have to be for QE IO memory.
 
  The original way is broken and I tried to fix it, but it always broke
  as soon as one applied some load.
 
 The only difference I see between your method and the original code is
 that you update the cleanup progress on every BD.  The original code
 updates progress only after all sent BDs are processed.  You can try
 moving ugeth->confBd[txQ] = bd; into the loop, and then it will be the same
 as your proposed code.

Already tried that, plus lots of other combinations. Doesn't work.



Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-27 Thread Scott Wood

Joakim Tjernlund wrote:

This is a bit better than the current type casts. Moving over to qe_bd
accesses is a bigger cleanup that will have to be a separate patch.
I am not sure it is a clear win, as you probably lose the ability
to read both status and len in one access.


Not if you define the struct to have one status_len field (or a union of 
that with separate fields), as gianfar does.


-Scott


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-26 Thread Anton Vorontsov
On Thu, Mar 26, 2009 at 06:44:05PM +0100, Joakim Tjernlund wrote:
 The line:
  if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
break;
 in ucc_geth_tx() did not make sense to me. Rework & clean up
 this logic to something understandable.
 ---
 
 Reworked the patch according to Antons comments.
 
  drivers/net/ucc_geth.c |   66 +++
  1 files changed, 38 insertions(+), 28 deletions(-)
 
 diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
 index 7fc91aa..465de3a 100644
 --- a/drivers/net/ucc_geth.c
 +++ b/drivers/net/ucc_geth.c
 @@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info = {
  
  static struct ucc_geth_info ugeth_info[8];
  
 +
 +static inline u32 __iomem *bd2buf(u8 __iomem *bd)
 +{
 + return (u32 __iomem *)(bd+4);

Spaces around +...

Also, now it's obvious that we're just reading status or buf.

So instead of these inlines we could use struct qe_bd as described
in asm/qe.h.

I.e.

struct qe_bd *bd = ...;

in_be32(&bd->buf);
in_be16(&bd->status);

Oh, wait. We can't, ucc_geth assumes status is u32, which includes
the length field, i.e. ucc_fast.h defines:

#define T_W 0x20000000

:-(

The cleanup work surely desires a separate patch, so bd2buf and
bd2status are OK, for now.

I'll test the patch as soon as I'll get some QE board back on
my table (actually I have one QE board handy, but it's a 1GHz
one, while I'd like to test the patch on a slow machine, where
we'll actually see performance regressions).

[...]
 +   tx_ind = (tx_ind + 1) &
 TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);

Line over 80 columns.

[...]
 +   if (num_freed)
 +   netif_wake_queue(dev); /* We freed some buffers, so restart 
 transmission */

Ditto.

Please make sure your patches pass scripts/checkpatch.pl.

Thanks,

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2


Re: [PATCH] ucc_geth: Rework the TX logic.

2009-03-26 Thread Joakim Tjernlund
Anton Vorontsov avoront...@ru.mvista.com wrote on 26/03/2009 19:03:54:
 
 On Thu, Mar 26, 2009 at 06:44:05PM +0100, Joakim Tjernlund wrote:
  The line:
   if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
 break;
  in ucc_geth_tx() did not make sense to me. Rework & clean up
  this logic to something understandable.
  ---
  
  Reworked the patch according to Antons comments.
  
   drivers/net/ucc_geth.c |   66 
+++
   1 files changed, 38 insertions(+), 28 deletions(-)
  
  diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
  index 7fc91aa..465de3a 100644
  --- a/drivers/net/ucc_geth.c
  +++ b/drivers/net/ucc_geth.c
  @@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info = 
{
  
   static struct ucc_geth_info ugeth_info[8];
  
  +
  +static inline u32 __iomem *bd2buf(u8 __iomem *bd)
  +{
  +   return (u32 __iomem *)(bd+4);
 
 Spaces around +...

Bugger, I forgot when I moved it.

 
 Also, now it's obvious that we're just reading status or buf.
 
 So instead of these inlines we could use struct qe_bd as described
 in asm/qe.h.
 
 I.e.
 
 struct qe_bd *bd = ...;
 
  in_be32(&bd->buf);
  in_be16(&bd->status);
 
 Oh, wait. We can't, ucc_geth assumes status is u32, which includes
 the length field, i.e. ucc_fast.h defines:
 
  #define T_W 0x20000000
 
 :-(
 
 The cleanup work surely desires a separate patch, so bd2buf and
 bd2status are OK, for now.

Exactly, I did look at using struct qe_bd *bd, but that will
affect lots of stuff. Reading the status and len in one go is
also quite handy, so I am not sure it is a good idea to convert.

 
 I'll test the patch as soon as I'll get some QE board back on
 my table (actually I have one QE board handy, but it's a 1GHz
 one, while I'd like to test the patch on a slow machine, where
 we'll actually see performance regressions).

Good, I have more coming but it touches the same code so I really want
to sort out this one first.

 Jocke