Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-18 Thread Arnd Bergmann
On Mon, May 18, 2020 at 4:07 PM Bartosz Golaszewski  wrote:
> On Fri, 15 May 2020 at 15:32, Arnd Bergmann wrote:

> > I would get rid of the 'count' here, as it duplicates the information
> > that is already known from the difference between head and tail, and you
> > can't update it atomically without holding a lock around the access to
> > the ring. The way I'd do this is to have the head and tail pointers
> > in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> > and smp barriers to access them, with each one updated on one
> > thread but read by the other.
> >
>
> Your previous solution seems much more reliable though. For instance
> in the above: when we're doing the TX cleanup (we got the TX ready
> irq and we're iterating over descriptors until we know there are no more
> packets scheduled (count == 0) or we encounter one that's still owned
> by DMA), a parallel TX path can schedule new packets to be sent, and I
> don't see how we can atomically check the count (understood as the
> difference between tail and head) and run a new iteration (where we'd
> modify the head or tail) without risking the other path getting in the
> way. We'd have to always check the descriptor.

It should be enough to read both pointers once at the start of each
side, then do whatever work you want to do (cleaning, sending,
receiving, refilling) and finally updating the one pointer that changed.
If both sides do that, you minimize the cache line bouncing and
always do a useful amount of work that guarantees forward progress
and does not interfere with the other side.
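
Roughly like this, as a sketch of the cleanup side (desc_done() and
the exact field names are made up, not taken from your driver):

  /* TX cleanup: snapshot the producer's head once, then work */
  unsigned int head = smp_load_acquire(&ring->head);
  unsigned int tail = ring->tail; /* only this side writes it */

  while (tail != head && desc_done(&ring->descs[tail])) {
          /* unmap and free the skb for descs[tail] here */
          tail = (tail + 1) % MTK_MAC_RING_NUM_DESCS;
  }

  /* publish the new tail exactly once, after the work is done */
  smp_store_release(&ring->tail, tail);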

> I experimented a bit with this and couldn't come up with anything that
> would pass any stress test.
>
> On the other hand: spin_lock_bh() works fine and I like your approach
> from the previous e-mail - except for the part about updating stats, as
> we could potentially lose some stats when updating in process
> context with the RX/TX paths running in parallel in napi context, but that
> would be rare enough to overlook.
>
> I hope v4 will be good enough even with spinlocks. :)

Yes, it should be fine. Avoiding all the locks is mainly an optimization
for the number of CPU cycles spent per packet; the other points
are more important to get right, in particular the flow control.

  Arnd


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-18 Thread Bartosz Golaszewski
On Fri, 15 May 2020 at 15:32, Arnd Bergmann wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
> > +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> > +struct mtk_mac_ring_desc_data *desc_data)
>
> I took another look at this function because of your comment on locking
> the descriptor updates, which seemed suspicious as the device side does not
> actually use the locks to access them.
>
> > +{
> > +   struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> > +   unsigned int status;
> > +
> > +   /* Let the device release the descriptor. */
> > +   dma_rmb();
> > +   status = desc->status;
> > +   if (!(status & MTK_MAC_DESC_BIT_COWN))
> > +   return -1;
>
> The dma_rmb() seems odd here, as I don't see which prior read
> is being protected by this.
>
> > +   desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> > +   desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> > +   desc_data->dma_addr = ring->dma_addrs[ring->tail];
> > +   desc_data->skb = ring->skbs[ring->tail];
> > +
> > +   desc->data_ptr = 0;
> > +   desc->status = MTK_MAC_DESC_BIT_COWN;
> > +   if (status & MTK_MAC_DESC_BIT_EOR)
> > +   desc->status |= MTK_MAC_DESC_BIT_EOR;
> > +
> > +   /* Flush writes to descriptor memory. */
> > +   dma_wmb();
>
> The comment and the barrier here seem odd as well. I would have expected
> a barrier after the update to the data pointer, and only a single store
> but no read of the status flag instead of the read-modify-write,
> something like
>
>   desc->data_ptr = 0;
>   dma_wmb(); /* make pointer update visible before status update */
>   desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);
>
> > +   ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> > +   ring->count--;
>
> I would get rid of the 'count' here, as it duplicates the information
> that is already known from the difference between head and tail, and you
> can't update it atomically without holding a lock around the access to
> the ring. The way I'd do this is to have the head and tail pointers
> in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> and smp barriers to access them, with each one updated on one
> thread but read by the other.
>

Your previous solution seems much more reliable though. For instance
in the above: when we're doing the TX cleanup (we got the TX ready
irq and we're iterating over descriptors until we know there are no more
packets scheduled (count == 0) or we encounter one that's still owned
by DMA), a parallel TX path can schedule new packets to be sent, and I
don't see how we can atomically check the count (understood as the
difference between tail and head) and run a new iteration (where we'd
modify the head or tail) without risking the other path getting in the
way. We'd have to always check the descriptor.

I experimented a bit with this and couldn't come up with anything that
would pass any stress test.

On the other hand: spin_lock_bh() works fine and I like your approach
from the previous e-mail - except for the part about updating stats, as
we could potentially lose some stats when updating in process
context with the RX/TX paths running in parallel in napi context, but that
would be rare enough to overlook.

I hope v4 will be good enough even with spinlocks. :)

Bart


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Arnd Bergmann
On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
> +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> +struct mtk_mac_ring_desc_data *desc_data)

I took another look at this function because of your comment on locking
the descriptor updates, which seemed suspicious as the device side does not
actually use the locks to access them.

> +{
> +   struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> +   unsigned int status;
> +
> +   /* Let the device release the descriptor. */
> +   dma_rmb();
> +   status = desc->status;
> +   if (!(status & MTK_MAC_DESC_BIT_COWN))
> +   return -1;

The dma_rmb() seems odd here, as I don't see which prior read
is being protected by this.

> +   desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> +   desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> +   desc_data->dma_addr = ring->dma_addrs[ring->tail];
> +   desc_data->skb = ring->skbs[ring->tail];
> +
> +   desc->data_ptr = 0;
> +   desc->status = MTK_MAC_DESC_BIT_COWN;
> +   if (status & MTK_MAC_DESC_BIT_EOR)
> +   desc->status |= MTK_MAC_DESC_BIT_EOR;
> +
> +   /* Flush writes to descriptor memory. */
> +   dma_wmb();

The comment and the barrier here seem odd as well. I would have expected
a barrier after the update to the data pointer, and only a single store
but no read of the status flag instead of the read-modify-write,
something like

  desc->data_ptr = 0;
  dma_wmb(); /* make pointer update visible before status update */
  desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);

> +   ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> +   ring->count--;

I would get rid of the 'count' here, as it duplicates the information
that is already known from the difference between head and tail, and you
can't update it atomically without holding a lock around the access to
the ring. The way I'd do this is to have the head and tail pointers
in separate cache lines, and then use READ_ONCE/WRITE_ONCE
and smp barriers to access them, with each one updated on one
thread but read by the other.
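
To make that concrete, something like this (just a sketch, not
tested):

  struct mtk_mac_ring {
          struct mtk_mac_ring_desc *descs;
          /* written by the xmit path, read by TX cleanup */
          unsigned int head ____cacheline_aligned;
          /* written by TX cleanup, read by the xmit path */
          unsigned int tail ____cacheline_aligned;
  };

with every cross-thread access going through READ_ONCE()/WRITE_ONCE()
and a store-release/load-acquire pair ordering the descriptor writes
against the pointer update.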

 Arnd


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Andrew Lunn
On Fri, May 15, 2020 at 09:11:14AM +0200, Bartosz Golaszewski wrote:
> On Thu, 14 May 2020 at 18:19, Arnd Bergmann wrote:
> >
> > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
> > >
> > > From: Bartosz Golaszewski 
> > >
> > > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> > > family. For now we only support full-duplex.
> > >
> > > Signed-off-by: Bartosz Golaszewski 
> >
> > Looks very nice overall. Just a few things I noticed, and some ideas
> > that may or may not make sense:
> >
> > > +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> > > + * this IP doesn't work without this alignment being equal to 2.
> > > + */
> > > +#ifdef NET_IP_ALIGN
> > > +#undef NET_IP_ALIGN
> > > +#endif
> > > +#define NET_IP_ALIGN   2
> >
> > Maybe you should just define your own macro instead of replacing
> > the normal one then?
> >
> 
> I did in an earlier version and was told to use NET_IP_ALIGN but then
> found out its value on arm64 doesn't work for me so I did the thing
> that won't make anybody happy - redefine the existing constant. :)

Hi Bartosz

I did not realise ARM64 set it to 0. As Arnd suggested, please define
your own macro.

Andrew


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Arnd Bergmann
On Fri, May 15, 2020 at 2:56 PM Bartosz Golaszewski  wrote:
> On Fri, 15 May 2020 at 14:04, Arnd Bergmann wrote:
> > On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski  wrote:

> > > >
> > > > It looks like most of the stuff inside of the loop can be pulled out
> > > > and only done once here.
> > > >
> > >
> > > I did that in one of the previous submissions but it was pointed out
> > > to me that a parallel TX path may fill up the queue before I wake it.
> >
> > Right, I see you plugged that hole; however, the way you hold the
> > spinlock across the expensive DMA management but then give it
> > up in each loop iteration does not feel like the most efficient
> > way.
> >
>
> Maybe my thinking is wrong here, but I assumed that with a spinlock
> it's better to give other threads the chance to run in between each
> iteration. I didn't benchmark it though.

It depends. You want to avoid lock contention (two threads trying to
get the lock at the same time) but you also want to avoid bouncing
around the spinlock between the caches.

In the contention case, what I think would happen here is that the
cleanup thread gives up the lock and the xmit function gets it, but
then the cleanup thread is spinning again, so you are still blocked
on one of the two CPUs but also pay the overhead of synchronizing
between the two.

Holding the lock the whole time would speed up both the good case
(no contention) and the bad case (bouncing the lock) a little bit
because it saves some overhead. Holding the lock for shorter
times (i.e. not during the cache operations) would reduce the
amount of lock-contention but not help in the good case.

Not needing a lock at all is generally best, but getting it right
is tricky ;-)

  Arnd


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Bartosz Golaszewski
On Fri, 15 May 2020 at 14:04, Arnd Bergmann wrote:
>
> On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski  wrote:
> >
> > On Thu, 14 May 2020 at 18:19, Arnd Bergmann wrote:
> > >
> > > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> > > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > > +{
> > > > +   unsigned int val;
> > > > +
> > > > +   regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > > +   regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > > +
> > > > +   return val;
> > > > +}
> > >
> > > Do you actually need to read the register? That is usually a relatively
> > > expensive operation, so if possible try to just clear the bits when
> > > you don't care which bits were set.
> > >
> >
> > I do care, I'm afraid. The returned value is being used in the napi
> > poll callback to see which ring to process.
>
> I suppose the other callers are not performance critical.
>
> For the rx and tx processing, it should be better to just always look at
> the queue directly and ignore the irq status, in particular when you
> are already in polling mode: suppose you receive ten frames at once
> and only process five but clear the irq flag.
>
> When the poll function is called again, you still need to process the
> others, but I would assume that the status tells you that nothing
> new has arrived so you don't process them until the next interrupt.
>
> For the statistics, I assume you do need to look at the irq status,
> but this doesn't have to be done in the poll function. How about
> something like:
>
> - in hardirq context, read the irq status word
> - if rx or tx irq pending, call napi_schedule
> - if stats irq pending, schedule a work function
> - in napi poll, process both queues until empty or
>   budget exhausted
> - if packet processing completed in poll function
>   ack the irq and check again, call napi_complete
> - in work function, handle stats irq, then ack it
>

I see your point. I'll try to come up with something and send a new
version on Monday.

> > > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > > +{
> > > > +   struct mtk_mac_ring *ring = &priv->tx_ring;
> > > > +   struct net_device *ndev = priv->ndev;
> > > > +   int ret;
> > > > +
> > > > +   for (;;) {
> > > > +   mtk_mac_lock(priv);
> > > > +
> > > > +   if (!mtk_mac_ring_descs_available(ring)) {
> > > > +   mtk_mac_unlock(priv);
> > > > +   break;
> > > > +   }
> > > > +
> > > > +   ret = mtk_mac_tx_complete_one(priv);
> > > > +   if (ret) {
> > > > +   mtk_mac_unlock(priv);
> > > > +   break;
> > > > +   }
> > > > +
> > > > +   if (netif_queue_stopped(ndev))
> > > > +   netif_wake_queue(ndev);
> > > > +
> > > > +   mtk_mac_unlock(priv);
> > > > +   }
> > > > +}
> > >
> > > It looks like most of the stuff inside of the loop can be pulled out
> > > and only done once here.
> > >
> >
> > I did that in one of the previous submissions but it was pointed out
> > to me that a parallel TX path may fill up the queue before I wake it.
>
> Right, I see you plugged that hole; however, the way you hold the
> spinlock across the expensive DMA management but then give it
> up in each loop iteration does not feel like the most efficient
> way.
>

Maybe my thinking is wrong here, but I assumed that with a spinlock
it's better to give other threads the chance to run in between each
iteration. I didn't benchmark it though.

> The easy way would be to just hold the lock across the entire
> loop and then be sure you do it right. Alternatively you could
> minimize the locking and only do the wakeup after you do the final
> update to the tail pointer, at which point you know the queue is not
> full because you have just freed up at least one entry.
>

Makes sense, I'll see what I can do.

Bartosz


Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Arnd Bergmann
On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski  wrote:
>
> On Thu, 14 May 2020 at 18:19, Arnd Bergmann wrote:
> >
> > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
> > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > +{
> > > +   unsigned int val;
> > > +
> > > +   regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > +   regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > +
> > > +   return val;
> > > +}
> >
> > Do you actually need to read the register? That is usually a relatively
> > expensive operation, so if possible try to just clear the bits when
> > you don't care which bits were set.
> >
>
> I do care, I'm afraid. The returned value is being used in the napi
> poll callback to see which ring to process.

I suppose the other callers are not performance critical.

For the rx and tx processing, it should be better to just always look at
the queue directly and ignore the irq status, in particular when you
are already in polling mode: suppose you receive ten frames at once
and only process five but clear the irq flag.

When the poll function is called again, you still need to process the
others, but I would assume that the status tells you that nothing
new has arrived so you don't process them until the next interrupt.

For the statistics, I assume you do need to look at the irq status,
but this doesn't have to be done in the poll function. How about
something like:

- in hardirq context, read the irq status word
- if rx or tx irq pending, call napi_schedule
- if stats irq pending, schedule a work function
- in napi poll, process both queues until empty or
  budget exhausted
- if packet processing completed in poll function
  ack the irq and check again, call napi_complete
- in work function, handle stats irq, then ack it
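
A rough sketch of the hardirq part (stats_work is a made-up field
that would be set up with INIT_WORK() at probe time; the other names
are from your patch):

  static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
  {
          struct net_device *ndev = data;
          struct mtk_mac_priv *priv = netdev_priv(ndev);
          unsigned int status = mtk_mac_intr_read_and_clear(priv);

          if (status & (MTK_MAC_BIT_INT_STS_TNTC | MTK_MAC_BIT_INT_STS_FNRC)) {
                  mtk_mac_intr_mask_all(priv);
                  napi_schedule(&priv->napi);
          }

          /* the stats counters are not latency sensitive */
          if (status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)
                  schedule_work(&priv->stats_work);

          return IRQ_HANDLED;
  }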

> > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > +{
> > > +   struct mtk_mac_ring *ring = &priv->tx_ring;
> > > +   struct net_device *ndev = priv->ndev;
> > > +   int ret;
> > > +
> > > +   for (;;) {
> > > +   mtk_mac_lock(priv);
> > > +
> > > +   if (!mtk_mac_ring_descs_available(ring)) {
> > > +   mtk_mac_unlock(priv);
> > > +   break;
> > > +   }
> > > +
> > > +   ret = mtk_mac_tx_complete_one(priv);
> > > +   if (ret) {
> > > +   mtk_mac_unlock(priv);
> > > +   break;
> > > +   }
> > > +
> > > +   if (netif_queue_stopped(ndev))
> > > +   netif_wake_queue(ndev);
> > > +
> > > +   mtk_mac_unlock(priv);
> > > +   }
> > > +}
> >
> > It looks like most of the stuff inside of the loop can be pulled out
> > and only done once here.
> >
>
> I did that in one of the previous submissions but it was pointed out
> to me that a parallel TX path may fill up the queue before I wake it.

Right, I see you plugged that hole; however, the way you hold the
spinlock across the expensive DMA management but then give it
up in each loop iteration does not feel like the most efficient
way.

The easy way would be to just hold the lock across the entire
loop and then be sure you do it right. Alternatively you could
minimize the locking and only do the wakeup after you do the final
update to the tail pointer, at which point you know the queue is not
full because you have just freed up at least one entry.
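
I.e. something along these lines (sketch only, reusing the helpers
from your patch):

  static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
  {
          struct mtk_mac_ring *ring = &priv->tx_ring;
          int freed = 0;

          spin_lock(&priv->lock);
          while (mtk_mac_ring_descs_available(ring) &&
                 !mtk_mac_tx_complete_one(priv))
                  freed++;
          spin_unlock(&priv->lock);

          /* at least one entry was freed, so the ring cannot be full */
          if (freed && netif_queue_stopped(priv->ndev))
                  netif_wake_queue(priv->ndev);
  }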

> > > +static int mtk_mac_poll(struct napi_struct *napi, int budget)
> > > +{
> > > +   struct mtk_mac_priv *priv;
> > > +   unsigned int status;
> > > +   int received = 0;
> > > +
> > > +   priv = container_of(napi, struct mtk_mac_priv, napi);
> > > +
> > > +   status = mtk_mac_intr_read_and_clear(priv);
> > > +
> > > +   /* Clean up TX */
> > > +   if (status & MTK_MAC_BIT_INT_STS_TNTC)
> > > +   mtk_mac_tx_complete_all(priv);
> > > +
> > > +   /* Receive up to $budget packets */
> > > +   if (status & MTK_MAC_BIT_INT_STS_FNRC)
> > > +   received = mtk_mac_process_rx(priv, budget);
> > > +
> > > +   /* One of the counters reached 0x800 - update stats and reset all
> > > +* counters.
> > > +*/
> > > +   if (status & MTK_MAC_REG_INT_STS_MIB_CNT_TH) {
> > > +   mtk_mac_update_stats(priv);
> > > +   mtk_mac_reset_counters(priv);
> > > +   }
> > > +
> > > +   if (received < budget)
> > > +   napi_complete_done(napi, received);
> > > +
> > > +   mtk_mac_intr_unmask_all(priv);
> > > +
> > > +   return received;
> > > +}
> >
> > I think you want to leave (at least some of) the interrupts masked
> > if your budget is exhausted, to avoid generating unnecessary
> > irqs.
> >
>
> The networking stack shouldn't queue any new TX packets if the queue
> is stopped - is this really worth complicating the code? Looks like
> premature optimization

Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-15 Thread Bartosz Golaszewski
On Thu, 14 May 2020 at 18:19, Arnd Bergmann wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
> >
> > From: Bartosz Golaszewski 
> >
> > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> > family. For now we only support full-duplex.
> >
> > Signed-off-by: Bartosz Golaszewski 
>
> Looks very nice overall. Just a few things I noticed, and some ideas
> that may or may not make sense:
>
> > +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> > + * this IP doesn't work without this alignment being equal to 2.
> > + */
> > +#ifdef NET_IP_ALIGN
> > +#undef NET_IP_ALIGN
> > +#endif
> > +#define NET_IP_ALIGN   2
>
> Maybe you should just define your own macro instead of replacing
> the normal one then?
>

I did in an earlier version and was told to use NET_IP_ALIGN but then
found out its value on arm64 doesn't work for me so I did the thing
that won't make anybody happy - redefine the existing constant. :)

> > +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> > +{
> > +   spin_lock_irqsave(&priv->lock, priv->lock_flags);
> > +}
> > +
> > +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> > +{
> > +   spin_unlock_irqrestore(&priv->lock, priv->lock_flags);
> > +}
>
> This looks wrong: you should not have shared 'flags' passed into
> spin_lock_irqsave(), and I don't even see a need to use the
> irqsave variant of the lock in the first place.
>
> Maybe start by open-coding the lock and remove the wrappers
> above.
>
> Then see if you can use a cheaper spin_lock_bh() or plain spin_lock()
> instead of irqsave.
>

This is from an earlier version where I did a lot more in hard irq
context. Now that almost all of the processing happens in soft-irq
context I guess you're right - I can go with a regular spin_lock().

> Finally, see if this can be done in a lockless way by relying on
> appropriate barriers and separating the writers into separate
> cache lines. From a brief look at the driver I think it can be done
> without too much trouble.
>

Unfortunately I do need some locking. Accessing RX and TX descriptors
at the same time seems to upset the controller. I experimented a lot
with barriers but kept running into weird bugs at high
throughput.

> > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > +{
> > +   unsigned int val;
> > +
> > +   regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > +   regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > +
> > +   return val;
> > +}
>
> Do you actually need to read the register? That is usually a relatively
> expensive operation, so if possible try to use clear the bits when
> you don't care which bits were set.
>

I do care, I'm afraid. The returned value is being used in the napi
poll callback to see which ring to process.

> > +/* All processing for TX and RX happens in the napi poll callback. */
> > +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> > +{
> > +   struct mtk_mac_priv *priv;
> > +   struct net_device *ndev;
> > +
> > +   ndev = data;
> > +   priv = netdev_priv(ndev);
> > +
> > +   if (netif_running(ndev)) {
> > +   mtk_mac_intr_mask_all(priv);
> > +   napi_schedule(&priv->napi);
> > +   }
> > +
> > +   return IRQ_HANDLED;
>
>
> > +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> > +struct net_device *ndev)
> > +{
> > +   struct mtk_mac_priv *priv = netdev_priv(ndev);
> > +   struct mtk_mac_ring *ring = &priv->tx_ring;
> > +   struct device *dev = mtk_mac_get_dev(priv);
> > +   struct mtk_mac_ring_desc_data desc_data;
> > +
> > +   desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> > +   if (dma_mapping_error(dev, desc_data.dma_addr))
> > +   goto err_drop_packet;
> > +
> > +   desc_data.skb = skb;
> > +   desc_data.len = skb->len;
> > +
> > +   mtk_mac_lock(priv);
> > +   mtk_mac_ring_push_head_tx(ring, &desc_data);
> > +
> > +   if (mtk_mac_ring_full(ring))
> > +   netif_stop_queue(ndev);
> > +   mtk_mac_unlock(priv);
> > +
> > +   mtk_mac_dma_resume_tx(priv);
> > +
> > +   return NETDEV_TX_OK;
> > +
> > +err_drop_packet:
> > +   dev_kfree_skb(skb);
> > +   ndev->stats.tx_dropped++;
> > +   return NETDEV_TX_BUSY;
> > +}
>
> I would always add BQL flow control in new drivers, using
> netdev_sent_queue here...
>

Ok, will do.

> > +static int mtk_mac_tx_complete_one(struct mtk_mac_priv *priv)
> > +{
> > +   struct mtk_mac_ring *ring = &priv->tx_ring;
> > +   struct mtk_mac_ring_desc_data desc_data;
> > +   int ret;
> > +
> > +   ret = mtk_mac_ring_pop_tail(ring, &desc_data);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mtk_mac_dma_unmap_tx(priv, &desc_data);
> > +   dev_kfree_skb_irq(desc_data.skb);
> > +
> > +   return 0;
> > +}

Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-14 Thread Arnd Bergmann
On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski  wrote:
>
> From: Bartosz Golaszewski 
>
> This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> family. For now we only support full-duplex.
>
> Signed-off-by: Bartosz Golaszewski 

Looks very nice overall. Just a few things I noticed, and some ideas
that may or may not make sense:

> +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> + * this IP doesn't work without this alignment being equal to 2.
> + */
> +#ifdef NET_IP_ALIGN
> +#undef NET_IP_ALIGN
> +#endif
> +#define NET_IP_ALIGN   2

Maybe you should just define your own macro instead of replacing
the normal one then?
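
E.g. (MTK_MAC_IP_ALIGN is just a suggested name):

  /* this IP needs the 2-byte IP header alignment regardless of what
   * the architecture sets NET_IP_ALIGN to
   */
  #define MTK_MAC_IP_ALIGN 2

  skb_reserve(skb, MTK_MAC_IP_ALIGN);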

> +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> +{
> +   spin_lock_irqsave(&priv->lock, priv->lock_flags);
> +}
> +
> +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> +{
> +   spin_unlock_irqrestore(&priv->lock, priv->lock_flags);
> +}

This looks wrong: you should not have shared 'flags' passed into
spin_lock_irqsave(), and I don't even see a need to use the
irqsave variant of the lock in the first place.

Maybe start by open-coding the lock and remove the wrappers
above.

Then see if you can use a cheaper spin_lock_bh() or plain spin_lock()
instead of irqsave.

Finally, see if this can be done in a lockless way by relying on
appropriate barriers and separating the writers into separate
cache lines. From a brief look at the driver I think it can be done
without too much trouble.
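
I.e. keep the flags on each caller's stack if you really need the
irqsave variant, or drop it entirely:

  unsigned long flags; /* local to the caller, never shared in priv */

  spin_lock_irqsave(&priv->lock, flags);
  /* ... touch the ring ... */
  spin_unlock_irqrestore(&priv->lock, flags);

or, if everything runs in napi/process context:

  spin_lock_bh(&priv->lock);
  /* ... touch the ring ... */
  spin_unlock_bh(&priv->lock);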

> +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> +{
> +   unsigned int val;
> +
> +   regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> +   regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> +
> +   return val;
> +}

Do you actually need to read the register? That is usually a relatively
expensive operation, so if possible try to just clear the bits when
you don't care which bits were set.

> +/* All processing for TX and RX happens in the napi poll callback. */
> +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> +{
> +   struct mtk_mac_priv *priv;
> +   struct net_device *ndev;
> +
> +   ndev = data;
> +   priv = netdev_priv(ndev);
> +
> +   if (netif_running(ndev)) {
> +   mtk_mac_intr_mask_all(priv);
> +   napi_schedule(&priv->napi);
> +   }
> +
> +   return IRQ_HANDLED;


> +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> +struct net_device *ndev)
> +{
> +   struct mtk_mac_priv *priv = netdev_priv(ndev);
> +   struct mtk_mac_ring *ring = &priv->tx_ring;
> +   struct device *dev = mtk_mac_get_dev(priv);
> +   struct mtk_mac_ring_desc_data desc_data;
> +
> +   desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> +   if (dma_mapping_error(dev, desc_data.dma_addr))
> +   goto err_drop_packet;
> +
> +   desc_data.skb = skb;
> +   desc_data.len = skb->len;
> +
> +   mtk_mac_lock(priv);
> +   mtk_mac_ring_push_head_tx(ring, &desc_data);
> +
> +   if (mtk_mac_ring_full(ring))
> +   netif_stop_queue(ndev);
> +   mtk_mac_unlock(priv);
> +
> +   mtk_mac_dma_resume_tx(priv);
> +
> +   return NETDEV_TX_OK;
> +
> +err_drop_packet:
> +   dev_kfree_skb(skb);
> +   ndev->stats.tx_dropped++;
> +   return NETDEV_TX_BUSY;
> +}

I would always add BQL flow control in new drivers, using
netdev_sent_queue here...

> +static int mtk_mac_tx_complete_one(struct mtk_mac_priv *priv)
> +{
> +   struct mtk_mac_ring *ring = &priv->tx_ring;
> +   struct mtk_mac_ring_desc_data desc_data;
> +   int ret;
> +
> +   ret = mtk_mac_ring_pop_tail(ring, &desc_data);
> +   if (ret)
> +   return ret;
> +
> +   mtk_mac_dma_unmap_tx(priv, &desc_data);
> +   dev_kfree_skb_irq(desc_data.skb);
> +
> +   return 0;
> +}

... and netdev_completed_queue()  here.
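
Roughly like this ('pkts' and 'bytes' would be accumulated in the
cleanup loop, the rest is the standard BQL API):

  /* in start_xmit, once the descriptor is queued: */
  netdev_sent_queue(ndev, skb->len);

  /* in TX completion, once per cleaned-up batch: */
  netdev_completed_queue(ndev, pkts, bytes);

BQL then limits how much data sits in the ring, which keeps latency
down under load.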

> +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> +{
> +   struct mtk_mac_ring *ring = &priv->tx_ring;
> +   struct net_device *ndev = priv->ndev;
> +   int ret;
> +
> +   for (;;) {
> +   mtk_mac_lock(priv);
> +
> +   if (!mtk_mac_ring_descs_available(ring)) {
> +   mtk_mac_unlock(priv);
> +   break;
> +   }
> +
> +   ret = mtk_mac_tx_complete_one(priv);
> +   if (ret) {
> +   mtk_mac_unlock(priv);
> +   break;
> +   }
> +
> +   if (netif_queue_stopped(ndev))
> +   netif_wake_queue(ndev);
> +
> +   mtk_mac_unlock(priv);
> +   }
> +}

It looks like most of the stuff inside of the loop can be pulled out
and only done once here.

> +static int mtk_mac_poll(struct napi_struct *napi, int budget)
> +{
> +   struct mtk_mac_priv *priv;
> +   unsigned int status;
> +   i

[PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

2020-05-14 Thread Bartosz Golaszewski
From: Bartosz Golaszewski 

This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
family. For now we only support full-duplex.

Signed-off-by: Bartosz Golaszewski 
---
 drivers/net/ethernet/mediatek/Kconfig   |6 +
 drivers/net/ethernet/mediatek/Makefile  |1 +
 drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1578 +++
 3 files changed, 1585 insertions(+)
 create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c

diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 5079b8090f16..5c3793076765 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC
  This driver supports the gigabit ethernet MACs in the
  MediaTek SoC family.
 
+config NET_MEDIATEK_MAC
+   tristate "MediaTek Ethernet MAC support"
+   select PHYLIB
+   help
+ This driver supports the ethernet IP on MediaTek MT85** SoCs.
+
 endif #NET_VENDOR_MEDIATEK
diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile
index 3362fb7ef859..f7f5638943a0 100644
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -5,3 +5,4 @@
 
 obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
 mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o
+obj-$(CONFIG_NET_MEDIATEK_MAC) += mtk_eth_mac.o
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_mac.c b/drivers/net/ethernet/mediatek/mtk_eth_mac.c
new file mode 100644
index ..6fbe49e861d6
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_eth_mac.c
@@ -0,0 +1,1578 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2020 MediaTek Corporation
+ * Copyright (c) 2020 BayLibre SAS
+ *
+ * Author: Bartosz Golaszewski 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MTK_MAC_DRVNAME"mtk_eth_mac"
+
+#define MTK_MAC_WAIT_TIMEOUT   300
+#define MTK_MAC_MAX_FRAME_SIZE 1514
+#define MTK_MAC_SKB_ALIGNMENT  16
+#define MTK_MAC_NAPI_WEIGHT64
+#define MTK_MAC_HASHTABLE_MC_LIMIT 256
+#define MTK_MAC_HASHTABLE_SIZE_MAX 512
+
+/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
+ * this IP doesn't work without this alignment being equal to 2.
+ */
+#ifdef NET_IP_ALIGN
+#undef NET_IP_ALIGN
+#endif
+#define NET_IP_ALIGN   2
+
+static const char *const mtk_mac_clk_names[] = { "core", "reg", "trans" };
+#define MTK_MAC_NCLKS ARRAY_SIZE(mtk_mac_clk_names)
+
+/* PHY Control Register 0 */
+#define MTK_MAC_REG_PHY_CTRL0  0x
+#define MTK_MAC_BIT_PHY_CTRL0_WTCMDBIT(13)
+#define MTK_MAC_BIT_PHY_CTRL0_RDCMDBIT(14)
+#define MTK_MAC_BIT_PHY_CTRL0_RWOK BIT(15)
+#define MTK_MAC_MSK_PHY_CTRL0_PREG GENMASK(12, 8)
+#define MTK_MAC_OFF_PHY_CTRL0_PREG 8
+#define MTK_MAC_MSK_PHY_CTRL0_RWDATA   GENMASK(31, 16)
+#define MTK_MAC_OFF_PHY_CTRL0_RWDATA   16
+
+/* PHY Control Register 1 */
+#define MTK_MAC_REG_PHY_CTRL1  0x0004
+#define MTK_MAC_BIT_PHY_CTRL1_LINK_ST  BIT(0)
+#define MTK_MAC_BIT_PHY_CTRL1_AN_ENBIT(8)
+#define MTK_MAC_OFF_PHY_CTRL1_FORCE_SPD9
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_10M0x00
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_100M   0x01
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_1000M  0x02
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_DPXBIT(11)
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_RX  BIT(12)
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_TX  BIT(13)
+
+/* MAC Configuration Register */
+#define MTK_MAC_REG_MAC_CFG0x0008
+#define MTK_MAC_OFF_MAC_CFG_IPG10
+#define MTK_MAC_VAL_MAC_CFG_IPG_96BIT  GENMASK(4, 0)
+#define MTK_MAC_BIT_MAC_CFG_MAXLEN_1522BIT(16)
+#define MTK_MAC_BIT_MAC_CFG_AUTO_PAD   BIT(19)
+#define MTK_MAC_BIT_MAC_CFG_CRC_STRIP  BIT(20)
+#define MTK_MAC_BIT_MAC_CFG_VLAN_STRIP BIT(22)
+#define MTK_MAC_BIT_MAC_CFG_NIC_PD BIT(31)
+
+/* Flow-Control Configuration Register */
+#define MTK_MAC_REG_FC_CFG 0x000c
+#define MTK_MAC_BIT_FC_CFG_BP_EN   BIT(7)
+#define MTK_MAC_BIT_FC_CFG_UC_PAUSE_DIRBIT(8)
+#define MTK_MAC_OFF_FC_CFG_SEND_PAUSE_TH   16
+#define MTK_MAC_MSK_FC_CFG_SEND_PAUSE_TH   GENMASK(27, 16)
+#define MTK_MAC_VAL_FC_CFG_SEND_PAUSE_TH_2K0x800
+
+/* ARL Configuration Register */
+#define MTK_MAC_REG_ARL_CFG0x0010
+#define MTK_MAC_BIT_ARL_CFG_HASH_ALG   BIT(0)
+#define MTK_MAC_BIT_ARL_CFG_MISC_MODE  BIT(4)
+
+/* MAC High and Low Bytes Registers */
+#define MT