Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Mon, May 18, 2020 at 4:07 PM Bartosz Golaszewski wrote:
> On Fri, 15 May 2020 at 15:32 Arnd Bergmann wrote:
> > I would get rid of the 'count' here, as it duplicates the information
> > that is already known from the difference between head and tail, and you
> > can't update it atomically without holding a lock around the access to
> > the ring. The way I'd do this is to have the head and tail pointers
> > in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> > and smp barriers to access them, with each one updated on one
> > thread but read by the other.
>
> Your previous solution seems much more reliable though. For instance
> in the above: when we're doing the TX cleanup (we got the TX ready
> irq, we're iterating over descriptors until we know there are no more
> packets scheduled (count == 0) or we encounter one that's still owned
> by DMA), a parallel TX path can schedule new packets to be sent and I
> don't see how we can atomically check the count (understood as a
> difference between tail and head) and run a new iteration (where we'd
> modify the head or tail) without risking the other path getting in the
> way. We'd have to always check the descriptor.

It should be enough to read both pointers once at the start of each
side, then do whatever work you want to do (cleaning, sending,
receiving, refilling) and finally update the one pointer that changed.
If both sides do that, you minimize the cache line bouncing and always
do a useful amount of work that guarantees forward progress and does
not interfere with the other side.

> I experimented a bit with this and couldn't come up with anything that
> would pass any stress test.
> On the other hand: spin_lock_bh() works fine and I like your approach
> from the previous e-mail - except for the work for updating stats, as
> we could potentially lose some stats when we're updating in process
> context with RX/TX paths running in parallel in napi context, but that
> would be rare enough to overlook.
>
> I hope v4 will be good enough even with spinlocks. :)

Yes, it should be fine. Avoiding all the locks is mainly an
optimization for the number of CPU cycles spent per packet, the other
points are more important to get right, in particular the flow
control.

        Arnd
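[Editorial note: the "read both pointers once at the start of each side" scheme Arnd describes is the classic single-producer/single-consumer ring protocol. Below is a minimal userspace sketch using C11 atomics as a stand-in for the kernel's READ_ONCE/WRITE_ONCE and smp barriers; the ring size, field names, and slot type are illustrative, not taken from the driver.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8 /* must be a power of two for the mask trick */

struct ring {
	_Atomic size_t head; /* written only by the producer (xmit path) */
	/* in the kernel you would pad here so head/tail sit in
	 * separate cache lines */
	_Atomic size_t tail; /* written only by the consumer (cleanup path) */
	int slots[RING_SIZE];
};

/* Producer side: read both indices once, check for space, do the
 * work, then publish the single index this side owns. */
static int ring_push(struct ring *r, int val)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return -1; /* full */
	r->slots[head & (RING_SIZE - 1)] = val;
	/* release: the slot write becomes visible before the new head */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Consumer side is symmetric: it only ever writes 'tail'. */
static int ring_pop(struct ring *r, int *val)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return -1; /* empty */
	*val = r->slots[tail & (RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 0;
}
```

Because the indices are free-running, occupancy is simply `head - tail`, which is exactly why a separate 'count' field is redundant.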
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, 15 May 2020 at 15:32 Arnd Bergmann wrote:
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> > +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> > +                                 struct mtk_mac_ring_desc_data *desc_data)
>
> I took another look at this function because of your comment on the
> locking of the descriptor updates, which seemed suspicious as the
> device side does not actually use the locks to access them
>
> > +{
> > +        struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> > +        unsigned int status;
> > +
> > +        /* Let the device release the descriptor. */
> > +        dma_rmb();
> > +        status = desc->status;
> > +        if (!(status & MTK_MAC_DESC_BIT_COWN))
> > +                return -1;
>
> The dma_rmb() seems odd here, as I don't see which prior read
> is being protected by this.
>
> > +        desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> > +        desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> > +        desc_data->dma_addr = ring->dma_addrs[ring->tail];
> > +        desc_data->skb = ring->skbs[ring->tail];
> > +
> > +        desc->data_ptr = 0;
> > +        desc->status = MTK_MAC_DESC_BIT_COWN;
> > +        if (status & MTK_MAC_DESC_BIT_EOR)
> > +                desc->status |= MTK_MAC_DESC_BIT_EOR;
> > +
> > +        /* Flush writes to descriptor memory. */
> > +        dma_wmb();
>
> The comment and the barrier here seem odd as well. I would have expected
> a barrier after the update to the data pointer, and only a single store
> but no read of the status flag instead of the read-modify-write,
> something like
>
>        desc->data_ptr = 0;
>        dma_wmb(); /* make pointer update visible before status update */
>        desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);
>
> > +        ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> > +        ring->count--;
>
> I would get rid of the 'count' here, as it duplicates the information
> that is already known from the difference between head and tail, and you
> can't update it atomically without holding a lock around the access to
> the ring. The way I'd do this is to have the head and tail pointers
> in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> and smp barriers to access them, with each one updated on one
> thread but read by the other.

Your previous solution seems much more reliable though. For instance
in the above: when we're doing the TX cleanup (we got the TX ready
irq, we're iterating over descriptors until we know there are no more
packets scheduled (count == 0) or we encounter one that's still owned
by DMA), a parallel TX path can schedule new packets to be sent and I
don't see how we can atomically check the count (understood as a
difference between tail and head) and run a new iteration (where we'd
modify the head or tail) without risking the other path getting in the
way. We'd have to always check the descriptor.

I experimented a bit with this and couldn't come up with anything that
would pass any stress test.

On the other hand: spin_lock_bh() works fine and I like your approach
from the previous e-mail - except for the work for updating stats, as
we could potentially lose some stats when we're updating in process
context with RX/TX paths running in parallel in napi context, but that
would be rare enough to overlook.

I hope v4 will be good enough even with spinlocks. :)

Bart
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> +                                 struct mtk_mac_ring_desc_data *desc_data)

I took another look at this function because of your comment on the
locking of the descriptor updates, which seemed suspicious as the
device side does not actually use the locks to access them

> +{
> +        struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> +        unsigned int status;
> +
> +        /* Let the device release the descriptor. */
> +        dma_rmb();
> +        status = desc->status;
> +        if (!(status & MTK_MAC_DESC_BIT_COWN))
> +                return -1;

The dma_rmb() seems odd here, as I don't see which prior read
is being protected by this.

> +        desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> +        desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> +        desc_data->dma_addr = ring->dma_addrs[ring->tail];
> +        desc_data->skb = ring->skbs[ring->tail];
> +
> +        desc->data_ptr = 0;
> +        desc->status = MTK_MAC_DESC_BIT_COWN;
> +        if (status & MTK_MAC_DESC_BIT_EOR)
> +                desc->status |= MTK_MAC_DESC_BIT_EOR;
> +
> +        /* Flush writes to descriptor memory. */
> +        dma_wmb();

The comment and the barrier here seem odd as well. I would have expected
a barrier after the update to the data pointer, and only a single store
but no read of the status flag instead of the read-modify-write,
something like

       desc->data_ptr = 0;
       dma_wmb(); /* make pointer update visible before status update */
       desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);

> +        ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> +        ring->count--;

I would get rid of the 'count' here, as it duplicates the information
that is already known from the difference between head and tail, and you
can't update it atomically without holding a lock around the access to
the ring. The way I'd do this is to have the head and tail pointers
in separate cache lines, and then use READ_ONCE/WRITE_ONCE
and smp barriers to access them, with each one updated on one
thread but read by the other.

        Arnd
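[Editorial note: the single-store descriptor release suggested above can be modeled in userspace, with `atomic_thread_fence` standing in for `dma_wmb()`. The bit positions and length mask below are invented for illustration; they are not the real MTK_MAC_DESC_* values.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit layout, not the actual MTK descriptor format. */
#define DESC_BIT_COWN	(1u << 31)	/* CPU owns the descriptor */
#define DESC_BIT_EOR	(1u << 30)	/* end-of-ring marker */

struct desc {
	uint32_t data_ptr;
	uint32_t status;
};

/* Hand a descriptor back to the CPU side: clear the buffer pointer
 * first, fence so that store is visible before the ownership flip,
 * then write the final status word in one store, preserving only
 * the end-of-ring bit from the old value. */
static void desc_release(struct desc *d)
{
	uint32_t status = d->status;

	d->data_ptr = 0;
	atomic_thread_fence(memory_order_release); /* stand-in for dma_wmb() */
	d->status = DESC_BIT_COWN | (status & DESC_BIT_EOR);
}
```

The point of the single store is that the device never observes an intermediate status value, which the original read-modify-write sequence could expose.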
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, May 15, 2020 at 09:11:14AM +0200, Bartosz Golaszewski wrote:
> On Thu, 14 May 2020 at 18:19 Arnd Bergmann wrote:
> >
> > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> > >
> > > From: Bartosz Golaszewski
> > >
> > > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> > > family. For now we only support full-duplex.
> > >
> > > Signed-off-by: Bartosz Golaszewski
> >
> > Looks very nice overall. Just a few things I noticed, and some ideas
> > that may or may not make sense:
> >
> > > +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> > > + * this IP doesn't work without this alignment being equal to 2.
> > > + */
> > > +#ifdef NET_IP_ALIGN
> > > +#undef NET_IP_ALIGN
> > > +#endif
> > > +#define NET_IP_ALIGN 2
> >
> > Maybe you should just define your own macro instead of replacing
> > the normal one then?
>
> I did in an earlier version and was told to use NET_IP_ALIGN but then
> found out its value on arm64 doesn't work for me so I did the thing
> that won't make anybody happy - redefine the existing constant. :)

Hi Bartosz

I did not realise ARM64 set it to 0. As Arnd suggested, please define
your own macro.

        Andrew
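[Editorial note: a driver-private constant along the lines both reviewers suggest might look like the sketch below. The macro name `MTK_MAC_IP_ALIGN` is hypothetical; 14 is the standard Ethernet header length (ETH_HLEN). The point of the 2-byte pad is that it makes the IP header land on a 4-byte boundary inside the RX buffer.]

```c
/* Driver-private RX headroom, instead of redefining the global
 * NET_IP_ALIGN (which is 0 on arm64). Name is illustrative. */
#define MTK_MAC_IP_ALIGN	2
#define ETHER_HDR_LEN		14	/* same value as the kernel's ETH_HLEN */

/* Offset of the IP header inside a freshly allocated RX buffer:
 * the pad plus the Ethernet header. */
static int ip_header_offset(void)
{
	return MTK_MAC_IP_ALIGN + ETHER_HDR_LEN;
}
```

With the 2-byte pad the offset is 16, a multiple of 4, so 32-bit IP header fields are naturally aligned.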
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, May 15, 2020 at 2:56 PM Bartosz Golaszewski wrote:
> On Fri, 15 May 2020 at 14:04 Arnd Bergmann wrote:
> > On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski wrote:
> > > >
> > > > It looks like most of the stuff inside of the loop can be pulled out
> > > > and only done once here.
> > >
> > > I did that in one of the previous submissions but it was pointed out
> > > to me that a parallel TX path may fill up the queue before I wake it.
> >
> > Right, I see you plugged that hole, however the way you hold the
> > spinlock across the expensive DMA management but then give it
> > up in each loop iteration feels like this is not the most efficient
> > way.
>
> Maybe my thinking is wrong here, but I assumed that with a spinlock
> it's better to give other threads the chance to run in between each
> iteration. I didn't benchmark it though.

It depends. You want to avoid lock contention (two threads trying to
get the lock at the same time) but you also want to avoid bouncing the
spinlock between the caches.

In the contention case, what I think would happen here is that the
cleanup thread gives up the lock and the xmit function gets it, but
then the cleanup thread is spinning again, so you are still blocked on
one of the two CPUs but also pay the overhead of synchronizing between
the two.

Holding the lock the whole time would speed up both the good case (no
contention) and the bad case (bouncing the lock) a little bit because
it saves some overhead. Holding the lock for shorter times (i.e. not
during the cache operations) would reduce the amount of lock
contention but not help in the good case. Not needing a lock at all is
generally best, but getting it right is tricky ;-)

        Arnd
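[Editorial note: the two locking shapes being compared can be sketched with a pthread mutex. This single-threaded model only shows the structure of each shape, not the contention or cache-bouncing behavior; the `struct txq` and its fields are invented for illustration.]

```c
#include <pthread.h>

struct txq {
	pthread_mutex_t lock;
	int pending;	/* entries waiting for TX completion */
	int completed;
};

/* Shape 1 (what the driver does): re-acquire the lock for every
 * entry. Each iteration pays a lock/unlock, and a concurrent xmit
 * path can slip in between iterations. */
static void complete_per_entry(struct txq *q)
{
	for (;;) {
		pthread_mutex_lock(&q->lock);
		if (!q->pending) {
			pthread_mutex_unlock(&q->lock);
			break;
		}
		q->pending--;
		q->completed++;
		pthread_mutex_unlock(&q->lock);
	}
}

/* Shape 2 ("the easy way"): hold the lock across the whole batch.
 * One lock/unlock in total; the xmit path simply waits until the
 * cleanup pass is finished. */
static void complete_batch(struct txq *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->pending) {
		q->pending--;
		q->completed++;
	}
	pthread_mutex_unlock(&q->lock);
}
```

Both shapes drain the queue completely; the difference is purely in how often the lock changes hands under load.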
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, 15 May 2020 at 14:04 Arnd Bergmann wrote:
>
> On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski wrote:
> >
> > On Thu, 14 May 2020 at 18:19 Arnd Bergmann wrote:
> > >
> > > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> > > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > > +{
> > > > +        unsigned int val;
> > > > +
> > > > +        regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > > +        regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > > +
> > > > +        return val;
> > > > +}
> > >
> > > Do you actually need to read the register? That is usually a relatively
> > > expensive operation, so if possible try to clear the bits when
> > > you don't care which bits were set.
> >
> > I do care, I'm afraid. The returned value is being used in the napi
> > poll callback to see which ring to process.
>
> I suppose the other callers are not performance critical.
>
> For the rx and tx processing, it should be better to just always look at
> the queue directly and ignore the irq status, in particular when you
> are already in polling mode: suppose you receive ten frames at once
> and only process five but clear the irq flag.
>
> When the poll function is called again, you still need to process the
> others, but I would assume that the status tells you that nothing
> new has arrived so you don't process them until the next interrupt.
>
> For the statistics, I assume you do need to look at the irq status,
> but this doesn't have to be done in the poll function. How about
> something like:
>
> - in hardirq context, read the irq status word
> - if rx or tx irq pending, call napi_schedule
> - if stats irq pending, schedule a work function
> - in napi poll, process both queues until empty or
>   budget exhausted
> - if packet processing completed in poll function
>   ack the irq and check again, call napi_complete
> - in work function, handle stats irq, then ack it

I see your point. I'll try to come up with something and send a new
version on Monday.

> > > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > > +{
> > > > +        struct mtk_mac_ring *ring = &priv->tx_ring;
> > > > +        struct net_device *ndev = priv->ndev;
> > > > +        int ret;
> > > > +
> > > > +        for (;;) {
> > > > +                mtk_mac_lock(priv);
> > > > +
> > > > +                if (!mtk_mac_ring_descs_available(ring)) {
> > > > +                        mtk_mac_unlock(priv);
> > > > +                        break;
> > > > +                }
> > > > +
> > > > +                ret = mtk_mac_tx_complete_one(priv);
> > > > +                if (ret) {
> > > > +                        mtk_mac_unlock(priv);
> > > > +                        break;
> > > > +                }
> > > > +
> > > > +                if (netif_queue_stopped(ndev))
> > > > +                        netif_wake_queue(ndev);
> > > > +
> > > > +                mtk_mac_unlock(priv);
> > > > +        }
> > > > +}
> > >
> > > It looks like most of the stuff inside of the loop can be pulled out
> > > and only done once here.
> >
> > I did that in one of the previous submissions but it was pointed out
> > to me that a parallel TX path may fill up the queue before I wake it.
>
> Right, I see you plugged that hole, however the way you hold the
> spinlock across the expensive DMA management but then give it
> up in each loop iteration feels like this is not the most efficient
> way.

Maybe my thinking is wrong here, but I assumed that with a spinlock
it's better to give other threads the chance to run in between each
iteration. I didn't benchmark it though.

> The easy way would be to just hold the lock across the entire
> loop and then be sure you do it right. Alternatively you could
> minimize the locking and only do the wakeup after you do the final
> update to the tail pointer, at which point you know the queue is not
> full because you have just freed up at least one entry.

Makes sense, I'll see what I can do.

Bartosz
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski wrote:
>
> On Thu, 14 May 2020 at 18:19 Arnd Bergmann wrote:
> >
> > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > +{
> > > +        unsigned int val;
> > > +
> > > +        regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > +        regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > +
> > > +        return val;
> > > +}
> >
> > Do you actually need to read the register? That is usually a relatively
> > expensive operation, so if possible try to clear the bits when
> > you don't care which bits were set.
>
> I do care, I'm afraid. The returned value is being used in the napi
> poll callback to see which ring to process.

I suppose the other callers are not performance critical.

For the rx and tx processing, it should be better to just always look at
the queue directly and ignore the irq status, in particular when you
are already in polling mode: suppose you receive ten frames at once
and only process five but clear the irq flag.

When the poll function is called again, you still need to process the
others, but I would assume that the status tells you that nothing
new has arrived so you don't process them until the next interrupt.

For the statistics, I assume you do need to look at the irq status,
but this doesn't have to be done in the poll function. How about
something like:

- in hardirq context, read the irq status word
- if rx or tx irq pending, call napi_schedule
- if stats irq pending, schedule a work function
- in napi poll, process both queues until empty or
  budget exhausted
- if packet processing completed in poll function
  ack the irq and check again, call napi_complete
- in work function, handle stats irq, then ack it

> > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > +{
> > > +        struct mtk_mac_ring *ring = &priv->tx_ring;
> > > +        struct net_device *ndev = priv->ndev;
> > > +        int ret;
> > > +
> > > +        for (;;) {
> > > +                mtk_mac_lock(priv);
> > > +
> > > +                if (!mtk_mac_ring_descs_available(ring)) {
> > > +                        mtk_mac_unlock(priv);
> > > +                        break;
> > > +                }
> > > +
> > > +                ret = mtk_mac_tx_complete_one(priv);
> > > +                if (ret) {
> > > +                        mtk_mac_unlock(priv);
> > > +                        break;
> > > +                }
> > > +
> > > +                if (netif_queue_stopped(ndev))
> > > +                        netif_wake_queue(ndev);
> > > +
> > > +                mtk_mac_unlock(priv);
> > > +        }
> > > +}
> >
> > It looks like most of the stuff inside of the loop can be pulled out
> > and only done once here.
>
> I did that in one of the previous submissions but it was pointed out
> to me that a parallel TX path may fill up the queue before I wake it.

Right, I see you plugged that hole, however the way you hold the
spinlock across the expensive DMA management but then give it
up in each loop iteration feels like this is not the most efficient
way.

The easy way would be to just hold the lock across the entire
loop and then be sure you do it right. Alternatively you could
minimize the locking and only do the wakeup after you do the final
update to the tail pointer, at which point you know the queue is not
full because you have just freed up at least one entry.

> > > +static int mtk_mac_poll(struct napi_struct *napi, int budget)
> > > +{
> > > +        struct mtk_mac_priv *priv;
> > > +        unsigned int status;
> > > +        int received = 0;
> > > +
> > > +        priv = container_of(napi, struct mtk_mac_priv, napi);
> > > +
> > > +        status = mtk_mac_intr_read_and_clear(priv);
> > > +
> > > +        /* Clean up TX */
> > > +        if (status & MTK_MAC_BIT_INT_STS_TNTC)
> > > +                mtk_mac_tx_complete_all(priv);
> > > +
> > > +        /* Receive up to $budget packets */
> > > +        if (status & MTK_MAC_BIT_INT_STS_FNRC)
> > > +                received = mtk_mac_process_rx(priv, budget);
> > > +
> > > +        /* One of the counters reached 0x800 - update stats and reset all
> > > +         * counters.
> > > +         */
> > > +        if (status & MTK_MAC_REG_INT_STS_MIB_CNT_TH) {
> > > +                mtk_mac_update_stats(priv);
> > > +                mtk_mac_reset_counters(priv);
> > > +        }
> > > +
> > > +        if (received < budget)
> > > +                napi_complete_done(napi, received);
> > > +
> > > +        mtk_mac_intr_unmask_all(priv);
> > > +
> > > +        return received;
> > > +}
> >
> > I think you want to leave (at least some of) the interrupts masked
> > if your budget is exhausted, to avoid generating unnecessary
> > irqs.
>
> The networking stack shouldn't queue any new TX packets if the queue
> is stopped - is this really worth complicating the code? Looks like
> premature optimization.
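[Editorial note: setting aside the kernel plumbing, the hardirq-side decision in the proposed hardirq/napi/workqueue split is pure bit-testing on the status word. Below is a sketch with made-up status bits; the real register layout and bit names in the driver differ.]

```c
#include <stdbool.h>

/* Illustrative status bits, not the real MTK_MAC_BIT_INT_STS_* layout. */
#define INT_TX_DONE	(1u << 0)	/* TX completion pending */
#define INT_RX_DONE	(1u << 1)	/* RX frames pending */
#define INT_STATS_TH	(1u << 2)	/* a MIB counter hit its threshold */

struct irq_actions {
	bool schedule_napi;	/* RX/TX work: defer to the napi poll loop */
	bool schedule_work;	/* stats overflow: defer to process context */
};

/* Hardirq-side decision: look at the status word once and route each
 * class of event to the deferred context that should handle it. */
static struct irq_actions dispatch(unsigned int status)
{
	struct irq_actions a = { false, false };

	if (status & (INT_TX_DONE | INT_RX_DONE))
		a.schedule_napi = true;
	if (status & INT_STATS_TH)
		a.schedule_work = true;
	return a;
}
```

The napi poll path then ignores the status word entirely and just drains the queues, which avoids the "ten frames arrived, five processed, flag cleared" problem described above.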
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Thu, 14 May 2020 at 18:19 Arnd Bergmann wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
> >
> > From: Bartosz Golaszewski
> >
> > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> > family. For now we only support full-duplex.
> >
> > Signed-off-by: Bartosz Golaszewski
>
> Looks very nice overall. Just a few things I noticed, and some ideas
> that may or may not make sense:
>
> > +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> > + * this IP doesn't work without this alignment being equal to 2.
> > + */
> > +#ifdef NET_IP_ALIGN
> > +#undef NET_IP_ALIGN
> > +#endif
> > +#define NET_IP_ALIGN 2
>
> Maybe you should just define your own macro instead of replacing
> the normal one then?

I did in an earlier version and was told to use NET_IP_ALIGN but then
found out its value on arm64 doesn't work for me so I did the thing
that won't make anybody happy - redefine the existing constant. :)

> > +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> > +{
> > +        spin_lock_irqsave(&priv->lock, priv->lock_flags);
> > +}
> > +
> > +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> > +{
> > +        spin_unlock_irqrestore(&priv->lock, priv->lock_flags);
> > +}
>
> This looks wrong: you should not have a shared 'flags' passed into
> spin_lock_irqsave(), and I don't even see a need to use the
> irqsave variant of the lock in the first place.
>
> Maybe start by open-coding the lock and remove the wrappers
> above.
>
> Then see if you can use a cheaper spin_lock_bh() or plain spin_lock()
> instead of irqsave.

This is from an earlier version where I did a lot more in hard irq
context. Now that almost all of the processing happens in soft-irq
context I guess you're right - I can go with a regular spin_lock().

> Finally, see if this can be done in a lockless way by relying on
> appropriate barriers and separating the writers into separate
> cache lines. From a brief look at the driver I think it can be done
> without too much trouble.

Unfortunately I do need some locking. Accessing RX and TX descriptors
at the same time seems to upset the controller. I experimented a lot
with barriers but it turned out that I got a lot of weird bugs at high
throughput.

> > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > +{
> > +        unsigned int val;
> > +
> > +        regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > +        regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > +
> > +        return val;
> > +}
>
> Do you actually need to read the register? That is usually a relatively
> expensive operation, so if possible try to clear the bits when
> you don't care which bits were set.

I do care, I'm afraid. The returned value is being used in the napi
poll callback to see which ring to process.

> > +/* All processing for TX and RX happens in the napi poll callback. */
> > +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> > +{
> > +        struct mtk_mac_priv *priv;
> > +        struct net_device *ndev;
> > +
> > +        ndev = data;
> > +        priv = netdev_priv(ndev);
> > +
> > +        if (netif_running(ndev)) {
> > +                mtk_mac_intr_mask_all(priv);
> > +                napi_schedule(&priv->napi);
> > +        }
> > +
> > +        return IRQ_HANDLED;
> > +}
> >
> > +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> > +                                     struct net_device *ndev)
> > +{
> > +        struct mtk_mac_priv *priv = netdev_priv(ndev);
> > +        struct mtk_mac_ring *ring = &priv->tx_ring;
> > +        struct device *dev = mtk_mac_get_dev(priv);
> > +        struct mtk_mac_ring_desc_data desc_data;
> > +
> > +        desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> > +        if (dma_mapping_error(dev, desc_data.dma_addr))
> > +                goto err_drop_packet;
> > +
> > +        desc_data.skb = skb;
> > +        desc_data.len = skb->len;
> > +
> > +        mtk_mac_lock(priv);
> > +        mtk_mac_ring_push_head_tx(ring, &desc_data);
> > +
> > +        if (mtk_mac_ring_full(ring))
> > +                netif_stop_queue(ndev);
> > +        mtk_mac_unlock(priv);
> > +
> > +        mtk_mac_dma_resume_tx(priv);
> > +
> > +        return NETDEV_TX_OK;
> > +
> > +err_drop_packet:
> > +        dev_kfree_skb(skb);
> > +        ndev->stats.tx_dropped++;
> > +        return NETDEV_TX_BUSY;
> > +}
>
> I would always add BQL flow control in new drivers, using
> netdev_sent_queue here...

Ok, will do.

> > +static int mtk_mac_tx_complete_one(struct mtk_mac_priv *priv)
> > +{
> > +        struct mtk_mac_ring *ring = &priv->tx_ring;
> > +        struct mtk_mac_ring_desc_data desc_data;
> > +        int ret;
> > +
> > +        ret = mtk_mac_ring_pop_tail(ring, &desc_data);
> > +        if (ret)
> > +                return ret;
> > +
> > +        mtk_mac_dma_unmap_tx(priv, &desc_data);
> > +        dev_kfree_skb_irq(desc_data.skb);
> > +
> > +        return 0;
> > +}
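[Editorial note: BQL's effect can be modeled as simple byte accounting. This is a toy model with invented names, not the netdev_sent_queue()/netdev_completed_queue() API itself: the real implementation also adapts the limit dynamically.]

```c
/* Toy BQL model: the xmit path adds the packet length, the TX
 * completion path subtracts it, and the queue is throttled once the
 * number of in-flight bytes crosses a limit. */
struct bql {
	unsigned int inflight;	/* bytes queued but not yet completed */
	unsigned int limit;	/* static here; adaptive in real BQL */
	int stopped;
};

/* Analogue of netdev_sent_queue(), called from the xmit path. */
static void toy_sent_queue(struct bql *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight >= q->limit)
		q->stopped = 1;	/* stack stops queuing new packets */
}

/* Analogue of netdev_completed_queue(), called from TX cleanup. */
static void toy_completed_queue(struct bql *q, unsigned int bytes)
{
	q->inflight -= bytes;
	if (q->stopped && q->inflight < q->limit)
		q->stopped = 0;	/* wake the queue */
}
```

Capping in-flight bytes rather than ring slots keeps latency bounded even when the ring could hold many large packets.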
Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski wrote:
>
> From: Bartosz Golaszewski
>
> This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> family. For now we only support full-duplex.
>
> Signed-off-by: Bartosz Golaszewski

Looks very nice overall. Just a few things I noticed, and some ideas
that may or may not make sense:

> +/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
> + * this IP doesn't work without this alignment being equal to 2.
> + */
> +#ifdef NET_IP_ALIGN
> +#undef NET_IP_ALIGN
> +#endif
> +#define NET_IP_ALIGN 2

Maybe you should just define your own macro instead of replacing
the normal one then?

> +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> +{
> +        spin_lock_irqsave(&priv->lock, priv->lock_flags);
> +}
> +
> +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> +{
> +        spin_unlock_irqrestore(&priv->lock, priv->lock_flags);
> +}

This looks wrong: you should not have a shared 'flags' passed into
spin_lock_irqsave(), and I don't even see a need to use the
irqsave variant of the lock in the first place.

Maybe start by open-coding the lock and remove the wrappers
above.

Then see if you can use a cheaper spin_lock_bh() or plain spin_lock()
instead of irqsave.

Finally, see if this can be done in a lockless way by relying on
appropriate barriers and separating the writers into separate
cache lines. From a brief look at the driver I think it can be done
without too much trouble.

> +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> +{
> +        unsigned int val;
> +
> +        regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> +        regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> +
> +        return val;
> +}

Do you actually need to read the register? That is usually a relatively
expensive operation, so if possible try to clear the bits when
you don't care which bits were set.

> +/* All processing for TX and RX happens in the napi poll callback. */
> +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> +{
> +        struct mtk_mac_priv *priv;
> +        struct net_device *ndev;
> +
> +        ndev = data;
> +        priv = netdev_priv(ndev);
> +
> +        if (netif_running(ndev)) {
> +                mtk_mac_intr_mask_all(priv);
> +                napi_schedule(&priv->napi);
> +        }
> +
> +        return IRQ_HANDLED;
> +}
>
> +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> +                                     struct net_device *ndev)
> +{
> +        struct mtk_mac_priv *priv = netdev_priv(ndev);
> +        struct mtk_mac_ring *ring = &priv->tx_ring;
> +        struct device *dev = mtk_mac_get_dev(priv);
> +        struct mtk_mac_ring_desc_data desc_data;
> +
> +        desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> +        if (dma_mapping_error(dev, desc_data.dma_addr))
> +                goto err_drop_packet;
> +
> +        desc_data.skb = skb;
> +        desc_data.len = skb->len;
> +
> +        mtk_mac_lock(priv);
> +        mtk_mac_ring_push_head_tx(ring, &desc_data);
> +
> +        if (mtk_mac_ring_full(ring))
> +                netif_stop_queue(ndev);
> +        mtk_mac_unlock(priv);
> +
> +        mtk_mac_dma_resume_tx(priv);
> +
> +        return NETDEV_TX_OK;
> +
> +err_drop_packet:
> +        dev_kfree_skb(skb);
> +        ndev->stats.tx_dropped++;
> +        return NETDEV_TX_BUSY;
> +}

I would always add BQL flow control in new drivers, using
netdev_sent_queue here...

> +static int mtk_mac_tx_complete_one(struct mtk_mac_priv *priv)
> +{
> +        struct mtk_mac_ring *ring = &priv->tx_ring;
> +        struct mtk_mac_ring_desc_data desc_data;
> +        int ret;
> +
> +        ret = mtk_mac_ring_pop_tail(ring, &desc_data);
> +        if (ret)
> +                return ret;
> +
> +        mtk_mac_dma_unmap_tx(priv, &desc_data);
> +        dev_kfree_skb_irq(desc_data.skb);
> +
> +        return 0;
> +}

... and netdev_completed_queue() here.

> +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> +{
> +        struct mtk_mac_ring *ring = &priv->tx_ring;
> +        struct net_device *ndev = priv->ndev;
> +        int ret;
> +
> +        for (;;) {
> +                mtk_mac_lock(priv);
> +
> +                if (!mtk_mac_ring_descs_available(ring)) {
> +                        mtk_mac_unlock(priv);
> +                        break;
> +                }
> +
> +                ret = mtk_mac_tx_complete_one(priv);
> +                if (ret) {
> +                        mtk_mac_unlock(priv);
> +                        break;
> +                }
> +
> +                if (netif_queue_stopped(ndev))
> +                        netif_wake_queue(ndev);
> +
> +                mtk_mac_unlock(priv);
> +        }
> +}

It looks like most of the stuff inside of the loop can be pulled out
and only done once here.

> +static int mtk_mac_poll(struct napi_struct *napi, int budget)
> +{
> +        struct mtk_mac_priv *priv;
> +        unsigned int status;
> +        int received = 0;
[PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
From: Bartosz Golaszewski

This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
family. For now we only support full-duplex.

Signed-off-by: Bartosz Golaszewski
---
 drivers/net/ethernet/mediatek/Kconfig       |    6 +
 drivers/net/ethernet/mediatek/Makefile      |    1 +
 drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1578 +++
 3 files changed, 1585 insertions(+)
 create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c

diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 5079b8090f16..5c3793076765 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC
 	  This driver supports the gigabit ethernet MACs in the MediaTek SoC
 	  family.
 
+config NET_MEDIATEK_MAC
+	tristate "MediaTek Ethernet MAC support"
+	select PHYLIB
+	help
+	  This driver supports the ethernet IP on MediaTek MT85** SoCs.
+
 endif #NET_VENDOR_MEDIATEK
diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile
index 3362fb7ef859..f7f5638943a0 100644
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
 mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o
 
+obj-$(CONFIG_NET_MEDIATEK_MAC) += mtk_eth_mac.o
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_mac.c b/drivers/net/ethernet/mediatek/mtk_eth_mac.c
new file mode 100644
index 000000000000..6fbe49e861d6
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_eth_mac.c
@@ -0,0 +1,1578 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2020 MediaTek Corporation
+ * Copyright (c) 2020 BayLibre SAS
+ *
+ * Author: Bartosz Golaszewski
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MTK_MAC_DRVNAME			"mtk_eth_mac"
+
+#define MTK_MAC_WAIT_TIMEOUT		300
+#define MTK_MAC_MAX_FRAME_SIZE		1514
+#define MTK_MAC_SKB_ALIGNMENT		16
+#define MTK_MAC_NAPI_WEIGHT		64
+#define MTK_MAC_HASHTABLE_MC_LIMIT	256
+#define MTK_MAC_HASHTABLE_SIZE_MAX	512
+
+/* This is defined to 0 on arm64 in arch/arm64/include/asm/processor.h but
+ * this IP doesn't work without this alignment being equal to 2.
+ */
+#ifdef NET_IP_ALIGN
+#undef NET_IP_ALIGN
+#endif
+#define NET_IP_ALIGN 2
+
+static const char *const mtk_mac_clk_names[] = { "core", "reg", "trans" };
+#define MTK_MAC_NCLKS ARRAY_SIZE(mtk_mac_clk_names)
+
+/* PHY Control Register 0 */
+#define MTK_MAC_REG_PHY_CTRL0			0x0000
+#define MTK_MAC_BIT_PHY_CTRL0_WTCMD		BIT(13)
+#define MTK_MAC_BIT_PHY_CTRL0_RDCMD		BIT(14)
+#define MTK_MAC_BIT_PHY_CTRL0_RWOK		BIT(15)
+#define MTK_MAC_MSK_PHY_CTRL0_PREG		GENMASK(12, 8)
+#define MTK_MAC_OFF_PHY_CTRL0_PREG		8
+#define MTK_MAC_MSK_PHY_CTRL0_RWDATA		GENMASK(31, 16)
+#define MTK_MAC_OFF_PHY_CTRL0_RWDATA		16
+
+/* PHY Control Register 1 */
+#define MTK_MAC_REG_PHY_CTRL1			0x0004
+#define MTK_MAC_BIT_PHY_CTRL1_LINK_ST		BIT(0)
+#define MTK_MAC_BIT_PHY_CTRL1_AN_EN		BIT(8)
+#define MTK_MAC_OFF_PHY_CTRL1_FORCE_SPD		9
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_10M	0x00
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_100M	0x01
+#define MTK_MAC_VAL_PHY_CTRL1_FORCE_SPD_1000M	0x02
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_DPX		BIT(11)
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_RX	BIT(12)
+#define MTK_MAC_BIT_PHY_CTRL1_FORCE_FC_TX	BIT(13)
+
+/* MAC Configuration Register */
+#define MTK_MAC_REG_MAC_CFG			0x0008
+#define MTK_MAC_OFF_MAC_CFG_IPG			10
+#define MTK_MAC_VAL_MAC_CFG_IPG_96BIT		GENMASK(4, 0)
+#define MTK_MAC_BIT_MAC_CFG_MAXLEN_1522		BIT(16)
+#define MTK_MAC_BIT_MAC_CFG_AUTO_PAD		BIT(19)
+#define MTK_MAC_BIT_MAC_CFG_CRC_STRIP		BIT(20)
+#define MTK_MAC_BIT_MAC_CFG_VLAN_STRIP		BIT(22)
+#define MTK_MAC_BIT_MAC_CFG_NIC_PD		BIT(31)
+
+/* Flow-Control Configuration Register */
+#define MTK_MAC_REG_FC_CFG			0x000c
+#define MTK_MAC_BIT_FC_CFG_BP_EN		BIT(7)
+#define MTK_MAC_BIT_FC_CFG_UC_PAUSE_DIR		BIT(8)
+#define MTK_MAC_OFF_FC_CFG_SEND_PAUSE_TH	16
+#define MTK_MAC_MSK_FC_CFG_SEND_PAUSE_TH	GENMASK(27, 16)
+#define MTK_MAC_VAL_FC_CFG_SEND_PAUSE_TH_2K	0x800
+
+/* ARL Configuration Register */
+#define MTK_MAC_REG_ARL_CFG			0x0010
+#define MTK_MAC_BIT_ARL_CFG_HASH_ALG		BIT(0)
+#define MTK_MAC_BIT_ARL_CFG_MISC_MODE		BIT(4)
+
+/* MAC High and Low Bytes Registers */
+#define MT