Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/8 14:30, Du, Fan wrote: > > > On 2015/12/8 14:22, Yankejian (Hackim Yim) wrote: >> >> On 2015/12/7 16:58, Du, Fan wrote: >>> > >>> > >>> >On 2015/12/5 15:32, yankejian wrote: >>here is the patch raising the performance of XGE by: >>1)changes the way page management method for enet momery, and >>2)reduces the count of rmb, and >>3)adds Memory prefetching >>> > >>> >Any numbers on how much it boost performance? >>> > >> it is almost the same as 82599. > > I mean how much it improves performance *BEFORE* and *AFTER* this patch > for Huawei XGE chip, because the commit log states it "raising the > performance", > but did give numbers of the testing. > > . > Hi Du Fan, the bandwidth's raising is not obviously, but the cpu usage degracing up to 50% -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/8 14:22, Yankejian (Hackim Yim) wrote: On 2015/12/7 16:58, Du, Fan wrote: > > >On 2015/12/5 15:32, yankejian wrote: >>here is the patch raising the performance of XGE by: >>1)changes the way page management method for enet momery, and >>2)reduces the count of rmb, and >>3)adds Memory prefetching > >Any numbers on how much it boost performance? > it is almost the same as 82599. I mean how much it improves performance *BEFORE* and *AFTER* this patch for Huawei XGE chip, because the commit log states it "raising the performance", but did give numbers of the testing. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/7 16:58, Du, Fan wrote: > > > On 2015/12/5 15:32, yankejian wrote: >> here is the patch raising the performance of XGE by: >> 1)changes the way page management method for enet momery, and >> 2)reduces the count of rmb, and >> 3)adds Memory prefetching > > Any numbers on how much it boost performance? > it is almost the same as 82599. >> Signed-off-by: yankejian >> --- >> drivers/net/ethernet/hisilicon/hns/hnae.h | 5 +- >> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 1 - >> drivers/net/ethernet/hisilicon/hns/hns_enet.c | 79 >> +++ >> 3 files changed, 55 insertions(+), 30 deletions(-) >> >> diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h >> b/drivers/net/ethernet/hisilicon/hns/hnae.h >> index d1f3316..6ca94dc 100644 >> --- a/drivers/net/ethernet/hisilicon/hns/hnae.h >> +++ b/drivers/net/ethernet/hisilicon/hns/hnae.h >> @@ -341,7 +341,8 @@ struct hnae_queue { >> void __iomem *io_base; >> phys_addr_t phy_base; >> struct hnae_ae_dev *dev;/* the device who use this queue */ >> -struct hnae_ring rx_ring, tx_ring; >> +struct hnae_ring rx_ring cacheline_internodealigned_in_smp; >> +struct hnae_ring tx_ring cacheline_internodealigned_in_smp; >> struct hnae_handle *handle; >> }; >> >> @@ -597,11 +598,9 @@ static inline void hnae_replace_buffer(struct hnae_ring >> *ring, int i, >> struct hnae_desc_cb *res_cb) >> { >> struct hnae_buf_ops *bops = ring->q->handle->bops; >> -struct hnae_desc_cb tmp_cb = ring->desc_cb[i]; >> >> bops->unmap_buffer(ring, &ring->desc_cb[i]); >> ring->desc_cb[i] = *res_cb; >> -*res_cb = tmp_cb; >> ring->desc[i].addr = (__le64)ring->desc_cb[i].dma; >> ring->desc[i].rx.ipoff_bnum_pid_flag = 0; >> } >> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c >> b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c >> index 77c6edb..522b264 100644 >> --- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c >> +++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c >> @@ -341,7 +341,6 @@ void hns_ae_toggle_ring_irq(struct hnae_ring *ring, u32 >> mask) >> else >> flag = RCB_INT_FLAG_RX; >> >> -hns_rcb_int_clr_hw(ring->q, flag); >> hns_rcb_int_ctrl_hw(ring->q, flag, mask); >> } >> >> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c >> b/drivers/net/ethernet/hisilicon/hns/hns_enet.c >> index cad2663..e2be510 100644 >> --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c >> +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c >> @@ -33,6 +33,7 @@ >> >> #define RCB_IRQ_NOT_INITED 0 >> #define RCB_IRQ_INITED 1 >> +#define HNS_BUFFER_SIZE_2048 2048 >> >> #define BD_MAX_SEND_SIZE 8191 >> #define SKB_TMP_LEN(SKB) \ >> @@ -491,13 +492,51 @@ static unsigned int hns_nic_get_headlen(unsigned char >> *data, u32 flag, >> return max_size; >> } >> >> -static void >> -hns_nic_reuse_page(struct hnae_desc_cb *desc_cb, int tsize, int last_offset) >> +static void hns_nic_reuse_page(struct sk_buff *skb, int i, >> + struct hnae_ring *ring, int pull_len, >> + struct hnae_desc_cb *desc_cb) >> { >> +struct hnae_desc *desc; >> +int truesize, size; >> +int last_offset = 0; >> + >> +desc = &ring->desc[ring->next_to_clean]; >> +size = le16_to_cpu(desc->rx.size); >> + >> +#if (PAGE_SIZE < 8192) >> +if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { >> +truesize = hnae_buf_size(ring); >> +} else { >> +truesize = ALIGN(size, L1_CACHE_BYTES); >> +last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >> +} >> + >> +#else >> +truesize = ALIGN(size, L1_CACHE_BYTES); >> +last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >> +#endif >> + >> +skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len, >> +size - pull_len, truesize - pull_len); >> + >>/* avoid re-using remote pages,flag default unreuse */ >> if (likely(page_to_nid(desc_cb->priv) == numa_node_id())) { >> +#if (PAGE_SIZE < 8192) >> +if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { >> +/* if we are only owner of page we can reuse it */ >> +if (likely(page_count(desc_cb->priv) == 1)) { >> +/* flip page offset to other buffer */ >> +desc_cb->page_offset ^= truesize; >> + >> +desc_cb->reuse_flag = 1; >> +/* bump ref count on page before it is given*/ >> +get_page(desc_cb->priv); >> +} >> +return; >> +} >> +#endif >> /* move offset up to the next cache line */ >> -desc_cb->page_offset += tsize; >> +desc_cb->page_offset += truesize; >> >> if (desc_cb->page_offset <= last_offset) { >> desc_cb->reuse_flag = 1; >> @@ -529,11 +568,10 @@ static int hns_nic_poll_rx_skb(struct >> hns_
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/7 17:05, Joe Perches wrote: > On Mon, 2015-12-07 at 16:58 +0800, Yankejian (Hackim Yim) wrote: >> On 2015/12/7 11:32, Joe Perches wrote: >>> On Sun, 2015-12-06 at 22:29 -0500, David Miller wrote: > From: yankejian > Date: Sat, 5 Dec 2015 15:32:29 +0800 > >>> +#if (PAGE_SIZE < 8192) >>> + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { >>> + truesize = hnae_buf_size(ring); >>> + } else { >>> + truesize = ALIGN(size, L1_CACHE_BYTES); >>> + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >>> + } >>> + >>> +#else >>> + truesize = ALIGN(size, L1_CACHE_BYTES); >>> + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >>> +#endif > This is not indented properly, and it looks terrible. >>> And it makes one curious as to why last_offset isn't set >>> in the first block. >> Hi Joe, > Hello. > >> if hnae_buf_size que equal to HNS_BUFFER_SIZE, last_offset is useless in the >> routines of this function. >> so it is ignored in the first block. thanks for your suggestion. > More to the point, last_offset is initialized to 0. > > It'd be clearer not to initialize it at all and thanks, that is a good idea. > set it to 0 in the first block and not overwrite > the initialization in each subsequent block. because it is useless, i think we'd better ignored it. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > . > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On Mon, 2015-12-07 at 16:58 +0800, Yankejian (Hackim Yim) wrote: > On 2015/12/7 11:32, Joe Perches wrote: > > On Sun, 2015-12-06 at 22:29 -0500, David Miller wrote: > > > > From: yankejian > > > > Date: Sat, 5 Dec 2015 15:32:29 +0800 > > > > > > > > > > +#if (PAGE_SIZE < 8192) > > > > > > + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { > > > > > > + truesize = hnae_buf_size(ring); > > > > > > + } else { > > > > > > + truesize = ALIGN(size, L1_CACHE_BYTES); > > > > > > + last_offset = hnae_page_size(ring) - > > > > > > hnae_buf_size(ring); > > > > > > + } > > > > > > + > > > > > > +#else > > > > > > + truesize = ALIGN(size, L1_CACHE_BYTES); > > > > > > + last_offset = hnae_page_size(ring) - > > > > > > hnae_buf_size(ring); > > > > > > +#endif > > > > > > > > This is not indented properly, and it looks terrible. > > And it makes one curious as to why last_offset isn't set > > in the first block. > > Hi Joe, Hello. > if hnae_buf_size que equal to HNS_BUFFER_SIZE, last_offset is useless in the > routines of this function. > so it is ignored in the first block. thanks for your suggestion. More to the point, last_offset is initialized to 0. It'd be clearer not to initialize it at all and set it to 0 in the first block and not overwrite the initialization in each subsequent block. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/7 11:32, Joe Perches wrote: > On Sun, 2015-12-06 at 22:29 -0500, David Miller wrote: >> > From: yankejian >> > Date: Sat, 5 Dec 2015 15:32:29 +0800 >> > >>> > > +#if (PAGE_SIZE < 8192) >>> > > + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { >>> > > + truesize = hnae_buf_size(ring); >>> > > + } else { >>> > > + truesize = ALIGN(size, L1_CACHE_BYTES); >>> > > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >>> > > + } >>> > > + >>> > > +#else >>> > > + truesize = ALIGN(size, L1_CACHE_BYTES); >>> > > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >>> > > +#endif >> > >> > This is not indented properly, and it looks terrible. > And it makes one curious as to why last_offset isn't set > in the first block. Hi Joe, if hnae_buf_size que equal to HNS_BUFFER_SIZE, last_offset is useless in the routines of this function. so it is ignored in the first block. thanks for your suggestion. Best regards, yankejian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/5 15:32, yankejian wrote: here is the patch raising the performance of XGE by: 1)changes the way page management method for enet momery, and 2)reduces the count of rmb, and 3)adds Memory prefetching Any numbers on how much it boost performance? Signed-off-by: yankejian --- drivers/net/ethernet/hisilicon/hns/hnae.h | 5 +- drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 1 - drivers/net/ethernet/hisilicon/hns/hns_enet.c | 79 +++ 3 files changed, 55 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h index d1f3316..6ca94dc 100644 --- a/drivers/net/ethernet/hisilicon/hns/hnae.h +++ b/drivers/net/ethernet/hisilicon/hns/hnae.h @@ -341,7 +341,8 @@ struct hnae_queue { void __iomem *io_base; phys_addr_t phy_base; struct hnae_ae_dev *dev;/* the device who use this queue */ - struct hnae_ring rx_ring, tx_ring; + struct hnae_ring rx_ring cacheline_internodealigned_in_smp; + struct hnae_ring tx_ring cacheline_internodealigned_in_smp; struct hnae_handle *handle; }; @@ -597,11 +598,9 @@ static inline void hnae_replace_buffer(struct hnae_ring *ring, int i, struct hnae_desc_cb *res_cb) { struct hnae_buf_ops *bops = ring->q->handle->bops; - struct hnae_desc_cb tmp_cb = ring->desc_cb[i]; bops->unmap_buffer(ring, &ring->desc_cb[i]); ring->desc_cb[i] = *res_cb; - *res_cb = tmp_cb; ring->desc[i].addr = (__le64)ring->desc_cb[i].dma; ring->desc[i].rx.ipoff_bnum_pid_flag = 0; } diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c index 77c6edb..522b264 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c @@ -341,7 +341,6 @@ void hns_ae_toggle_ring_irq(struct hnae_ring *ring, u32 mask) else flag = RCB_INT_FLAG_RX; - hns_rcb_int_clr_hw(ring->q, flag); hns_rcb_int_ctrl_hw(ring->q, flag, mask); } diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c index cad2663..e2be510 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c @@ -33,6 +33,7 @@ #define RCB_IRQ_NOT_INITED 0 #define RCB_IRQ_INITED 1 +#define HNS_BUFFER_SIZE_2048 2048 #define BD_MAX_SEND_SIZE 8191 #define SKB_TMP_LEN(SKB) \ @@ -491,13 +492,51 @@ static unsigned int hns_nic_get_headlen(unsigned char *data, u32 flag, return max_size; } -static void -hns_nic_reuse_page(struct hnae_desc_cb *desc_cb, int tsize, int last_offset) +static void hns_nic_reuse_page(struct sk_buff *skb, int i, + struct hnae_ring *ring, int pull_len, + struct hnae_desc_cb *desc_cb) { + struct hnae_desc *desc; + int truesize, size; + int last_offset = 0; + + desc = &ring->desc[ring->next_to_clean]; + size = le16_to_cpu(desc->rx.size); + +#if (PAGE_SIZE < 8192) + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { + truesize = hnae_buf_size(ring); + } else { + truesize = ALIGN(size, L1_CACHE_BYTES); + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); + } + +#else + truesize = ALIGN(size, L1_CACHE_BYTES); + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); +#endif + + skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len, + size - pull_len, truesize - pull_len); + /* avoid re-using remote pages,flag default unreuse */ if (likely(page_to_nid(desc_cb->priv) == numa_node_id())) { +#if (PAGE_SIZE < 8192) + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { + /* if we are only owner of page we can reuse it */ + if (likely(page_count(desc_cb->priv) == 1)) { + /* flip page offset to other buffer */ + desc_cb->page_offset ^= truesize; + + desc_cb->reuse_flag = 1; + /* bump ref count on page before it is given*/ + get_page(desc_cb->priv); + } + return; + } +#endif /* move offset up to the next cache line */ - desc_cb->page_offset += tsize; + desc_cb->page_offset += truesize; if (desc_cb->page_offset <= last_offset) { desc_cb->reuse_flag = 1; @@ -529,11 +568,10 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data *ring_data, struct hnae_desc *desc; struct hnae_desc_cb *desc_cb;
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On 2015/12/7 11:29, David Miller wrote: > From: yankejian > Date: Sat, 5 Dec 2015 15:32:29 +0800 > >> +#if (PAGE_SIZE < 8192) >> +if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { >> +truesize = hnae_buf_size(ring); >> +} else { >> +truesize = ALIGN(size, L1_CACHE_BYTES); >> +last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >> +} >> + >> +#else >> +truesize = ALIGN(size, L1_CACHE_BYTES); >> +last_offset = hnae_page_size(ring) - hnae_buf_size(ring); >> +#endif > This is not indented properly, and it looks terrible. > > . > Hi David, Thanks for pointing it out. i will pay attention next time. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
On Sun, 2015-12-06 at 22:29 -0500, David Miller wrote: > From: yankejian > Date: Sat, 5 Dec 2015 15:32:29 +0800 > > > +#if (PAGE_SIZE < 8192) > > + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { > > + truesize = hnae_buf_size(ring); > > + } else { > > + truesize = ALIGN(size, L1_CACHE_BYTES); > > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); > > + } > > + > > +#else > > + truesize = ALIGN(size, L1_CACHE_BYTES); > > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); > > +#endif > > This is not indented properly, and it looks terrible. And it makes one curious as to why last_offset isn't set in the first block. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: hns: optimize XGE capability by reducing cpu usage
From: yankejian Date: Sat, 5 Dec 2015 15:32:29 +0800 > +#if (PAGE_SIZE < 8192) > + if (hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048) { > + truesize = hnae_buf_size(ring); > + } else { > + truesize = ALIGN(size, L1_CACHE_BYTES); > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); > + } > + > +#else > + truesize = ALIGN(size, L1_CACHE_BYTES); > + last_offset = hnae_page_size(ring) - hnae_buf_size(ring); > +#endif This is not indented properly, and it looks terrible. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html