Re: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
From: "Mintz, Yuval" Date: Fri, 9 Jun 2017 07:08:54 + > But one thing I thought of asking - do we consider layouts of relatively > insignificant structures to be some golden coding standard? It just shows lack of thought when writing things. I want to see code that shows the author put some care into it and didn't just slap things together. Thank you.
RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
From: Mintz, Yuval > Sent: 09 June 2017 08:52 > > From: David Laight [mailto:david.lai...@aculab.com] > > Sent: Friday, June 09, 2017 10:28 AM > > To: 'David Miller' ; Mintz, Yuval > > > > Cc: netdev@vger.kernel.org; linux-r...@vger.kernel.org; Kalderon, Michal > > > > Subject: RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx > > > > From: David Miller > > > Sent: 09 June 2017 00:24 > > > > > > From: Yuval Mintz > > > Date: Thu, 8 Jun 2017 19:13:16 +0300 > > > > > > > @@ -67,6 +79,21 @@ struct qed_ll2_stats { > > > > u64 sent_bcast_pkts; > > > > }; > > > > > > > > +struct qed_ll2_tx_pkt_info { > > > > + u8 num_of_bds; > > > > + u16 vlan; > > > > + u8 bd_flags; > > > > + u16 l4_hdr_offset_w;/* from start of packet */ > > > > + enum qed_ll2_tx_dest tx_dest; > > > > + enum qed_ll2_roce_flavor_type qed_roce_flavor; > > > > + dma_addr_t first_frag; > > > > + u16 first_frag_len; > > > > + bool enable_ip_cksum; > > > > + bool enable_l4_cksum; > > > > + bool calc_ip_len; > > > > + void *cookie; > > > > +}; > > > > + > > > > > > This layout is extremely inefficient, with lots of padding in between > > > struct members. > > > > > > Group small u8 members and u16 members together so that they consume > > > full 32-bit areas so you can eliminate all of the padding. > > > > I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8) > > variables is likely to generate extra code (on non-x86). > > You are using 32 bits for the 'enum' - I bet the values fit in 8 bits, so > > aren't > > really worried about size. > > > > If size did matter you can easily get the above down to 32 bytes. > > You're right, and that's exactly the point - since this is not data-path > critical > I don't see why the size/efficiency should matter [greatly]. It is just good practise so that it happens automatically when it does matter. Just swapping 'vlan' and 'bd_flags' would make it look much better. David
RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
> -Original Message- > From: David Laight [mailto:david.lai...@aculab.com] > Sent: Friday, June 09, 2017 10:28 AM > To: 'David Miller' ; Mintz, Yuval > > Cc: netdev@vger.kernel.org; linux-r...@vger.kernel.org; Kalderon, Michal > > Subject: RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx > > From: David Miller > > Sent: 09 June 2017 00:24 > > > > From: Yuval Mintz > > Date: Thu, 8 Jun 2017 19:13:16 +0300 > > > > > @@ -67,6 +79,21 @@ struct qed_ll2_stats { > > > u64 sent_bcast_pkts; > > > }; > > > > > > +struct qed_ll2_tx_pkt_info { > > > + u8 num_of_bds; > > > + u16 vlan; > > > + u8 bd_flags; > > > + u16 l4_hdr_offset_w;/* from start of packet */ > > > + enum qed_ll2_tx_dest tx_dest; > > > + enum qed_ll2_roce_flavor_type qed_roce_flavor; > > > + dma_addr_t first_frag; > > > + u16 first_frag_len; > > > + bool enable_ip_cksum; > > > + bool enable_l4_cksum; > > > + bool calc_ip_len; > > > + void *cookie; > > > +}; > > > + > > > > This layout is extremely inefficient, with lots of padding in between > > struct members. > > > > Group small u8 members and u16 members together so that they consume > > full 32-bit areas so you can eliminate all of the padding. > > I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8) > variables is likely to generate extra code (on non-x86). > You are using 32 bits for the 'enum' - I bet the values fit in 8 bits, so > aren't > really worried about size. > > If size did matter you can easily get the above down to 32 bytes. You're right, and that's exactly the point - since this is not data-path critical I don't see why the size/efficiency should matter [greatly].
RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
From: David Miller > Sent: 09 June 2017 00:24 > > From: Yuval Mintz > Date: Thu, 8 Jun 2017 19:13:16 +0300 > > > @@ -67,6 +79,21 @@ struct qed_ll2_stats { > > u64 sent_bcast_pkts; > > }; > > > > +struct qed_ll2_tx_pkt_info { > > + u8 num_of_bds; > > + u16 vlan; > > + u8 bd_flags; > > + u16 l4_hdr_offset_w;/* from start of packet */ > > + enum qed_ll2_tx_dest tx_dest; > > + enum qed_ll2_roce_flavor_type qed_roce_flavor; > > + dma_addr_t first_frag; > > + u16 first_frag_len; > > + bool enable_ip_cksum; > > + bool enable_l4_cksum; > > + bool calc_ip_len; > > + void *cookie; > > +}; > > + > > This layout is extremely inefficient, with lots of padding in between > struct members. > > Group small u8 members and u16 members together so that they consume > full 32-bit areas so you can eliminate all of the padding. I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8) variables is likely to generate extra code (on non-x86). You are using 32 bits for the 'enum' - I bet the values fit in 8 bits, so aren't really worried about size. If size did matter you can easily get the above down to 32 bytes. David
RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
> > +struct qed_ll2_tx_pkt_info { > > + u8 num_of_bds; > > + u16 vlan; > > + u8 bd_flags; > > + u16 l4_hdr_offset_w;/* from start of packet */ > > + enum qed_ll2_tx_dest tx_dest; > > + enum qed_ll2_roce_flavor_type qed_roce_flavor; > > + dma_addr_t first_frag; > > + u16 first_frag_len; > > + bool enable_ip_cksum; > > + bool enable_l4_cksum; > > + bool calc_ip_len; > > + void *cookie; > > +}; > > + > > This layout is extremely inefficient, with lots of padding in between struct > members. > > Group small u8 members and u16 members together so that they consume > full 32-bit areas so you can eliminate all of the padding. While I can mend the holes, the total structure size doesn't really change: struct qed_ll2_tx_pkt_info { u8 num_of_bds; /* 0 1 */ /* XXX 1 byte hole, try to pack */ u16vlan; /* 2 2 */ u8 bd_flags; /* 4 1 */ /* XXX 1 byte hole, try to pack */ u16l4_hdr_offset_w; /* 6 2 */ enum qed_ll2_tx_dest tx_dest; /* 8 4 */ enum qed_ll2_roce_flavor_type qed_roce_flavor; /*12 4 */ dma_addr_t first_frag; /*16 8 */ u16first_frag_len; /*24 2 */ bool enable_ip_cksum; /*26 1 */ bool enable_l4_cksum; /*27 1 */ bool calc_ip_len; /*28 1 */ /* XXX 3 bytes hole, try to pack */ void * cookie; /*32 8 */ /* size: 40, cachelines: 1, members: 12 */ /* sum members: 35, holes: 3, sum holes: 5 */ /* last cacheline: 40 bytes */ }; Becomes: struct qed_ll2_tx_pkt_info { void * cookie; /* 0 8 */ dma_addr_t first_frag; /* 8 8 */ enum qed_ll2_tx_dest tx_dest; /*16 4 */ enum qed_ll2_roce_flavor_type qed_roce_flavor; /*20 4 */ u16vlan; /*24 2 */ u16l4_hdr_offset_w; /*26 2 */ u16first_frag_len; /*28 2 */ u8 num_of_bds; /*30 1 */ u8 bd_flags; /*31 1 */ bool enable_ip_cksum; /*32 1 */ bool enable_l4_cksum; /*33 1 */ bool calc_ip_len; /*34 1 */ /* size: 40, cachelines: 1, members: 12 */ /* padding: 5 */ /* last cacheline: 40 bytes */ }; I'm going to send the changed version in V2 as as there's no harm to it, [+ we *can* reduce the size of qed_ll2_comp_rx_data you commented for patch #2 in series] But one thing I thought of asking - do we consider layouts of relatively insignificant structures to be some golden coding standard?
Re: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
From: Yuval Mintz Date: Thu, 8 Jun 2017 19:13:16 +0300 > @@ -67,6 +79,21 @@ struct qed_ll2_stats { > u64 sent_bcast_pkts; > }; > > +struct qed_ll2_tx_pkt_info { > + u8 num_of_bds; > + u16 vlan; > + u8 bd_flags; > + u16 l4_hdr_offset_w;/* from start of packet */ > + enum qed_ll2_tx_dest tx_dest; > + enum qed_ll2_roce_flavor_type qed_roce_flavor; > + dma_addr_t first_frag; > + u16 first_frag_len; > + bool enable_ip_cksum; > + bool enable_l4_cksum; > + bool calc_ip_len; > + void *cookie; > +}; > + This layout is extremely inefficient, with lots of padding in between struct members. Group small u8 members and u16 members together so that they consume full 32-bit areas so you can eliminate all of the padding.
[PATCH net-next 1/8] qed: LL2 to use packed information for tx
First step in revising the LL2 interface, this declares qed_ll2_tx_pkt_info as part of the ll2 interface, and uses it for transmission instead of receiving lots of parameters. Signed-off-by: Yuval Mintz --- drivers/net/ethernet/qlogic/qed/qed_ll2.c | 127 + drivers/net/ethernet/qlogic/qed/qed_ll2.h | 41 ++ drivers/net/ethernet/qlogic/qed/qed_roce.c | 18 ++-- include/linux/qed/qed_ll2_if.h | 27 ++ 4 files changed, 102 insertions(+), 111 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c index f67ed6d..0b75f5d 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c @@ -740,12 +740,13 @@ static void qed_ooo_submit_tx_buffers(struct qed_hwfn *p_hwfn, struct qed_ll2_info *p_ll2_conn) { + struct qed_ll2_tx_pkt_info tx_pkt; struct qed_ooo_buffer *p_buffer; - int rc; u16 l4_hdr_offset_w; dma_addr_t first_frag; u16 parse_flags; u8 bd_flags; + int rc; /* Submit Tx buffers here */ while ((p_buffer = qed_ooo_get_ready_buffer(p_hwfn, @@ -760,13 +761,18 @@ qed_ooo_submit_tx_buffers(struct qed_hwfn *p_hwfn, SET_FIELD(bd_flags, CORE_TX_BD_DATA_FORCE_VLAN_MODE, 1); SET_FIELD(bd_flags, CORE_TX_BD_DATA_L4_PROTOCOL, 1); - rc = qed_ll2_prepare_tx_packet(p_hwfn, p_ll2_conn->my_id, 1, - p_buffer->vlan, bd_flags, - l4_hdr_offset_w, - p_ll2_conn->conn.tx_dest, 0, - first_frag, - p_buffer->packet_length, - p_buffer, true); + memset(&tx_pkt, 0, sizeof(tx_pkt)); + tx_pkt.num_of_bds = 1; + tx_pkt.vlan = p_buffer->vlan; + tx_pkt.bd_flags = bd_flags; + tx_pkt.l4_hdr_offset_w = l4_hdr_offset_w; + tx_pkt.tx_dest = p_ll2_conn->conn.tx_dest; + tx_pkt.first_frag = first_frag; + tx_pkt.first_frag_len = p_buffer->packet_length; + tx_pkt.cookie = p_buffer; + + rc = qed_ll2_prepare_tx_packet(p_hwfn, p_ll2_conn->my_id, + &tx_pkt, true); if (rc) { qed_ooo_put_ready_buffer(p_hwfn, p_hwfn->p_ooo_info, p_buffer, false); @@ -1593,20 +1599,18 @@ int qed_ll2_post_rx_buffer(struct qed_hwfn *p_hwfn, static void qed_ll2_prepare_tx_packet_set(struct qed_hwfn *p_hwfn, struct qed_ll2_tx_queue *p_tx, struct qed_ll2_tx_packet *p_curp, - u8 num_of_bds, - dma_addr_t first_frag, - u16 first_frag_len, void *p_cookie, + struct qed_ll2_tx_pkt_info *pkt, u8 notify_fw) { list_del(&p_curp->list_entry); - p_curp->cookie = p_cookie; - p_curp->bd_used = num_of_bds; + p_curp->cookie = pkt->cookie; + p_curp->bd_used = pkt->num_of_bds; p_curp->notify_fw = notify_fw; p_tx->cur_send_packet = p_curp; p_tx->cur_send_frag_num = 0; - p_curp->bds_set[p_tx->cur_send_frag_num].tx_frag = first_frag; - p_curp->bds_set[p_tx->cur_send_frag_num].frag_len = first_frag_len; + p_curp->bds_set[p_tx->cur_send_frag_num].tx_frag = pkt->first_frag; + p_curp->bds_set[p_tx->cur_send_frag_num].frag_len = pkt->first_frag_len; p_tx->cur_send_frag_num++; } @@ -1614,32 +1618,33 @@ static void qed_ll2_prepare_tx_packet_set_bd(struct qed_hwfn *p_hwfn, struct qed_ll2_info *p_ll2, struct qed_ll2_tx_packet *p_curp, -u8 num_of_bds, -enum core_tx_dest tx_dest, -u16 vlan, -u8 bd_flags, -u16 l4_hdr_offset_w, -enum core_roce_flavor_type roce_flavor, -dma_addr_t first_frag, -u16 first_frag_len) +struct qed_ll2_tx_pkt_info *pkt) { struct qed_chain *p_tx_chain = &p_ll2->tx_queue.txq_chain; u16 prod_idx = qed_chain_get_prod_idx(p_tx_chain); struct core_tx_bd *start_bd = NULL; + enum core_roce_flavor_type roce_flavor; + enum core_tx_dest tx_dest; u16 bd_data = 0, frag_idx; + roce_flavor = (pkt