Re: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-09 Thread David Miller
From: "Mintz, Yuval" 
Date: Fri, 9 Jun 2017 07:08:54 +

> But one thing I thought of asking - do we consider layouts of relatively
> insignificant structures to be some golden coding standard?

It just shows lack of thought when writing things.

I want to see code that shows the author put some care into it
and didn't just slap things together.

Thank you.


RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-09 Thread David Laight
From: Mintz, Yuval
> Sent: 09 June 2017 08:52
> > From: David Laight [mailto:david.lai...@aculab.com]
> > Sent: Friday, June 09, 2017 10:28 AM
> > To: 'David Miller' ; Mintz, Yuval
> > 
> > Cc: netdev@vger.kernel.org; linux-r...@vger.kernel.org; Kalderon, Michal
> > 
> > Subject: RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
> >
> > From: David Miller
> > > Sent: 09 June 2017 00:24
> > >
> > > From: Yuval Mintz 
> > > Date: Thu, 8 Jun 2017 19:13:16 +0300
> > >
> > > > @@ -67,6 +79,21 @@ struct qed_ll2_stats {
> > > > u64 sent_bcast_pkts;
> > > >  };
> > > >
> > > > +struct qed_ll2_tx_pkt_info {
> > > > +   u8 num_of_bds;
> > > > +   u16 vlan;
> > > > +   u8 bd_flags;
> > > > +   u16 l4_hdr_offset_w;/* from start of packet */
> > > > +   enum qed_ll2_tx_dest tx_dest;
> > > > +   enum qed_ll2_roce_flavor_type qed_roce_flavor;
> > > > +   dma_addr_t first_frag;
> > > > +   u16 first_frag_len;
> > > > +   bool enable_ip_cksum;
> > > > +   bool enable_l4_cksum;
> > > > +   bool calc_ip_len;
> > > > +   void *cookie;
> > > > +};
> > > > +
> > >
> > > This layout is extremely inefficient, with lots of padding in between
> > > struct members.
> > >
> > > Group small u8 members and u16 members together so that they consume
> > > full 32-bit areas so you can eliminate all of the padding.
> >
> > I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8)
> > variables is likely to generate extra code (on non-x86).
> > You are using 32 bits for the 'enum' - I bet the values fit in 8 bits, so 
> > aren't
> > really worried about size.
> >
> > If size did matter you can easily get the above down to 32 bytes.
> 
> You're right, and that's exactly the point -  since this is not data-path 
> critical
> I don't see why the size/efficiency should matter [greatly].

It is just good practise so that it happens automatically when it does matter.
Just swapping 'vlan' and 'bd_flags' would make it look much better.

David



RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-09 Thread Mintz, Yuval


> -Original Message-
> From: David Laight [mailto:david.lai...@aculab.com]
> Sent: Friday, June 09, 2017 10:28 AM
> To: 'David Miller' ; Mintz, Yuval
> 
> Cc: netdev@vger.kernel.org; linux-r...@vger.kernel.org; Kalderon, Michal
> 
> Subject: RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx
> 
> From: David Miller
> > Sent: 09 June 2017 00:24
> >
> > From: Yuval Mintz 
> > Date: Thu, 8 Jun 2017 19:13:16 +0300
> >
> > > @@ -67,6 +79,21 @@ struct qed_ll2_stats {
> > >   u64 sent_bcast_pkts;
> > >  };
> > >
> > > +struct qed_ll2_tx_pkt_info {
> > > + u8 num_of_bds;
> > > + u16 vlan;
> > > + u8 bd_flags;
> > > + u16 l4_hdr_offset_w;/* from start of packet */
> > > + enum qed_ll2_tx_dest tx_dest;
> > > + enum qed_ll2_roce_flavor_type qed_roce_flavor;
> > > + dma_addr_t first_frag;
> > > + u16 first_frag_len;
> > > + bool enable_ip_cksum;
> > > + bool enable_l4_cksum;
> > > + bool calc_ip_len;
> > > + void *cookie;
> > > +};
> > > +
> >
> > This layout is extremely inefficient, with lots of padding in between
> > struct members.
> >
> > Group small u8 members and u16 members together so that they consume
> > full 32-bit areas so you can eliminate all of the padding.
> 
> I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8)
> variables is likely to generate extra code (on non-x86).
> You are using 32 bits for the 'enum' - I bet the values fit in 8 bits, so 
> aren't
> really worried about size.
> 
> If size did matter you can easily get the above down to 32 bytes.

You're right, and that's exactly the point -  since this is not data-path 
critical
I don't see why the size/efficiency should matter [greatly].


RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-09 Thread David Laight
From: David Miller
> Sent: 09 June 2017 00:24
> 
> From: Yuval Mintz 
> Date: Thu, 8 Jun 2017 19:13:16 +0300
> 
> > @@ -67,6 +79,21 @@ struct qed_ll2_stats {
> > u64 sent_bcast_pkts;
> >  };
> >
> > +struct qed_ll2_tx_pkt_info {
> > +   u8 num_of_bds;
> > +   u16 vlan;
> > +   u8 bd_flags;
> > +   u16 l4_hdr_offset_w;/* from start of packet */
> > +   enum qed_ll2_tx_dest tx_dest;
> > +   enum qed_ll2_roce_flavor_type qed_roce_flavor;
> > +   dma_addr_t first_frag;
> > +   u16 first_frag_len;
> > +   bool enable_ip_cksum;
> > +   bool enable_l4_cksum;
> > +   bool calc_ip_len;
> > +   void *cookie;
> > +};
> > +
> 
> This layout is extremely inefficient, with lots of padding in between
> struct members.
> 
> Group small u8 members and u16 members together so that they consume
> full 32-bit areas so you can eliminate all of the padding.

I'd also query the use of u16 sizes/lengths, any arithmetic on u16 (and u8)
variables is likely to generate extra code (on non-x86).
You are using 32 bits for the 'enum' - I bet the values fit in 8 bits,
so aren't really worried about size.

If size did matter you can easily get the above down to 32 bytes.

David



RE: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-09 Thread Mintz, Yuval
> > +struct qed_ll2_tx_pkt_info {
> > +   u8 num_of_bds;
> > +   u16 vlan;
> > +   u8 bd_flags;
> > +   u16 l4_hdr_offset_w;/* from start of packet */
> > +   enum qed_ll2_tx_dest tx_dest;
> > +   enum qed_ll2_roce_flavor_type qed_roce_flavor;
> > +   dma_addr_t first_frag;
> > +   u16 first_frag_len;
> > +   bool enable_ip_cksum;
> > +   bool enable_l4_cksum;
> > +   bool calc_ip_len;
> > +   void *cookie;
> > +};
> > +
> 
> This layout is extremely inefficient, with lots of padding in between struct
> members.
> 
> Group small u8 members and u16 members together so that they consume
> full 32-bit areas so you can eliminate all of the padding.

While I can mend the holes, the total structure size doesn't really change:

struct qed_ll2_tx_pkt_info {
u8 num_of_bds;   /* 0 1 */

/* XXX 1 byte hole, try to pack */

u16vlan; /* 2 2 */
u8 bd_flags; /* 4 1 */

/* XXX 1 byte hole, try to pack */

u16l4_hdr_offset_w;  /* 6 2 */
enum qed_ll2_tx_dest   tx_dest;  /* 8 4 */
enum qed_ll2_roce_flavor_type qed_roce_flavor;   /*12 4 */
dma_addr_t first_frag;   /*16 8 */
u16first_frag_len;   /*24 2 */
bool   enable_ip_cksum;  /*26 1 */
bool   enable_l4_cksum;  /*27 1 */
bool   calc_ip_len;  /*28 1 */

/* XXX 3 bytes hole, try to pack */

void * cookie;   /*32 8 */

/* size: 40, cachelines: 1, members: 12 */
/* sum members: 35, holes: 3, sum holes: 5 */
/* last cacheline: 40 bytes */
};

Becomes:

struct qed_ll2_tx_pkt_info {
void * cookie;   /* 0 8 */
dma_addr_t first_frag;   /* 8 8 */
enum qed_ll2_tx_dest   tx_dest;  /*16 4 */
enum qed_ll2_roce_flavor_type qed_roce_flavor;   /*20 4 */
u16vlan; /*24 2 */
u16l4_hdr_offset_w;  /*26 2 */
u16first_frag_len;   /*28 2 */
u8 num_of_bds;   /*30 1 */
u8 bd_flags; /*31 1 */
bool   enable_ip_cksum;  /*32 1 */
bool   enable_l4_cksum;  /*33 1 */
bool   calc_ip_len;  /*34 1 */

/* size: 40, cachelines: 1, members: 12 */
/* padding: 5 */
/* last cacheline: 40 bytes */
};

I'm going to send the changed version in V2 as as there's no harm to it,
[+ we *can* reduce the size of qed_ll2_comp_rx_data you commented
for patch #2 in series]

But one thing I thought of asking - do we consider layouts of relatively
insignificant structures to be some golden coding standard?


Re: [PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-08 Thread David Miller
From: Yuval Mintz 
Date: Thu, 8 Jun 2017 19:13:16 +0300

> @@ -67,6 +79,21 @@ struct qed_ll2_stats {
>   u64 sent_bcast_pkts;
>  };
>  
> +struct qed_ll2_tx_pkt_info {
> + u8 num_of_bds;
> + u16 vlan;
> + u8 bd_flags;
> + u16 l4_hdr_offset_w;/* from start of packet */
> + enum qed_ll2_tx_dest tx_dest;
> + enum qed_ll2_roce_flavor_type qed_roce_flavor;
> + dma_addr_t first_frag;
> + u16 first_frag_len;
> + bool enable_ip_cksum;
> + bool enable_l4_cksum;
> + bool calc_ip_len;
> + void *cookie;
> +};
> +

This layout is extremely inefficient, with lots of padding in between
struct members.

Group small u8 members and u16 members together so that they consume
full 32-bit areas so you can eliminate all of the padding.


[PATCH net-next 1/8] qed: LL2 to use packed information for tx

2017-06-08 Thread Yuval Mintz
First step in revising the LL2 interface, this declares
qed_ll2_tx_pkt_info as part of the ll2 interface, and uses it for
transmission instead of receiving lots of parameters.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c  | 127 +
 drivers/net/ethernet/qlogic/qed/qed_ll2.h  |  41 ++
 drivers/net/ethernet/qlogic/qed/qed_roce.c |  18 ++--
 include/linux/qed/qed_ll2_if.h |  27 ++
 4 files changed, 102 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c 
b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index f67ed6d..0b75f5d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -740,12 +740,13 @@ static void
 qed_ooo_submit_tx_buffers(struct qed_hwfn *p_hwfn,
  struct qed_ll2_info *p_ll2_conn)
 {
+   struct qed_ll2_tx_pkt_info tx_pkt;
struct qed_ooo_buffer *p_buffer;
-   int rc;
u16 l4_hdr_offset_w;
dma_addr_t first_frag;
u16 parse_flags;
u8 bd_flags;
+   int rc;
 
/* Submit Tx buffers here */
while ((p_buffer = qed_ooo_get_ready_buffer(p_hwfn,
@@ -760,13 +761,18 @@ qed_ooo_submit_tx_buffers(struct qed_hwfn *p_hwfn,
SET_FIELD(bd_flags, CORE_TX_BD_DATA_FORCE_VLAN_MODE, 1);
SET_FIELD(bd_flags, CORE_TX_BD_DATA_L4_PROTOCOL, 1);
 
-   rc = qed_ll2_prepare_tx_packet(p_hwfn, p_ll2_conn->my_id, 1,
-  p_buffer->vlan, bd_flags,
-  l4_hdr_offset_w,
-  p_ll2_conn->conn.tx_dest, 0,
-  first_frag,
-  p_buffer->packet_length,
-  p_buffer, true);
+   memset(&tx_pkt, 0, sizeof(tx_pkt));
+   tx_pkt.num_of_bds = 1;
+   tx_pkt.vlan = p_buffer->vlan;
+   tx_pkt.bd_flags = bd_flags;
+   tx_pkt.l4_hdr_offset_w = l4_hdr_offset_w;
+   tx_pkt.tx_dest = p_ll2_conn->conn.tx_dest;
+   tx_pkt.first_frag = first_frag;
+   tx_pkt.first_frag_len = p_buffer->packet_length;
+   tx_pkt.cookie = p_buffer;
+
+   rc = qed_ll2_prepare_tx_packet(p_hwfn, p_ll2_conn->my_id,
+  &tx_pkt, true);
if (rc) {
qed_ooo_put_ready_buffer(p_hwfn, p_hwfn->p_ooo_info,
 p_buffer, false);
@@ -1593,20 +1599,18 @@ int qed_ll2_post_rx_buffer(struct qed_hwfn *p_hwfn,
 static void qed_ll2_prepare_tx_packet_set(struct qed_hwfn *p_hwfn,
  struct qed_ll2_tx_queue *p_tx,
  struct qed_ll2_tx_packet *p_curp,
- u8 num_of_bds,
- dma_addr_t first_frag,
- u16 first_frag_len, void *p_cookie,
+ struct qed_ll2_tx_pkt_info *pkt,
  u8 notify_fw)
 {
list_del(&p_curp->list_entry);
-   p_curp->cookie = p_cookie;
-   p_curp->bd_used = num_of_bds;
+   p_curp->cookie = pkt->cookie;
+   p_curp->bd_used = pkt->num_of_bds;
p_curp->notify_fw = notify_fw;
p_tx->cur_send_packet = p_curp;
p_tx->cur_send_frag_num = 0;
 
-   p_curp->bds_set[p_tx->cur_send_frag_num].tx_frag = first_frag;
-   p_curp->bds_set[p_tx->cur_send_frag_num].frag_len = first_frag_len;
+   p_curp->bds_set[p_tx->cur_send_frag_num].tx_frag = pkt->first_frag;
+   p_curp->bds_set[p_tx->cur_send_frag_num].frag_len = pkt->first_frag_len;
p_tx->cur_send_frag_num++;
 }
 
@@ -1614,32 +1618,33 @@ static void
 qed_ll2_prepare_tx_packet_set_bd(struct qed_hwfn *p_hwfn,
 struct qed_ll2_info *p_ll2,
 struct qed_ll2_tx_packet *p_curp,
-u8 num_of_bds,
-enum core_tx_dest tx_dest,
-u16 vlan,
-u8 bd_flags,
-u16 l4_hdr_offset_w,
-enum core_roce_flavor_type roce_flavor,
-dma_addr_t first_frag,
-u16 first_frag_len)
+struct qed_ll2_tx_pkt_info *pkt)
 {
struct qed_chain *p_tx_chain = &p_ll2->tx_queue.txq_chain;
u16 prod_idx = qed_chain_get_prod_idx(p_tx_chain);
struct core_tx_bd *start_bd = NULL;
+   enum core_roce_flavor_type roce_flavor;
+   enum core_tx_dest tx_dest;
u16 bd_data = 0, frag_idx;
 
+   roce_flavor = (pkt