Re: [PATCH net-next v1 1/4] etf: Cancel timer if there are no pending skbs

2018-12-03 Thread Vinicius Costa Gomes
Hi,

Vinicius Costa Gomes  writes:

> From: Jesus Sanchez-Palencia 
>
> There is no point in firing the qdisc watchdog if there are no future
> skbs pending in the queue and the watchdog had been set previously.
>
> Signed-off-by: Jesus Sanchez-Palencia 

It seems that I made a mistake when submitting this series.

It should have been proposed to the net tree (instead of net-next) and
should have included a "Fixes" tag.

Is there a way to have this series considered for the net tree at this
point (given that it already landed on net-next)?


Cheers,
--
Vinicius


[next-queue PATCH v1 2/2] Documentation: igb: Add a section about CBS

2018-11-16 Thread Vinicius Costa Gomes
Add some pointers to the definition of the CBS algorithm, and some
notes about the limits of its implementation in the i210 family of
controllers.

Signed-off-by: Vinicius Costa Gomes 
---
 Documentation/networking/igb.rst | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/networking/igb.rst b/Documentation/networking/igb.rst
index ba16b86d5593..e87a4a72ea2d 100644
--- a/Documentation/networking/igb.rst
+++ b/Documentation/networking/igb.rst
@@ -177,6 +177,25 @@ rate limit using the IProute2 tool. Download the latest 
version of the
 IProute2 tool from Sourceforge if your version does not have all the features
 you require.
 
+Credit Based Shaper (Qav Mode)
+--
+When enabling the CBS qdisc in the hardware offload mode, traffic shaping using
+the CBS (described in the IEEE 802.1Q-2018 Section 8.6.8.2 and discussed in the
+Annex L) algorithm will run in the i210 controller, so it's more accurate and
+uses less CPU.
+
+When using offloaded CBS, and the traffic rate obeys the configured rate
+(doesn't go above it), CBS should have little to no effect in the latency.
+
+The offloaded version of the algorithm has some limits, caused by how the idle
+slope is expressed in the adapter's registers. It can only represent idle 
slopes
+in 16.38431 kbps units, which means that if a idle slope of 2576kbps is
+requested, the controller will be configured to use a idle slope of ~2589 kbps,
+because the driver rounds the value up. For more details, see the comments on
+:c:func:`igb_config_tx_modes()`.
+
+NOTE: This feature is exclusive to i210 models.
+
 
 Support
 ===
-- 
2.19.1



[next-queue PATCH v1 1/2] igb: Change RXPBSIZE size when setting Qav mode

2018-11-16 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

Section 4.5.9 of the datasheet says that the total size of all packet
buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not
exceed 60KB. Today we are configuring a total of 62KB, so reduce the
RxPB from 32KB to 30KB in order to respect that.

The choice of changing RxPBSIZE here is mainly because it seems more
correct to give more priority to the transmit packet buffers over the
receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC
sizes are already too short.

Signed-off-by: Jesus Sanchez-Palencia 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 8a28f3388f69..01fcfc6f3415 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -334,6 +334,7 @@
 
 #define I210_RXPBSIZE_DEFAULT  0x00A2 /* RXPBSIZE default */
 #define I210_RXPBSIZE_MASK 0x003F
+#define I210_RXPBSIZE_PB_30KB  0x001E
 #define I210_RXPBSIZE_PB_32KB  0x0020
 #define I210_TXPBSIZE_DEFAULT  0x0414 /* TXPBSIZE default */
 #define I210_TXPBSIZE_MASK 0xC0FF
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index bb4f3f64fbf0..e135adf46980 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1934,7 +1934,7 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter)
 
val = rd32(E1000_RXPBS);
val &= ~I210_RXPBSIZE_MASK;
-   val |= I210_RXPBSIZE_PB_32KB;
+   val |= I210_RXPBSIZE_PB_30KB;
wr32(E1000_RXPBS, val);
 
/* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
-- 
2.19.1



[PATCH net-next v1 3/4] etf: Split timersortedlist_erase()

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This is just a refactor that will simplify the implementation of the
next patch in this series which will drop all expired packets on the
dequeue flow.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 44 +---
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 52452b546564..bfe04748d5f0 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -190,10 +190,10 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
 }
 
-static void timesortedlist_erase(struct Qdisc *sch, struct sk_buff *skb,
-bool drop)
+static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb)
 {
struct etf_sched_data *q = qdisc_priv(sch);
+   struct sk_buff *to_free = NULL;
 
rb_erase_cached(>rbnode, >head);
 
@@ -206,19 +206,33 @@ static void timesortedlist_erase(struct Qdisc *sch, 
struct sk_buff *skb,
 
qdisc_qstats_backlog_dec(sch, skb);
 
-   if (drop) {
-   struct sk_buff *to_free = NULL;
+   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
 
-   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
+   qdisc_drop(skb, sch, _free);
+   kfree_skb_list(to_free);
+   qdisc_qstats_overlimit(sch);
 
-   qdisc_drop(skb, sch, _free);
-   kfree_skb_list(to_free);
-   qdisc_qstats_overlimit(sch);
-   } else {
-   qdisc_bstats_update(sch, skb);
+   sch->q.qlen--;
+}
 
-   q->last = skb->tstamp;
-   }
+static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
+{
+   struct etf_sched_data *q = qdisc_priv(sch);
+
+   rb_erase_cached(>rbnode, >head);
+
+   /* The rbnode field in the skb re-uses these fields, now that
+* we are done with the rbnode, reset them.
+*/
+   skb->next = NULL;
+   skb->prev = NULL;
+   skb->dev = qdisc_dev(sch);
+
+   qdisc_qstats_backlog_dec(sch, skb);
+
+   qdisc_bstats_update(sch, skb);
+
+   q->last = skb->tstamp;
 
sch->q.qlen--;
 }
@@ -237,7 +251,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Drop if packet has expired while in queue. */
if (ktime_before(skb->tstamp, now)) {
-   timesortedlist_erase(sch, skb, true);
+   timesortedlist_drop(sch, skb);
skb = NULL;
goto out;
}
@@ -246,7 +260,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 * txtime from deadline to (now + delta).
 */
if (q->deadline_mode) {
-   timesortedlist_erase(sch, skb, false);
+   timesortedlist_remove(sch, skb);
skb->tstamp = now;
goto out;
}
@@ -255,7 +269,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Dequeue only if now is within the [txtime - delta, txtime] range. */
if (ktime_after(now, next))
-   timesortedlist_erase(sch, skb, false);
+   timesortedlist_remove(sch, skb);
else
skb = NULL;
 
-- 
2.19.1



[PATCH net-next v1 1/4] etf: Cancel timer if there are no pending skbs

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

There is no point in firing the qdisc watchdog if there are no future
skbs pending in the queue and the watchdog had been set previously.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 1538d6fa8165..fa85b24ac794 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -117,8 +117,10 @@ static void reset_watchdog(struct Qdisc *sch)
struct sk_buff *skb = etf_peek_timesortedlist(sch);
ktime_t next;
 
-   if (!skb)
+   if (!skb) {
+   qdisc_watchdog_cancel(>watchdog);
return;
+   }
 
next = ktime_sub_ns(skb->tstamp, q->delta);
qdisc_watchdog_schedule_ns(>watchdog, ktime_to_ns(next));
-- 
2.19.1



[PATCH net-next v1 4/4] etf: Drop all expired packets

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

Currently on dequeue() ETF only drops the first expired packet, which
causes a problem if the next packet is already expired. When this
happens, the watchdog will be configured with a time in the past, fire
straight way and the packet will finally be dropped once the dequeue()
function of the qdisc is called again.

We can save quite a few cycles and improve the overall behavior of the
qdisc if we drop all expired packets if the next packet is expired.
This should allow ETF to recover faster from bad situations. But
packet drops are still a very serious warning that the requirements
imposed on the system aren't reasonable.

This was inspired by how the implementation of hrtimers use the
rb_tree inside the kernel.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index bfe04748d5f0..1150f22983df 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -190,29 +190,35 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
 }
 
-static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb)
+static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
+   ktime_t now)
 {
struct etf_sched_data *q = qdisc_priv(sch);
struct sk_buff *to_free = NULL;
+   struct sk_buff *tmp = NULL;
 
-   rb_erase_cached(>rbnode, >head);
+   skb_rbtree_walk_from_safe(skb, tmp) {
+   if (ktime_after(skb->tstamp, now))
+   break;
 
-   /* The rbnode field in the skb re-uses these fields, now that
-* we are done with the rbnode, reset them.
-*/
-   skb->next = NULL;
-   skb->prev = NULL;
-   skb->dev = qdisc_dev(sch);
+   rb_erase_cached(>rbnode, >head);
 
-   qdisc_qstats_backlog_dec(sch, skb);
+   /* The rbnode field in the skb re-uses these fields, now that
+* we are done with the rbnode, reset them.
+*/
+   skb->next = NULL;
+   skb->prev = NULL;
+   skb->dev = qdisc_dev(sch);
 
-   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
+   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
 
-   qdisc_drop(skb, sch, _free);
-   kfree_skb_list(to_free);
-   qdisc_qstats_overlimit(sch);
+   qdisc_qstats_backlog_dec(sch, skb);
+   qdisc_drop(skb, sch, _free);
+   qdisc_qstats_overlimit(sch);
+   sch->q.qlen--;
+   }
 
-   sch->q.qlen--;
+   kfree_skb_list(to_free);
 }
 
 static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
@@ -251,7 +257,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Drop if packet has expired while in queue. */
if (ktime_before(skb->tstamp, now)) {
-   timesortedlist_drop(sch, skb);
+   timesortedlist_drop(sch, skb, now);
skb = NULL;
goto out;
}
-- 
2.19.1



[PATCH net-next v1 2/4] etf: Use cached rb_root

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

ETF's peek() operation is heavily used so use an rb_root_cached instead
and leverage rb_first_cached() which will run in O(1) instead of
O(log n).

Even if on 'timesortedlist_clear()' we could be using rb_erase(), we
choose to use rb_erase_cached(), because if in the future we allow
runtime changes to ETF parameters, and need to do a '_clear()', this
might cause some hard to debug issues.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index fa85b24ac794..52452b546564 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -30,7 +30,7 @@ struct etf_sched_data {
int queue;
s32 delta; /* in ns */
ktime_t last; /* The txtime of the last skb sent to the netdevice. */
-   struct rb_root head;
+   struct rb_root_cached head;
struct qdisc_watchdog watchdog;
ktime_t (*get_time)(void);
 };
@@ -104,7 +104,7 @@ static struct sk_buff *etf_peek_timesortedlist(struct Qdisc 
*sch)
struct etf_sched_data *q = qdisc_priv(sch);
struct rb_node *p;
 
-   p = rb_first(>head);
+   p = rb_first_cached(>head);
if (!p)
return NULL;
 
@@ -156,8 +156,9 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, 
struct Qdisc *sch,
  struct sk_buff **to_free)
 {
struct etf_sched_data *q = qdisc_priv(sch);
-   struct rb_node **p = >head.rb_node, *parent = NULL;
+   struct rb_node **p = >head.rb_root.rb_node, *parent = NULL;
ktime_t txtime = nskb->tstamp;
+   bool leftmost = true;
 
if (!is_packet_valid(sch, nskb)) {
report_sock_error(nskb, EINVAL,
@@ -170,13 +171,15 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
 
parent = *p;
skb = rb_to_skb(parent);
-   if (ktime_after(txtime, skb->tstamp))
+   if (ktime_after(txtime, skb->tstamp)) {
p = >rb_right;
-   else
+   leftmost = false;
+   } else {
p = >rb_left;
+   }
}
rb_link_node(>rbnode, parent, p);
-   rb_insert_color(>rbnode, >head);
+   rb_insert_color_cached(>rbnode, >head, leftmost);
 
qdisc_qstats_backlog_inc(sch, nskb);
sch->q.qlen++;
@@ -192,7 +195,7 @@ static void timesortedlist_erase(struct Qdisc *sch, struct 
sk_buff *skb,
 {
struct etf_sched_data *q = qdisc_priv(sch);
 
-   rb_erase(>rbnode, >head);
+   rb_erase_cached(>rbnode, >head);
 
/* The rbnode field in the skb re-uses these fields, now that
 * we are done with the rbnode, reset them.
@@ -388,14 +391,14 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 static void timesortedlist_clear(struct Qdisc *sch)
 {
struct etf_sched_data *q = qdisc_priv(sch);
-   struct rb_node *p = rb_first(>head);
+   struct rb_node *p = rb_first_cached(>head);
 
while (p) {
struct sk_buff *skb = rb_to_skb(p);
 
p = rb_next(p);
 
-   rb_erase(>rbnode, >head);
+   rb_erase_cached(>rbnode, >head);
rtnl_kfree_skbs(skb, skb);
sch->q.qlen--;
}
-- 
2.19.1



Re: [Intel-wired-lan] [RFC PATCH 1/4] ptp: add PTP_SYS_OFFSET_EXTENDED ioctl

2018-10-26 Thread Vinicius Costa Gomes
Hi Miroslav,

Miroslav Lichvar  writes:

> The PTP_SYS_OFFSET ioctl, which can be used to measure the offset
> between a PHC and the system clock, includes the total time that the
> gettime64 function of a driver needs to read the PHC timestamp.
>
> This typically involves reading of multiple PCI registers (sometimes in
> multiple iterations) and the register that contains the lowest bits of
> the timestamp is not read in the middle between the two readings of the
> system clock. This asymmetry causes the measured offset to have a
> significant error.
>
> Introduce a new ioctl, driver function, and helper functions, which
> allow the reading of the lowest register to be isolated from the other
> readings in order to reduce the asymmetry. The ioctl and driver function
> return three timestamps for each measurement:
> - system time right before reading the lowest bits of the PHC timestamp
> - PHC time
> - system time immediately after reading the lowest bits of the PHC
>   timestamp

Cool stuff!

Just one little thing below. Feel free to add my ack to the series.

Acked-by: Vinicius Costa Gomes 

>
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/ptp/ptp_chardev.c| 39 
>  include/linux/ptp_clock_kernel.h | 26 +
>  include/uapi/linux/ptp_clock.h   | 12 ++
>  3 files changed, 77 insertions(+)
>
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 2012551d93e0..1a04c437fd4f 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -124,11 +124,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int 
> cmd, unsigned long arg)
>   struct ptp_clock_caps caps;
>   struct ptp_clock_request req;
>   struct ptp_sys_offset *sysoff = NULL;
> + struct ptp_sys_offset_extended *sysoff_extended = NULL;
>   struct ptp_sys_offset_precise precise_offset;
>   struct ptp_pin_desc pd;
>   struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
>   struct ptp_clock_info *ops = ptp->info;
>   struct ptp_clock_time *pct;
> + struct ptp_system_timestamp sts;
>   struct timespec64 ts;
>   struct system_device_crosststamp xtstamp;
>   int enable, err = 0;
> @@ -211,6 +213,43 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
> unsigned long arg)
>   err = -EFAULT;
>   break;
>  
> + case PTP_SYS_OFFSET_EXTENDED:
> + if (!ptp->info->gettimex64) {
> + err = -EOPNOTSUPP;
> + break;
> + }
> + sysoff_extended = memdup_user((void __user *)arg,
> +   sizeof(*sysoff_extended));

Looks like you forgot to free 'sysoff_extended', no? 


--
Vinicius


[PATCH iproute2 net-next v3 3/6] libnetlink: Add helper for getting a __s32 from netlink msgs

2018-10-05 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia 
---
 include/libnetlink.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index ffc49e56..c44c3c72 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,10 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s32 rta_getattr_s32(const struct rtattr *rta)
+{
+   return *(__s32 *)RTA_DATA(rta);
+}
 static inline __s64 rta_getattr_s64(const struct rtattr *rta)
 {
__s64 tmp;
-- 
2.19.0



[PATCH iproute2 net-next v3 4/6] include: add definitions for taprio [DO NOT COMMIT]

2018-10-05 Thread Vinicius Costa Gomes
DO NOT COMMIT

This patch exists only to ease the testing, until this header is
updated with the definitions from the kernel.

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h | 52 --
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8975fd1a..89ee47c2 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -395,9 +395,9 @@ enum {
 struct tc_htb_xstats {
__u32 lends;
__u32 borrows;
-   __u32 giants;   /* too big packets (rate will not be accurate) */
-   __u32 tokens;
-   __u32 ctokens;
+   __u32 giants;   /* unused since 'Make HTB scheduler work with TSO.' */
+   __s32 tokens;
+   __s32 ctokens;
 };
 
 /* HFSC section */
@@ -1084,4 +1084,50 @@ enum {
CAKE_ATM_MAX
 };
 
+
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
-- 
2.19.0



[PATCH iproute2 net-next v3 0/6] Introduce the taprio scheduler

2018-10-05 Thread Vinicius Costa Gomes
Hi,

Changes from v2:
  - Used the variable name (instead of the variable type) in a
sizeof() operator in patch 2/6 (Ilias Apalodimas);

Changes from v1:
  - Removed references to the "H" (Set-Gates-And-Hold-MAC) and "R"
(Set-Gates-And-Release-MAC) commands, as these commands will only
be used when Frame Preemption support is added (David Ahern);
  - Moved the functions that print and read commands to be closer (David
Ahern);

Changes from RFC:
  - Removed support for the sched-file parameter, in favor of
supporting the batch mode feature;

This is the iproute2 side of the taprio v1 series.

Please see the kernel side cover letter for more information about how
to test this.

Cheers,
--
Vinicius

Jesus Sanchez-Palencia (1):
  libnetlink: Add helper for getting a __s32 from netlink msgs

Vinicius Costa Gomes (5):
  utils: Implement get_s64()
  include: Add helper to retrieve a __s64 from a netlink msg
  include: add definitions for taprio [DO NOT COMMIT]
  tc: Add support for configuring the taprio scheduler
  taprio: Add manpage for tc-taprio(8)

 include/libnetlink.h   |  11 +
 include/uapi/linux/pkt_sched.h |  52 -
 include/utils.h|   1 +
 lib/utils.c|  21 ++
 man/man8/tc-taprio.8   | 142 
 tc/Makefile|   1 +
 tc/q_taprio.c  | 400 +
 7 files changed, 625 insertions(+), 3 deletions(-)
 create mode 100644 man/man8/tc-taprio.8
 create mode 100644 tc/q_taprio.c

-- 
2.19.0



[PATCH iproute2 net-next v3 5/6] tc: Add support for configuring the taprio scheduler

2018-10-05 Thread Vinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
  num_tc 3 \
  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 2@2 \
  base-time 1528743495910289987 \
  sched-entry S 01 30 \
  sched-entry S 02 30 \
  sched-entry S 04 30 \
  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

 sched-entry   

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

  base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 tc/Makefile   |   1 +
 tc/q_taprio.c | 400 ++
 2 files changed, 401 insertions(+)
 create mode 100644 tc/q_taprio.c

diff --git a/tc/Makefile b/tc/Makefile
index 5a1a7ff9..25a28284 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -74,6 +74,7 @@ TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
 TCMODULES += q_cbs.o
 TCMODULES += q_etf.o
+TCMODULES += q_taprio.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_taprio.c b/tc/q_taprio.c
new file mode 100644
index ..562dacb8
--- /dev/null
+++ b/tc/q_taprio.c
@@ -0,0 +1,400 @@
+/*
+ * q_taprio.c  Time Aware Priority Scheduler
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ * Jesus Sanchez-Palencia 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+#include "list.h"
+
+struct sched_entry {
+   struct list_head list;
+   uint32_t index;
+   uint32_t interval;
+   uint32_t gatemask;
+   uint8_t cmd;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct static_clockid {
+   const char *name;
+   clockid_t clockid;
+} clockids_sysv[] = {
+   { "REALTIME", CLOCK_REALTIME },
+   { "TAI", CLOCK_TAI },
+   { "BOOTTIME", CLOCK_BOOTTIME },
+   { "MONOTONIC", CLOCK_MONOTONIC },
+   { NULL }
+};
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... taprio clockid CLOCKID\n");
+   fprintf(stderr, "  [num_tc NUMBER] [map P0 P1 ...] ");
+   fprintf(stderr, "  [queues COUNT@OFFSET COUNT@OFFSET 
COUNT@OFFSET ...] ");
+   fprintf(stderr, "  [ [sched-entry index cmd gate-mask 
interval] ... ] ");
+   fprintf(stderr, "  [base-time time] ");
+   fprintf(stderr, "\nCLOCKID must be a valid SYS-V id (i.e. CLOCK_TAI)");
+   fprintf(stderr, "\n");
+}
+
+static void explain_clockid(const char *val)
+{
+   fprintf(stderr, "taprio: illegal value for \"clockid\": \"%s\".\n", 
val);
+   fprintf(stderr, "It must be a valid SYS-V id (i.e. CLOCK_TAI)\n");
+}
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+   const struct static_clockid *c;
+
+   /* Drop the CLOCK_ prefix if that is being used. */
+   if (strcasestr(arg, "CLOCK_") != NULL)
+   arg += sizeof("CLOCK_") - 1;
+
+   for (c = clockids_sysv; c->name; c++) {
+   

[PATCH iproute2 net-next v3 1/6] utils: Implement get_s64()

2018-10-05 Thread Vinicius Costa Gomes
Add this helper to read signed 64-bit integers from a string.

Signed-off-by: Vinicius Costa Gomes 
---
 include/utils.h |  1 +
 lib/utils.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index 8cb4349e..58574a05 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -139,6 +139,7 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw);
 #define get_byte get_u8
 #define get_ushort get_u16
 #define get_short get_s16
+int get_s64(__s64 *val, const char *arg, int base);
 int get_u64(__u64 *val, const char *arg, int base);
 int get_u32(__u32 *val, const char *arg, int base);
 int get_s32(__s32 *val, const char *arg, int base);
diff --git a/lib/utils.c b/lib/utils.c
index e87ecf31..1b84b801 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -383,6 +383,27 @@ int get_u8(__u8 *val, const char *arg, int base)
return 0;
 }
 
+int get_s64(__s64 *val, const char *arg, int base)
+{
+   long res;
+   char *ptr;
+
+   errno = 0;
+
+   if (!arg || !*arg)
+   return -1;
+   res = strtoll(arg, , base);
+   if (!ptr || ptr == arg || *ptr)
+   return -1;
+   if ((res == LLONG_MIN || res == LLONG_MAX) && errno == ERANGE)
+   return -1;
+   if (res > INT64_MAX || res < INT64_MIN)
+   return -1;
+
+   *val = res;
+   return 0;
+}
+
 int get_s32(__s32 *val, const char *arg, int base)
 {
long res;
-- 
2.19.0



[PATCH iproute2 net-next v3 2/6] include: Add helper to retrieve a __s64 from a netlink msg

2018-10-05 Thread Vinicius Costa Gomes
This allows signed 64-bit integers to be retrieved from a netlink
message.

Signed-off-by: Vinicius Costa Gomes 
---
 include/libnetlink.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 9d9249e6..ffc49e56 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s64 rta_getattr_s64(const struct rtattr *rta)
+{
+   __s64 tmp;
+
+   memcpy(, RTA_DATA(rta), sizeof(tmp));
+   return tmp;
+}
 static inline const char *rta_getattr_str(const struct rtattr *rta)
 {
return (const char *)RTA_DATA(rta);
-- 
2.19.0



[PATCH iproute2 net-next v3 6/6] taprio: Add manpage for tc-taprio(8)

2018-10-05 Thread Vinicius Costa Gomes
This documents the parameters and provides an example of usage.

Signed-off-by: Vinicius Costa Gomes 
---
 man/man8/tc-taprio.8 | 142 +++
 1 file changed, 142 insertions(+)
 create mode 100644 man/man8/tc-taprio.8

diff --git a/man/man8/tc-taprio.8 b/man/man8/tc-taprio.8
new file mode 100644
index ..92055b43
--- /dev/null
+++ b/man/man8/tc-taprio.8
@@ -0,0 +1,142 @@
+.TH TAPRIO 8 "25 Sept 2018" "iproute2" "Linux"
+.SH NAME
+TAPRIO \- Time Aware Priority Shaper
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B parent
+classid
+.B [ handle
+major:
+.B ] taprio num_tc
+tcs
+.ti +8
+.B map
+P0 P1 P2 ...
+.B queues
+count1@offset1 count2@offset2 ...
+.ti +8
+.B base-time
+base-time
+.B clockid
+clockid
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+
+.SH DESCRIPTION
+The TAPRIO qdisc implements a simplified version of the scheduling
+state machine defined by IEEE 802.1Q-2018 Section 8.6.9, which allows
+configuration of a sequence of gate states, where each gate state
+allows outgoing traffic for a subset (potentially empty) of traffic
+classes.
+
+How traffic is mapped to different hardware queues is similar to
+.BR mqprio(8)
+and so the
+.B map
+and
+.Q queues
+parameters have the same meaning.
+
+The other parameters specify the schedule, and at what point in time
+it should start (it can behave as the schedule started in the past).
+
+.SH PARAMETERS
+.TP
+num_tc
+.BR
+Number of traffic classes to use. Up to 16 classes supported.
+
+.TP
+map
+.br
+The priority to traffic class map. Maps priorities 0..15 to a specified
+traffic class. See
+.BR mqprio(8)
+for more details.
+
+.TP
+queues
+.br
+Provide count and offset of queue range for each traffic class. In the
+format,
+.B count@offset.
+Queue ranges for each traffic classes cannot overlap and must be a
+contiguous range of queues.
+
+.TP
+base-time
+.br
+Specifies the instant in nanoseconds, using the reference of
+.B clockid,
+defining the time when the schedule starts. If 'base-time' is a time
+in the past, the schedule will start at
+
+base-time + (N * cycle-time)
+
+where N is the smallest integer so the resulting time is greater than
+"now", and "cycle-time" is the sum of all the intervals of the entries
+in the schedule;
+
+.TP
+clockid
+.br
+Specifies the clock to be used by qdisc's internal timer for measuring
+time and scheduling events.
+
+.TP
+sched-entry
+.br
+There may multiple
+.B sched-entry
+parameters in a single schedule. Each one has the
+
+sched-entry   
+
+format. The only supported  is "S", which
+means "SetGateStates", following the IEEE 802.1Q-2018 definition
+(Table 8-7).  is a bitmask where each bit is a associated
+with a traffic class, so bit 0 (the least significant bit) being "on"
+means that traffic class 0 is "active" for that schedule entry.
+ is a time duration, in nanoseconds, that specifies for how
+long that state defined by  and  should be held
+before moving to the next entry.
+
+.SH EXAMPLES
+
+The following example shows how an traffic schedule with three traffic
+classes ("num_tc 3"), which are separated different traffic classes,
+we are going to call these TC 0, TC 1 and TC 2. We could read the
+"map" parameter below as: traffic with priority 3 is classified as TC
+0, priority 2 is classified as TC 1 and the rest is classified as TC
+2.
+
+The schedule will start at instant 1528743495910289987 using the
+reference CLOCK_TAI. The schedule is composed of three entries each of
+300us duration.
+
+.EX
+# tc qdisc replace dev eth0 parent root handle 100 taprio \\
+  num_tc 3 \\
+  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+  queues 1@0 1@1 2@2 \\
+  base-time 1528743495910289987 \\
+  sched-entry S 01 30 \\
+  sched-entry S 02 30 \\
+      sched-entry S 04 30 \\
+  clockid CLOCK_TAI
+.EE
+
+
+.SH AUTHORS
+Vinicius Costa Gomes 
-- 
2.19.0



Re: [PATCH iproute2 net-next v2 2/6] include: Add helper to retrieve a __s64 from a netlink msg

2018-10-05 Thread Vinicius Costa Gomes
Hi Ilias,

Ilias Apalodimas  writes:

> On Thu, Oct 04, 2018 at 04:17:07PM -0700, Vinicius Costa Gomes wrote:
>> This allows signed 64-bit integers to be retrieved from a netlink
>> message.
>> ---
>>  include/libnetlink.h | 7 +++
>>  1 file changed, 7 insertions(+)
>> 
>> diff --git a/include/libnetlink.h b/include/libnetlink.h
>> index 9d9249e6..88164975 100644
>> --- a/include/libnetlink.h
>> +++ b/include/libnetlink.h
>> @@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
>> *rta)
>>  memcpy(, RTA_DATA(rta), sizeof(__u64));
>>  return tmp;
>>  }
>> +static inline __s64 rta_getattr_s64(const struct rtattr *rta)
>> +{
>> +__s64 tmp;
>> +
>> +memcpy(, RTA_DATA(rta), sizeof(__s64));
> Maybe change it to sizeof(tmp)?

Will fix. Thanks.

>> +return tmp;
>> +}
>>  static inline const char *rta_getattr_str(const struct rtattr *rta)
>>  {
>>  return (const char *)RTA_DATA(rta);
>> -- 
>> 2.19.0
>> 


Cheers,
--
Vinicius


[PATCH iproute2 net-next v2 6/6] taprio: Add manpage for tc-taprio(8)

2018-10-04 Thread Vinicius Costa Gomes
This documents the parameters and provides an example of usage.

Signed-off-by: Vinicius Costa Gomes 
---
 man/man8/tc-taprio.8 | 142 +++
 1 file changed, 142 insertions(+)
 create mode 100644 man/man8/tc-taprio.8

diff --git a/man/man8/tc-taprio.8 b/man/man8/tc-taprio.8
new file mode 100644
index ..92055b43
--- /dev/null
+++ b/man/man8/tc-taprio.8
@@ -0,0 +1,142 @@
+.TH TAPRIO 8 "25 Sept 2018" "iproute2" "Linux"
+.SH NAME
+TAPRIO \- Time Aware Priority Shaper
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B parent
+classid
+.B [ handle
+major:
+.B ] taprio num_tc
+tcs
+.ti +8
+.B map
+P0 P1 P2 ...
+.B queues
+count1@offset1 count2@offset2 ...
+.ti +8
+.B base-time
+base-time
+.B clockid
+clockid
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+
+.SH DESCRIPTION
+The TAPRIO qdisc implements a simplified version of the scheduling
+state machine defined by IEEE 802.1Q-2018 Section 8.6.9, which allows
+configuration of a sequence of gate states, where each gate state
+allows outgoing traffic for a subset (potentially empty) of traffic
+classes.
+
+How traffic is mapped to different hardware queues is similar to
+.BR mqprio(8)
+and so the
+.B map
+and
+.Q queues
+parameters have the same meaning.
+
+The other parameters specify the schedule, and at what point in time
+it should start (it can behave as the schedule started in the past).
+
+.SH PARAMETERS
+.TP
+num_tc
+.BR
+Number of traffic classes to use. Up to 16 classes supported.
+
+.TP
+map
+.br
+The priority to traffic class map. Maps priorities 0..15 to a specified
+traffic class. See
+.BR mqprio(8)
+for more details.
+
+.TP
+queues
+.br
+Provide count and offset of queue range for each traffic class. In the
+format,
+.B count@offset.
+Queue ranges for each traffic classes cannot overlap and must be a
+contiguous range of queues.
+
+.TP
+base-time
+.br
+Specifies the instant in nanoseconds, using the reference of
+.B clockid,
+defining the time when the schedule starts. If 'base-time' is a time
+in the past, the schedule will start at
+
+base-time + (N * cycle-time)
+
+where N is the smallest integer so the resulting time is greater than
+"now", and "cycle-time" is the sum of all the intervals of the entries
+in the schedule;
+
+.TP
+clockid
+.br
+Specifies the clock to be used by qdisc's internal timer for measuring
+time and scheduling events.
+
+.TP
+sched-entry
+.br
+There may multiple
+.B sched-entry
+parameters in a single schedule. Each one has the
+
+sched-entry   
+
+format. The only supported  is "S", which
+means "SetGateStates", following the IEEE 802.1Q-2018 definition
+(Table 8-7).  is a bitmask where each bit is a associated
+with a traffic class, so bit 0 (the least significant bit) being "on"
+means that traffic class 0 is "active" for that schedule entry.
+ is a time duration, in nanoseconds, that specifies for how
+long that state defined by  and  should be held
+before moving to the next entry.
+
+.SH EXAMPLES
+
+The following example shows how an traffic schedule with three traffic
+classes ("num_tc 3"), which are separated different traffic classes,
+we are going to call these TC 0, TC 1 and TC 2. We could read the
+"map" parameter below as: traffic with priority 3 is classified as TC
+0, priority 2 is classified as TC 1 and the rest is classified as TC
+2.
+
+The schedule will start at instant 1528743495910289987 using the
+reference CLOCK_TAI. The schedule is composed of three entries each of
+300us duration.
+
+.EX
+# tc qdisc replace dev eth0 parent root handle 100 taprio \\
+  num_tc 3 \\
+  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+  queues 1@0 1@1 2@2 \\
+  base-time 1528743495910289987 \\
+  sched-entry S 01 30 \\
+  sched-entry S 02 30 \\
+      sched-entry S 04 30 \\
+  clockid CLOCK_TAI
+.EE
+
+
+.SH AUTHORS
+Vinicius Costa Gomes 
-- 
2.19.0



[PATCH iproute2 net-next v2 2/6] include: Add helper to retrieve a __s64 from a netlink msg

2018-10-04 Thread Vinicius Costa Gomes
This allows signed 64-bit integers to be retrieved from a netlink
message.
---
 include/libnetlink.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 9d9249e6..88164975 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s64 rta_getattr_s64(const struct rtattr *rta)
+{
+   __s64 tmp;
+
+   memcpy(, RTA_DATA(rta), sizeof(__s64));
+   return tmp;
+}
 static inline const char *rta_getattr_str(const struct rtattr *rta)
 {
return (const char *)RTA_DATA(rta);
-- 
2.19.0



[PATCH iproute2 net-next v2 3/6] libnetlink: Add helper for getting a __s32 from netlink msgs

2018-10-04 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia 
---
 include/libnetlink.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 88164975..79ba793e 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,10 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s32 rta_getattr_s32(const struct rtattr *rta)
+{
+   return *(__s32 *)RTA_DATA(rta);
+}
 static inline __s64 rta_getattr_s64(const struct rtattr *rta)
 {
__s64 tmp;
-- 
2.19.0



[PATCH iproute2 net-next v2 5/6] tc: Add support for configuring the taprio scheduler

2018-10-04 Thread Vinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
  num_tc 3 \
  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 2@2 \
  base-time 1528743495910289987 \
  sched-entry S 01 30 \
  sched-entry S 02 30 \
  sched-entry S 04 30 \
  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

 sched-entry   

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

  base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 tc/Makefile   |   1 +
 tc/q_taprio.c | 400 ++
 2 files changed, 401 insertions(+)
 create mode 100644 tc/q_taprio.c

diff --git a/tc/Makefile b/tc/Makefile
index 5a1a7ff9..25a28284 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -74,6 +74,7 @@ TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
 TCMODULES += q_cbs.o
 TCMODULES += q_etf.o
+TCMODULES += q_taprio.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_taprio.c b/tc/q_taprio.c
new file mode 100644
index ..562dacb8
--- /dev/null
+++ b/tc/q_taprio.c
@@ -0,0 +1,400 @@
+/*
+ * q_taprio.c  Time Aware Priority Scheduler
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ * Jesus Sanchez-Palencia 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+#include "list.h"
+
+struct sched_entry {
+   struct list_head list;
+   uint32_t index;
+   uint32_t interval;
+   uint32_t gatemask;
+   uint8_t cmd;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct static_clockid {
+   const char *name;
+   clockid_t clockid;
+} clockids_sysv[] = {
+   { "REALTIME", CLOCK_REALTIME },
+   { "TAI", CLOCK_TAI },
+   { "BOOTTIME", CLOCK_BOOTTIME },
+   { "MONOTONIC", CLOCK_MONOTONIC },
+   { NULL }
+};
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... taprio clockid CLOCKID\n");
+   fprintf(stderr, "  [num_tc NUMBER] [map P0 P1 ...] ");
+   fprintf(stderr, "  [queues COUNT@OFFSET COUNT@OFFSET 
COUNT@OFFSET ...] ");
+   fprintf(stderr, "  [ [sched-entry index cmd gate-mask 
interval] ... ] ");
+   fprintf(stderr, "  [base-time time] ");
+   fprintf(stderr, "\nCLOCKID must be a valid SYS-V id (i.e. CLOCK_TAI)");
+   fprintf(stderr, "\n");
+}
+
+static void explain_clockid(const char *val)
+{
+   fprintf(stderr, "taprio: illegal value for \"clockid\": \"%s\".\n", 
val);
+   fprintf(stderr, "It must be a valid SYS-V id (i.e. CLOCK_TAI)\n");
+}
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+   const struct static_clockid *c;
+
+   /* Drop the CLOCK_ prefix if that is being used. */
+   if (strcasestr(arg, "CLOCK_") != NULL)
+   arg += sizeof("CLOCK_") - 1;
+
+   for (c = clockids_sysv; c->name; c++) {
+   

[PATCH iproute2 net-next v2 1/6] utils: Implement get_s64()

2018-10-04 Thread Vinicius Costa Gomes
Add this helper to read signed 64-bit integers from a string.

Signed-off-by: Vinicius Costa Gomes 
---
 include/utils.h |  1 +
 lib/utils.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index 8cb4349e..58574a05 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -139,6 +139,7 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw);
 #define get_byte get_u8
 #define get_ushort get_u16
 #define get_short get_s16
+int get_s64(__s64 *val, const char *arg, int base);
 int get_u64(__u64 *val, const char *arg, int base);
 int get_u32(__u32 *val, const char *arg, int base);
 int get_s32(__s32 *val, const char *arg, int base);
diff --git a/lib/utils.c b/lib/utils.c
index e87ecf31..1b84b801 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -383,6 +383,27 @@ int get_u8(__u8 *val, const char *arg, int base)
return 0;
 }
 
+int get_s64(__s64 *val, const char *arg, int base)
+{
+   long res;
+   char *ptr;
+
+   errno = 0;
+
+   if (!arg || !*arg)
+   return -1;
+   res = strtoll(arg, , base);
+   if (!ptr || ptr == arg || *ptr)
+   return -1;
+   if ((res == LLONG_MIN || res == LLONG_MAX) && errno == ERANGE)
+   return -1;
+   if (res > INT64_MAX || res < INT64_MIN)
+   return -1;
+
+   *val = res;
+   return 0;
+}
+
 int get_s32(__s32 *val, const char *arg, int base)
 {
long res;
-- 
2.19.0



[PATCH iproute2 net-next v2 4/6] include: add definitions for taprio [DO NOT COMMIT]

2018-10-04 Thread Vinicius Costa Gomes
DO NOT COMMIT

This patch exists only to ease the testing, until this header is
updated with the definitions from the kernel.

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h | 52 --
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8975fd1a..89ee47c2 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -395,9 +395,9 @@ enum {
 struct tc_htb_xstats {
__u32 lends;
__u32 borrows;
-   __u32 giants;   /* too big packets (rate will not be accurate) */
-   __u32 tokens;
-   __u32 ctokens;
+   __u32 giants;   /* unused since 'Make HTB scheduler work with TSO.' */
+   __s32 tokens;
+   __s32 ctokens;
 };
 
 /* HFSC section */
@@ -1084,4 +1084,50 @@ enum {
CAKE_ATM_MAX
 };
 
+
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
-- 
2.19.0



[PATCH iproute2 net-next v2 0/6] Introduce the taprio scheduler

2018-10-04 Thread Vinicius Costa Gomes
Hi,

Changes from v1:
  - Remove references to the "H" (Set-Gates-And-Hold-MAC) and "R"
(Set-Gates-And-Release-MAC) commands, as these commands will only
be used when Frame Preemption support is added (David Ahern);
  - Moved the functions that print and read commands to be closer (David
Ahern);

Changes from RFC:
  - Removed support for the sched-file parameter, in favor of
supporting the batch mode feature;

This is the iproute2 side of the taprio v1 series.

Please see the kernel side cover letter for more information about how
to test this.

Cheers,
--
Vinicius

Jesus Sanchez-Palencia (1):
  libnetlink: Add helper for getting a __s32 from netlink msgs

Vinicius Costa Gomes (5):
  utils: Implement get_s64()
  include: Add helper to retrieve a __s64 from a netlink msg
  include: add definitions for taprio [DO NOT COMMIT]
  tc: Add support for configuring the taprio scheduler
  taprio: Add manpage for tc-taprio(8)

 include/libnetlink.h   |  11 +
 include/uapi/linux/pkt_sched.h |  52 -
 include/utils.h|   1 +
 lib/utils.c|  21 ++
 man/man8/tc-taprio.8   | 142 
 tc/Makefile|   1 +
 tc/q_taprio.c  | 400 +
 7 files changed, 625 insertions(+), 3 deletions(-)
 create mode 100644 man/man8/tc-taprio.8
 create mode 100644 tc/q_taprio.c

-- 
2.19.0



Re: [PATCH iproute2 net-next v1 5/6] tc: Add support for configuring the taprio scheduler

2018-10-03 Thread Vinicius Costa Gomes
Hi David,

David Ahern  writes:

> On 9/28/18 7:10 PM, Vinicius Costa Gomes wrote:
>> This traffic scheduler allows traffic classes states (transmission
>> allowed/not allowed, in the simplest case) to be scheduled, according
>> to a pre-generated time sequence. This is the basis of the IEEE
>> 802.1Qbv specification.
>> 
>> Example configuration:
>> 
>> tc qdisc replace dev enp3s0 parent root handle 100 taprio \
>>   num_tc 3 \
>>map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>>queues 1@0 1@1 2@2 \
>>base-time 1528743495910289987 \
>>sched-entry S 01 30 \
>>sched-entry S 02 30 \
>>sched-entry S 04 30 \
>>clockid CLOCK_TAI
>> 
>> The configuration format is similar to mqprio. The main difference is
>> the presence of a schedule, built by multiple "sched-entry"
>> definitions, each entry has the following format:
>> 
>>  sched-entry   
>> 
>> The only supported  is "S", which means "SetGateStates",
>
> ...
>
>> +static int str_to_entry_cmd(const char *str)
>> +{
>> +if (strcmp(str, "S") == 0)
>> +return TC_TAPRIO_CMD_SET_GATES;
>> +
>> +if (strcmp(str, "H") == 0)
>> +return TC_TAPRIO_CMD_SET_AND_HOLD;
>> +
>> +if (strcmp(str, "R") == 0)
>> +return TC_TAPRIO_CMD_SET_AND_RELEASE;
>
> If 'S' is the only supported command, what are 'H' and 'R'?

It is the only command that works (for now): taprio is a software only
implementation now.

Just to give some background, 'H' (Set-And-Hold-MAC) and 'R'
(Set-And-Release-MAC) will be used when Frame Preemption is added, 'H'
means that any preemtible frame should have finished being transmitted
before this "entry" starts and frame preemption is disabled, 'R' means
preemptible frames may be transmitted during this entry's interval and
frame preemption is re-enabled.

Will remove these references for now. 

>
>> +
>> +return -1;
>> +}
>> +
>
>> +
>> +static const char *command_to_str(__u8 cmd)
>> +{
>> +switch (cmd) {
>> +case TC_TAPRIO_CMD_SET_GATES:
>> +return "S";
>> +case TC_TAPRIO_CMD_SET_AND_HOLD:
>> +return "H";
>> +case TC_TAPRIO_CMD_SET_AND_RELEASE:
>> +return "R";
>> +default:
>> +return "Invalid";
>> +}
>> +}
>
> And can you keep str-to-command and command-to-str helpers close
> together in the code.

Sure. WIll fix it for v2.


Cheers,
--
Vinicius


Re: [PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler

2018-10-01 Thread Vinicius Costa Gomes
Hi,

Just a small correction, one link on the cover letter is wrong.

Vinicius Costa Gomes  writes:

[...]

>
>
> [1] https://patchwork.ozlabs.org/cover/938991/
>
> [2] https://patchwork.ozlabs.org/cover/808504/
>
> [3] github doesn't make it clear, but the gist can be cloned like this:
> $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f 
> taprio-test
>
> [4] https://github.com/vcgomes/linux/tree/taprio-v1

The correct link is:

[4] https://github.com/vcgomes/net-next

>
> [5] https://github.com/vcgomes/iproute2/tree/taprio-v1
>
>
> Vinicius Costa Gomes (1):
>   tc: Add support for configuring the taprio scheduler
>
>  include/uapi/linux/pkt_sched.h |  46 ++
>  net/sched/Kconfig  |  11 +
>  net/sched/Makefile |   1 +
>  net/sched/sch_taprio.c | 962 +
>  4 files changed, 1020 insertions(+)
>  create mode 100644 net/sched/sch_taprio.c
>
> -- 
> 2.19.0


Cheers,
--
Vinicius


[PATCH iproute2 net-next v1 5/6] tc: Add support for configuring the taprio scheduler

2018-09-28 Thread Vinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
  num_tc 3 \
  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 2@2 \
  base-time 1528743495910289987 \
  sched-entry S 01 30 \
  sched-entry S 02 30 \
  sched-entry S 04 30 \
  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

 sched-entry   

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

  base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 tc/Makefile   |   1 +
 tc/q_taprio.c | 410 ++
 2 files changed, 411 insertions(+)
 create mode 100644 tc/q_taprio.c

diff --git a/tc/Makefile b/tc/Makefile
index 5a1a7ff9..25a28284 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -74,6 +74,7 @@ TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
 TCMODULES += q_cbs.o
 TCMODULES += q_etf.o
+TCMODULES += q_taprio.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_taprio.c b/tc/q_taprio.c
new file mode 100644
index ..645a7613
--- /dev/null
+++ b/tc/q_taprio.c
@@ -0,0 +1,410 @@
+/*
+ * q_taprio.c  Time Aware Priority Scheduler
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ * Jesus Sanchez-Palencia 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+#include "list.h"
+
+struct sched_entry {
+   struct list_head list;
+   uint32_t index;
+   uint32_t interval;
+   uint32_t gatemask;
+   uint8_t cmd;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct static_clockid {
+   const char *name;
+   clockid_t clockid;
+} clockids_sysv[] = {
+   { "REALTIME", CLOCK_REALTIME },
+   { "TAI", CLOCK_TAI },
+   { "BOOTTIME", CLOCK_BOOTTIME },
+   { "MONOTONIC", CLOCK_MONOTONIC },
+   { NULL }
+};
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... taprio clockid CLOCKID\n");
+   fprintf(stderr, "  [num_tc NUMBER] [map P0 P1 ...] ");
+   fprintf(stderr, "  [queues COUNT@OFFSET COUNT@OFFSET 
COUNT@OFFSET ...] ");
+   fprintf(stderr, "  [ [sched-entry index cmd gate-mask 
interval] ... ] ");
+   fprintf(stderr, "  [base-time time] ");
+   fprintf(stderr, "\nCLOCKID must be a valid SYS-V id (i.e. CLOCK_TAI)");
+   fprintf(stderr, "\n");
+}
+
+static void explain_clockid(const char *val)
+{
+   fprintf(stderr, "taprio: illegal value for \"clockid\": \"%s\".\n", 
val);
+   fprintf(stderr, "It must be a valid SYS-V id (i.e. CLOCK_TAI)\n");
+}
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+   const struct static_clockid *c;
+
+   /* Drop the CLOCK_ prefix if that is being used. */
+   if (strcasestr(arg, "CLOCK_") != NULL)
+   arg += sizeof("CLOCK_") - 1;
+
+   for (c = clockids_sysv; c->name; c++) {
+   

[PATCH iproute2 net-next v1 1/6] utils: Implement get_s64()

2018-09-28 Thread Vinicius Costa Gomes
Add this helper to read signed 64-bit integers from a string.

Signed-off-by: Vinicius Costa Gomes 
---
 include/utils.h |  1 +
 lib/utils.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index 8cb4349e..58574a05 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -139,6 +139,7 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw);
 #define get_byte get_u8
 #define get_ushort get_u16
 #define get_short get_s16
+int get_s64(__s64 *val, const char *arg, int base);
 int get_u64(__u64 *val, const char *arg, int base);
 int get_u32(__u32 *val, const char *arg, int base);
 int get_s32(__s32 *val, const char *arg, int base);
diff --git a/lib/utils.c b/lib/utils.c
index e87ecf31..1b84b801 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -383,6 +383,27 @@ int get_u8(__u8 *val, const char *arg, int base)
return 0;
 }
 
+int get_s64(__s64 *val, const char *arg, int base)
+{
+   long res;
+   char *ptr;
+
+   errno = 0;
+
+   if (!arg || !*arg)
+   return -1;
+   res = strtoll(arg, , base);
+   if (!ptr || ptr == arg || *ptr)
+   return -1;
+   if ((res == LLONG_MIN || res == LLONG_MAX) && errno == ERANGE)
+   return -1;
+   if (res > INT64_MAX || res < INT64_MIN)
+   return -1;
+
+   *val = res;
+   return 0;
+}
+
 int get_s32(__s32 *val, const char *arg, int base)
 {
long res;
-- 
2.19.0



[PATCH iproute2 net-next v1 2/6] include: Add helper to retrieve a __s64 from a netlink msg

2018-09-28 Thread Vinicius Costa Gomes
This allows signed 64-bit integers to be retrieved from a netlink
message.
---
 include/libnetlink.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 9d9249e6..88164975 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s64 rta_getattr_s64(const struct rtattr *rta)
+{
+   __s64 tmp;
+
+   memcpy(, RTA_DATA(rta), sizeof(__s64));
+   return tmp;
+}
 static inline const char *rta_getattr_str(const struct rtattr *rta)
 {
return (const char *)RTA_DATA(rta);
-- 
2.19.0



[PATCH iproute2 net-next v1 0/6] introduce the taprio scheduler

2018-09-28 Thread Vinicius Costa Gomes
Hi,

Changes from RFC:
  - Removed support for the sched-file parameter, in favor of
supporting the batch mode feature;

This is the iproute2 side of the taprio v1 series.

Please see the kernel side cover letter for more information about how
to test this.

Cheers,
--
Vinicius


Jesus Sanchez-Palencia (1):
  libnetlink: Add helper for getting a __s32 from netlink msgs

Vinicius Costa Gomes (5):
  utils: Implement get_s64()
  include: Add helper to retrieve a __s64 from a netlink msg
  include: add definitions for taprio [DO NOT COMMIT]
  tc: Add support for configuring the taprio scheduler
  taprio: Add manpage for tc-taprio(8)

 include/libnetlink.h   |  11 +
 include/uapi/linux/pkt_sched.h |  52 -
 include/utils.h|   1 +
 lib/utils.c|  21 ++
 man/man8/tc-taprio.8   | 142 
 tc/Makefile|   1 +
 tc/q_taprio.c  | 410 +
 7 files changed, 635 insertions(+), 3 deletions(-)
 create mode 100644 man/man8/tc-taprio.8
 create mode 100644 tc/q_taprio.c

-- 
2.19.0



[PATCH iproute2 net-next v1 3/6] libnetlink: Add helper for getting a __s32 from netlink msgs

2018-09-28 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia 
---
 include/libnetlink.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 88164975..79ba793e 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,10 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s32 rta_getattr_s32(const struct rtattr *rta)
+{
+   return *(__s32 *)RTA_DATA(rta);
+}
 static inline __s64 rta_getattr_s64(const struct rtattr *rta)
 {
__s64 tmp;
-- 
2.19.0



[PATCH iproute2 net-next v1 6/6] taprio: Add manpage for tc-taprio(8)

2018-09-28 Thread Vinicius Costa Gomes
This documents the parameters and provides an example of usage.

Signed-off-by: Vinicius Costa Gomes 
---
 man/man8/tc-taprio.8 | 142 +++
 1 file changed, 142 insertions(+)
 create mode 100644 man/man8/tc-taprio.8

diff --git a/man/man8/tc-taprio.8 b/man/man8/tc-taprio.8
new file mode 100644
index ..92055b43
--- /dev/null
+++ b/man/man8/tc-taprio.8
@@ -0,0 +1,142 @@
+.TH TAPRIO 8 "25 Sept 2018" "iproute2" "Linux"
+.SH NAME
+TAPRIO \- Time Aware Priority Shaper
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B parent
+classid
+.B [ handle
+major:
+.B ] taprio num_tc
+tcs
+.ti +8
+.B map
+P0 P1 P2 ...
+.B queues
+count1@offset1 count2@offset2 ...
+.ti +8
+.B base-time
+base-time
+.B clockid
+clockid
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+.ti +8
+.B sched-entry
+  
+
+.SH DESCRIPTION
+The TAPRIO qdisc implements a simplified version of the scheduling
+state machine defined by IEEE 802.1Q-2018 Section 8.6.9, which allows
+configuration of a sequence of gate states, where each gate state
+allows outgoing traffic for a subset (potentially empty) of traffic
+classes.
+
+How traffic is mapped to different hardware queues is similar to
+.BR mqprio(8)
+and so the
+.B map
+and
+.Q queues
+parameters have the same meaning.
+
+The other parameters specify the schedule, and at what point in time
+it should start (it can behave as the schedule started in the past).
+
+.SH PARAMETERS
+.TP
+num_tc
+.BR
+Number of traffic classes to use. Up to 16 classes supported.
+
+.TP
+map
+.br
+The priority to traffic class map. Maps priorities 0..15 to a specified
+traffic class. See
+.BR mqprio(8)
+for more details.
+
+.TP
+queues
+.br
+Provide count and offset of queue range for each traffic class. In the
+format,
+.B count@offset.
+Queue ranges for each traffic classes cannot overlap and must be a
+contiguous range of queues.
+
+.TP
+base-time
+.br
+Specifies the instant in nanoseconds, using the reference of
+.B clockid,
+defining the time when the schedule starts. If 'base-time' is a time
+in the past, the schedule will start at
+
+base-time + (N * cycle-time)
+
+where N is the smallest integer so the resulting time is greater than
+"now", and "cycle-time" is the sum of all the intervals of the entries
+in the schedule;
+
+.TP
+clockid
+.br
+Specifies the clock to be used by qdisc's internal timer for measuring
+time and scheduling events.
+
+.TP
+sched-entry
+.br
+There may multiple
+.B sched-entry
+parameters in a single schedule. Each one has the
+
+sched-entry   
+
+format. The only supported  is "S", which
+means "SetGateStates", following the IEEE 802.1Q-2018 definition
+(Table 8-7).  is a bitmask where each bit is a associated
+with a traffic class, so bit 0 (the least significant bit) being "on"
+means that traffic class 0 is "active" for that schedule entry.
+ is a time duration, in nanoseconds, that specifies for how
+long that state defined by  and  should be held
+before moving to the next entry.
+
+.SH EXAMPLES
+
+The following example shows how an traffic schedule with three traffic
+classes ("num_tc 3"), which are separated different traffic classes,
+we are going to call these TC 0, TC 1 and TC 2. We could read the
+"map" parameter below as: traffic with priority 3 is classified as TC
+0, priority 2 is classified as TC 1 and the rest is classified as TC
+2.
+
+The schedule will start at instant 1528743495910289987 using the
+reference CLOCK_TAI. The schedule is composed of three entries each of
+300us duration.
+
+.EX
+# tc qdisc replace dev eth0 parent root handle 100 taprio \\
+  num_tc 3 \\
+  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+  queues 1@0 1@1 2@2 \\
+  base-time 1528743495910289987 \\
+  sched-entry S 01 30 \\
+  sched-entry S 02 30 \\
+      sched-entry S 04 30 \\
+  clockid CLOCK_TAI
+.EE
+
+
+.SH AUTHORS
+Vinicius Costa Gomes 
-- 
2.19.0



[PATCH iproute2 net-next v1 4/6] include: add definitions for taprio [DO NOT COMMIT]

2018-09-28 Thread Vinicius Costa Gomes
DO NOT COMMIT

This patch exists only to ease the testing, until this header is
updated with the definitions from the kernel.

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h | 52 --
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8975fd1a..89ee47c2 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -395,9 +395,9 @@ enum {
 struct tc_htb_xstats {
__u32 lends;
__u32 borrows;
-   __u32 giants;   /* too big packets (rate will not be accurate) */
-   __u32 tokens;
-   __u32 ctokens;
+   __u32 giants;   /* unused since 'Make HTB scheduler work with TSO.' */
+   __s32 tokens;
+   __s32 ctokens;
 };
 
 /* HFSC section */
@@ -1084,4 +1084,50 @@ enum {
CAKE_ATM_MAX
 };
 
+
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
-- 
2.19.0



[PATCH net-next v1 1/1] tc: Add support for configuring the taprio scheduler

2018-09-28 Thread Vinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
  num_tc 3 \
  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 2@2 \
  base-time 1528743495910289987 \
  sched-entry S 01 30 \
  sched-entry S 02 30 \
  sched-entry S 04 30 \
  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

 sched-entry   

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

  base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h |  46 ++
 net/sched/Kconfig  |  11 +
 net/sched/Makefile |   1 +
 net/sched/sch_taprio.c | 962 +
 4 files changed, 1020 insertions(+)
 create mode 100644 net/sched/sch_taprio.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index e9b7244ac381..89ee47c2f17d 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1084,4 +1084,50 @@ enum {
CAKE_ATM_MAX
 };
 
+
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e95741388311..1b9afdee5ba9 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -194,6 +194,17 @@ config NET_SCH_ETF
  To compile this code as a module, choose M here: the
  module will be called sch_etf.
 
+config NET_SCH_TAPRIO
+   tristate "Time Aware Priority (taprio) Scheduler"
+   help
+ Say Y here if you want to use the Time Aware Priority (taprio) packet
+ scheduling algorithm.
+
+ See the top of  for more details.
+
+ To compile this code as a module, choose M here: the
+ module will be called sch_taprio.
+
 config NET_SCH_GRED
tristate "Generic Random Early Detection (GRED)"
---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index f0403f49edcb..8a40431d7b5c 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NET_SCH_HHF) += sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)  += sch_pie.o
 obj-$(CONFIG_NET_SCH_CBS)  += sch_cbs.o
 obj-$(CONFIG_NET_SCH_ETF)  += sch_etf.o
+obj-$(CO

[PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler

2018-09-28 Thread Vinicius Costa Gomes
 still
leave the controller outside their "correct" windows. This happens
mostly for low-priority classes, and only if they are 'starved' by
the higher priority ones;


This series is also hosted on github and can be found at [4].
The companion iproute2 patches can be found at [5].


Cheers,
--
Vinicius


[1] https://patchwork.ozlabs.org/cover/938991/

[2] https://patchwork.ozlabs.org/cover/808504/

[3] github doesn't make it clear, but the gist can be cloned like this:
$ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f 
taprio-test

[4] https://github.com/vcgomes/linux/tree/taprio-v1

[5] https://github.com/vcgomes/iproute2/tree/taprio-v1


Vinicius Costa Gomes (1):
  tc: Add support for configuring the taprio scheduler

 include/uapi/linux/pkt_sched.h |  46 ++
 net/sched/Kconfig  |  11 +
 net/sched/Makefile |   1 +
 net/sched/sch_taprio.c | 962 +
 4 files changed, 1020 insertions(+)
 create mode 100644 net/sched/sch_taprio.c

-- 
2.19.0



Re: [RFC iproute2-next v1 5/5] tc: Add support for configuring the taprio scheduler

2018-07-24 Thread Vinicius Costa Gomes
Hi,

Vinicius Costa Gomes  writes:

> Hi,
>
> Stephen Hemminger  writes:
>

[...]

>>
>> Why not just use batch mode? Introducing another input mode in tc that is
>> only in one qdisc seems like a bad idea.
>
> Seems that I have missed batch mode. I am going to play with it a little
> and see how things would look.
>

I finally had the chance to play with this, and indeed batch mode is
quite nice. Will replace the custom file input mode on the next
iteration. Thanks.


Cheers,
--
Vinicius


[PATCH net-next] cbs: Add support for the graft function

2018-07-23 Thread Vinicius Costa Gomes
This will allow to install a child qdisc under cbs. The main use case
is to install ETF (Earliest TxTime First) qdisc under cbs, so there's
another level of control for time-sensitive traffic.

Signed-off-by: Vinicius Costa Gomes 
---
 net/sched/sch_cbs.c | 134 +---
 1 file changed, 125 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index cdd96b9a27bc..e26a24017faa 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -78,18 +78,42 @@ struct cbs_sched_data {
s64 sendslope; /* in bytes/s */
s64 idleslope; /* in bytes/s */
struct qdisc_watchdog watchdog;
-   int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch);
+   int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch,
+  struct sk_buff **to_free);
struct sk_buff *(*dequeue)(struct Qdisc *sch);
+   struct Qdisc *qdisc;
 };
 
-static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch)
+static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+struct Qdisc *child,
+struct sk_buff **to_free)
 {
-   return qdisc_enqueue_tail(skb, sch);
+   int err;
+
+   err = child->ops->enqueue(skb, child, to_free);
+   if (err != NET_XMIT_SUCCESS)
+   return err;
+
+   qdisc_qstats_backlog_inc(sch, skb);
+   sch->q.qlen++;
+
+   return NET_XMIT_SUCCESS;
 }
 
-static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch)
+static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch,
+  struct sk_buff **to_free)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
+   struct Qdisc *qdisc = q->qdisc;
+
+   return cbs_child_enqueue(skb, sch, qdisc, to_free);
+}
+
+static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch,
+   struct sk_buff **to_free)
+{
+   struct cbs_sched_data *q = qdisc_priv(sch);
+   struct Qdisc *qdisc = q->qdisc;
 
if (sch->q.qlen == 0 && q->credits > 0) {
/* We need to stop accumulating credits when there's
@@ -99,7 +123,7 @@ static int cbs_enqueue_soft(struct sk_buff *skb, struct 
Qdisc *sch)
q->last = ktime_get_ns();
}
 
-   return qdisc_enqueue_tail(skb, sch);
+   return cbs_child_enqueue(skb, sch, qdisc, to_free);
 }
 
 static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
@@ -107,7 +131,7 @@ static int cbs_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 {
struct cbs_sched_data *q = qdisc_priv(sch);
 
-   return q->enqueue(skb, sch);
+   return q->enqueue(skb, sch, to_free);
 }
 
 /* timediff is in ns, slope is in bytes/s */
@@ -132,9 +156,25 @@ static s64 credits_from_len(unsigned int len, s64 slope, 
s64 port_rate)
return div64_s64(len * slope, port_rate);
 }
 
+static struct sk_buff *cbs_child_dequeue(struct Qdisc *sch, struct Qdisc 
*child)
+{
+   struct sk_buff *skb;
+
+   skb = child->ops->dequeue(child);
+   if (!skb)
+   return NULL;
+
+   qdisc_qstats_backlog_dec(sch, skb);
+   qdisc_bstats_update(sch, skb);
+   sch->q.qlen--;
+
+   return skb;
+}
+
 static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
+   struct Qdisc *qdisc = q->qdisc;
s64 now = ktime_get_ns();
struct sk_buff *skb;
s64 credits;
@@ -157,8 +197,7 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
return NULL;
}
}
-
-   skb = qdisc_dequeue_head(sch);
+   skb = cbs_child_dequeue(sch, qdisc);
if (!skb)
return NULL;
 
@@ -178,7 +217,10 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
 
 static struct sk_buff *cbs_dequeue_offload(struct Qdisc *sch)
 {
-   return qdisc_dequeue_head(sch);
+   struct cbs_sched_data *q = qdisc_priv(sch);
+   struct Qdisc *qdisc = q->qdisc;
+
+   return cbs_child_dequeue(sch, qdisc);
 }
 
 static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
@@ -310,6 +352,13 @@ static int cbs_init(struct Qdisc *sch, struct nlattr *opt,
return -EINVAL;
}
 
+   q->qdisc = qdisc_create_dflt(sch->dev_queue, _qdisc_ops,
+sch->handle, extack);
+   if (!q->qdisc)
+   return -ENOMEM;
+
+   qdisc_hash_add(q->qdisc, false);
+
q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
 
q->enqueue = cbs_enqueue_soft;
@@ -328,6 +377,9 @@ static void cbs_destroy(struct Qdisc *sch)
qdisc_watchdog_cancel(>watchdog);
 
cbs_disable_offload(dev, q);
+
+   if (q->qdisc)
+   qdisc_destroy(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -356,8

Re: [RFC iproute2-next v1 5/5] tc: Add support for configuring the taprio scheduler

2018-07-17 Thread Vinicius Costa Gomes
Hi,

Stephen Hemminger  writes:

> On Fri, 13 Jul 2018 17:06:11 -0700
> Vinicius Costa Gomes  wrote:
>
>> +while (fscanf(f, "%ms %x %" PRIu32 "\n", _str, , 
>> ) != EOF)  {
>> +struct rtattr *entry;
>> +
>> +err = str_to_entry_cmd(cmd_str);
>> +free(cmd_str);
>> +
>> +if (err < 0)
>> +return err;
>> +
>> +cmd = err;
>> +
>> +entry = addattr_nest(n, 1024, TCA_TAPRIO_SCHED_ENTRY);
>> +
>> +addattr_l(n, 1024, TCA_TAPRIO_SCHED_ENTRY_INDEX, , 
>> sizeof(index));
>> +addattr_l(n, 1024, TCA_TAPRIO_SCHED_ENTRY_CMD, , 
>> sizeof(cmd));
>> +addattr_l(n, 1024, TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, , 
>> sizeof(gatemask));
>> +addattr_l(n, 1024, TCA_TAPRIO_SCHED_ENTRY_INTERVAL, , 
>> sizeof(interval));
>> +
>> +addattr_nest_end(n, entry);
>> +}
>> +
>
> Why not just use batch mode? Introducing another input mode in tc that is
> only in one qdisc seems like a bad idea.

Seems that I have missed batch mode. I am going to play with it a little
and see how things would look.

>
>> +
>> +static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr 
>> *opt)
>> +{
>> +struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>> +struct tc_mqprio_qopt *qopt = 0;
>> +__s64 cycle_time = 0, extension_time = 0, base_time = 0;
>> +__s32 clockid = CLOCKID_INVALID;
>> +__u32 preempt_mask = 0;
>> +int i;
>> +
>> +if (opt == NULL)
>> +return 0;
>> +
>> +parse_rtattr_nested(tb, TCA_TAPRIO_ATTR_MAX, opt);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_PRIOMAP] == NULL)
>> +return -1;
>> +
>> +qopt = RTA_DATA(tb[TCA_TAPRIO_ATTR_PRIOMAP]);
>> +
>> +fprintf(f, "tc %u map ", qopt->num_tc);
>> +for (i = 0; i <= TC_PRIO_MAX; i++)
>> +fprintf(f, "%u ", qopt->prio_tc_map[i]);
>> +fprintf(f, "\n  queues:");
>> +for (i = 0; i < qopt->num_tc; i++)
>> +fprintf(f, "(%u:%u) ", qopt->offset[i],
>> +qopt->offset[i] + qopt->count[i] - 1);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
>> +cycle_time = 
>> rta_getattr_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_SCHED_EXTENSION_TIME])
>> +extension_time = 
>> rta_getattr_s64(tb[TCA_TAPRIO_ATTR_SCHED_EXTENSION_TIME]);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
>> +base_time = 
>> rta_getattr_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_PREEMPT_MASK])
>> +preempt_mask = 
>> rta_getattr_s64(tb[TCA_TAPRIO_ATTR_PREEMPT_MASK]);
>> +
>> +if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID])
>> +clockid = rta_getattr_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
>> +
>> +fprintf(f, "\n  clockid %s ", get_clock_name(clockid));
>> +
>> +fprintf(f, "\n  base-time %lld cycle-time %lld extension-time %lld ",
>> +base_time, cycle_time, extension_time);
>> +
>> +fprintf(f, "\n  preempt-mask 0x%x ", preempt_mask);
>> +
>> +return print_sched_list(f, tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]);
>
>
> Please implement JSON output using json_print functions.

Sure. Will do.


Cheers,
--
Vinicius


Re: [RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler

2018-07-16 Thread Vinicius Costa Gomes
Hi Jiri,

Jiri Pirko  writes:

[...]

>>
>>gates.sched
>
> Any particular reason this has to be in file and not on the cmdline?

The idea here was to keep longer schedules more manageable. And during
testing I found it more ergonomic to have a file.

It also has the advantage that the file can be reused by other tools,
dump-classifier (awful name, I admit), included in that github gist, is
one example, it uses the schedule (and some more information) to
calculate which packets would fall outside their "windows" in a pcap
dump.

Anyway, if there are use cases that having the schedule in the command
line helps, I would be happy to add it.


Cheers,
--
Vinicius


[RFC iproute2-next v1 0/5] net/sched: Introduce the taprio scheduler

2018-07-13 Thread Vinicius Costa Gomes
Hi,

This is iproute2 side of the taprio RFC series.

Please see the kernel side cover letter for more information about how
to test this.


Cheers,
--
Vinicius


Jesus Sanchez-Palencia (1):
  libnetlink: Add helper for getting a __s32 from netlink msgs

Vinicius Costa Gomes (4):
  utils: Implement get_s64()
  include: Add helper to retrieve a __s64 from a netlink msg
  include: add definitions for taprio [DO NOT COMMIT]
  tc: Add support for configuring the taprio scheduler

 include/libnetlink.h   |  14 +
 include/uapi/linux/pkt_sched.h |  48 
 include/utils.h|   1 +
 lib/utils.c|  21 ++
 tc/Makefile|   1 +
 tc/q_taprio.c  | 450 +
 6 files changed, 535 insertions(+)
 create mode 100644 tc/q_taprio.c

-- 
2.18.0



[RFC net-next v1 1/1] net/sched: Introduce the taprio scheduler

2018-07-13 Thread Vinicius Costa Gomes
This scheduler allows the network administrator to configure schedules
for classes of traffic, the configuration interface is similar to what
IEEE 802.1Qbv-2015 defines.

Example configuration:

$ tc qdisc add dev enp2s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
sched-file ~/gates.sched \
base-time 100 \
clockid CLOCK_TAI

This qdisc borrows a few concepts from mqprio and most the parameters
are similar to mqprio. The main difference is the 'sched-file'
parameter, one example of a schedule file would be:

gates.sched
---
S 01 30
S 02 30
S 04 30

The format of each line is:
  

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:
 - base-time: allows that multiple systems can have synchronized
 schedules, it specifies the instant when the schedule starts;
 - clockid: specifies the reference clock to be used;

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h |  49 ++
 net/sched/Kconfig  |  11 +
 net/sched/Makefile |   1 +
 net/sched/sch_taprio.c | 952 +
 4 files changed, 1013 insertions(+)
 create mode 100644 net/sched/sch_taprio.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index d9cc9dc4f547..863df790ff07 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1069,4 +1069,53 @@ enum {
CAKE_ATM_MAX
 };
 
+
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_PREEMPT_MASK, /* which traffic classes are preemptible, 
u32 */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_EXTENSION_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 7af246764a35..e728d2e3052b 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -194,6 +194,17 @@ config NET_SCH_ETF
  To compile this code as a module, choose M here: the
  module will be called sch_etf.
 
+config NET_SCH_TAPRIO
+   tristate "Time Aware Priority (taprio) Scheduler"
+   help
+ Say Y here if you want to use the Time Aware Priority (taprio) packet
+ scheduling algorithm.
+
+ See the top of  for more details.
+
+ To compile this code as a module, choose M here: the
+ module will be called sch_taprio.
+
 config NET_SCH_GRED
tristate "Generic Random Early Detection (GRED)"
---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 673ee7d26ff2..dc81a16e35b5 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_NET_SCH_HHF) += sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)  += sch_pie.o
 obj-$(CONFIG_NET_SCH_CBS)  += sch_cbs.o
 obj-$(CONFIG_NET_SCH_ETF)  += sch_etf.o
+obj-$(CONFIG_NET_SCH_TAPRIO)   += sch_taprio.o
 
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_ta

[RFC iproute2-next v1 1/5] utils: Implement get_s64()

2018-07-13 Thread Vinicius Costa Gomes
Add this helper to read signed 64-bit integers from a string.
---
 include/utils.h |  1 +
 lib/utils.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index 8cb4349e..58574a05 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -139,6 +139,7 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw);
 #define get_byte get_u8
 #define get_ushort get_u16
 #define get_short get_s16
+int get_s64(__s64 *val, const char *arg, int base);
 int get_u64(__u64 *val, const char *arg, int base);
 int get_u32(__u32 *val, const char *arg, int base);
 int get_s32(__s32 *val, const char *arg, int base);
diff --git a/lib/utils.c b/lib/utils.c
index 02ce6772..02836d6e 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -382,6 +382,27 @@ int get_u8(__u8 *val, const char *arg, int base)
return 0;
 }
 
+int get_s64(__s64 *val, const char *arg, int base)
+{
+   long res;
+   char *ptr;
+
+   errno = 0;
+
+   if (!arg || !*arg)
+   return -1;
+   res = strtoll(arg, , base);
+   if (!ptr || ptr == arg || *ptr)
+   return -1;
+   if ((res == LLONG_MIN || res == LLONG_MAX) && errno == ERANGE)
+   return -1;
+   if (res > INT64_MAX || res < INT64_MIN)
+   return -1;
+
+   *val = res;
+   return 0;
+}
+
 int get_s32(__s32 *val, const char *arg, int base)
 {
long res;
-- 
2.18.0



[RFC iproute2-next v1 3/5] libnetlink: Add helper for getting a __s32 from netlink msgs

2018-07-13 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This function retrieves a signed 32-bit integer from a netlink message
and returns it.

Signed-off-by: Jesus Sanchez-Palencia 
---
 include/libnetlink.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 88164975..114a816e 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s32 rta_getattr_s32(const struct rtattr *rta)
+{
+   __s32 tmp;
+
+   memcpy(, RTA_DATA(rta), sizeof(__s32));
+   return tmp;
+}
 static inline __s64 rta_getattr_s64(const struct rtattr *rta)
 {
__s64 tmp;
-- 
2.18.0



[RFC iproute2-next v1 5/5] tc: Add support for configuring the taprio scheduler

2018-07-13 Thread Vinicius Costa Gomes
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

If the controller supports it, traffic can be also marked as
preemptable/not preemptable, and the states for each entry in the time
sequence gains a couple more commands. This maps to the functionality
defined by the IEEE 802.1Qbu specification.

The syntax is:

tc qdisc add dev DEV parent NODE taprio num_tc NUMBER map P0 P1 ...
queues TC0 TC1 TC2 ...
[ [sched-file file] | [sched-row INDEX CMD GATE-MASK INTERVAL]
[ base-time TIME ] [ extension-time TIME ] [ cycle-time TIME ]
[ preemption TC0 TC1 TC2 ... ]
clockid CLOCKID

The parameters should be similar to what the IEEE 802.1Q family of
specification define.

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 tc/Makefile   |   1 +
 tc/q_taprio.c | 450 ++
 2 files changed, 451 insertions(+)
 create mode 100644 tc/q_taprio.c

diff --git a/tc/Makefile b/tc/Makefile
index dfd00267..6534d69b 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -71,6 +71,7 @@ TCMODULES += q_clsact.o
 TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
 TCMODULES += q_cbs.o
+TCMODULES += q_taprio.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_taprio.c b/tc/q_taprio.c
new file mode 100644
index ..ef8baa4b
--- /dev/null
+++ b/tc/q_taprio.c
@@ -0,0 +1,450 @@
+/*
+ * q_taprio.c  Time Aware Priority Scheduler
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ * Jesus Sanchez-Palencia 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+
+#define CLOCKID_INVALID (-1)
+static const struct static_clockid {
+   const char *name;
+   clockid_t clockid;
+} clockids_sysv[] = {
+   { "REALTIME", CLOCK_REALTIME },
+   { "TAI", CLOCK_TAI },
+   { "BOOTTIME", CLOCK_BOOTTIME },
+   { "MONOTONIC", CLOCK_MONOTONIC },
+   { NULL }
+};
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... taprio clockid CLOCKID\n");
+   fprintf(stderr, "  [num_tc NUMBER] [map P0 P1 ...] ");
+   fprintf(stderr, "  [queues TC0 TC1 TC2 ...] ");
+   fprintf(stderr, "  [ [sched-file file] | [sched-row 
index cmd gate-mask interval] ] ");
+   fprintf(stderr, "  [base-time time] [extension-time 
time] [cycle-time time] ");
+   fprintf(stderr, "  [preemption TC0 TC1 TC2 ... ] ");
+   fprintf(stderr, "\nCLOCKID must be a valid SYS-V id (i.e. CLOCK_TAI)");
+   fprintf(stderr, "\n");
+}
+
+static void explain_clockid(const char *val)
+{
+   fprintf(stderr, "taprio: illegal value for \"clockid\": \"%s\".\n", 
val);
+   fprintf(stderr, "It must be a valid SYS-V id (i.e. CLOCK_TAI)\n");
+}
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+   const struct static_clockid *c;
+
+   /* Drop the CLOCK_ prefix if that is being used. */
+   if (strcasestr(arg, "CLOCK_") != NULL)
+   arg += sizeof("CLOCK_") - 1;
+
+   for (c = clockids_sysv; c->name; c++) {
+   if (strcasecmp(c->name, arg) == 0) {
+   *val = c->clockid;
+
+   return 0;
+   }
+   }
+
+   return -1;
+}
+
+static const char* get_clock_name(clockid_t clockid)
+{
+   const struct static_clockid *c;
+
+   for (c = clockids_sysv; c->name; c++) {
+   if (clockid == c->clockid)
+   return c->name;
+   }
+
+   return "invalid";
+}
+
+static int str_to_entry_cmd(const char *str)
+{
+   if (strcmp(str, "S") == 0)
+   return TC_TAPRIO_CMD_SET_GATES;
+
+   if (strcmp(str, "H") == 0)
+   return TC_TAPRIO_CMD_SET_AND_HOLD;
+
+   if (strcmp(str, "R") == 0)
+   return TC_TAPRIO_CMD_SET_AND_RELEASE;
+
+   return -1;
+}
+
+static int add_sched_list(FILE *f, struct nlmsghdr *n)
+{
+   __u32 interval, gatemask, index = 0;
+   char *cmd_str;
+   __u8 cmd;
+   int err;
+
+   while (fscanf(f, "%ms %x %" PRIu32 "\n", _str, , 
) != EOF)  {
+   struct rtattr *entry;
+

[RFC iproute2-next v1 4/5] include: add definitions for taprio [DO NOT COMMIT]

2018-07-13 Thread Vinicius Costa Gomes
DO NOT COMMIT

This patch exists only to ease the testing, until this header is
updated with the definitions from the kernel.

Signed-off-by: Vinicius Costa Gomes 
---
 include/uapi/linux/pkt_sched.h | 48 ++
 1 file changed, 48 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096a..8ca988cb 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,52 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+/* TAPRIO */
+enum {
+   TC_TAPRIO_CMD_SET_GATES = 0x00,
+   TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
+   TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
+};
+
+enum {
+   TCA_TAPRIO_SCHED_ENTRY_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY_INDEX, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_CMD, /* u8 */
+   TCA_TAPRIO_SCHED_ENTRY_GATE_MASK, /* u32 */
+   TCA_TAPRIO_SCHED_ENTRY_INTERVAL, /* u32 */
+   __TCA_TAPRIO_SCHED_ENTRY_MAX,
+};
+#define TCA_TAPRIO_SCHED_ENTRY_MAX (__TCA_TAPRIO_SCHED_ENTRY_MAX - 1)
+
+/* The format for schedule entry list is:
+ * [TCA_TAPRIO_SCHED_ENTRY_LIST]
+ *   [TCA_TAPRIO_SCHED_ENTRY]
+ * [TCA_TAPRIO_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_SCHED_ENTRY_INTERVAL]
+ */
+enum {
+   TCA_TAPRIO_SCHED_UNSPEC,
+   TCA_TAPRIO_SCHED_ENTRY,
+   __TCA_TAPRIO_SCHED_MAX,
+};
+
+#define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
+
+enum {
+   TCA_TAPRIO_ATTR_UNSPEC,
+   TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
+   TCA_TAPRIO_ATTR_PREEMPT_MASK, /* which traffic classes are preemptible, 
u32 */
+   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST, /* nested of entry */
+   TCA_TAPRIO_ATTR_SCHED_BASE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_EXTENSION_TIME, /* s64 */
+   TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /*  */
+   TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
+   TCA_TAPRIO_PAD,
+   __TCA_TAPRIO_ATTR_MAX,
+};
+
+#define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+
 #endif
-- 
2.18.0



[RFC iproute2-next v1 2/5] include: Add helper to retrieve a __s64 from a netlink msg

2018-07-13 Thread Vinicius Costa Gomes
This allows signed 64-bit integers to be retrieved from a netlink
message.
---
 include/libnetlink.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 9d9249e6..88164975 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -185,6 +185,13 @@ static inline __u64 rta_getattr_u64(const struct rtattr 
*rta)
memcpy(, RTA_DATA(rta), sizeof(__u64));
return tmp;
 }
+static inline __s64 rta_getattr_s64(const struct rtattr *rta)
+{
+   __s64 tmp;
+
+   memcpy(, RTA_DATA(rta), sizeof(__s64));
+   return tmp;
+}
 static inline const char *rta_getattr_str(const struct rtattr *rta)
 {
return (const char *)RTA_DATA(rta);
-- 
2.18.0



[RFC net-next v1 0/1] net/sched: Introduce the taprio scheduler

2018-07-13 Thread Vinicius Costa Gomes


Hi,

This series provides a set of interfaces that can be used by
applications that require (time-based) Scheduled Transmission of
packets. It is comprised by 3 new components to the kernel:

  - etf: the per-queue TxTime-Based scheduling qdisc;
  - taprio: the per-port Time-Aware scheduler qdisc;
  - SO_TXTIME: a socket option + cmsg APIs.

ETF and SO_TXTIME are already applied[1] into the net-next tree. This
is the remaining piece.

Overview


The CBS qdisc proposal RFC [2] included some rough ideas about the
design and API of a "taprio" (Time Aware Priority) qdisc. The idea of
presenting the taprio ideas at that point (almost 10 months ago!) was
to show our vision of how things would fit together going forward.
>From that concept stage to this (almost) realised stage the main
differences are:

  - As of now, taprio is a software only implementation of a schedule
executor;
  - Instead of taprio centralising all the time based decisions, taprio
can work together with ETF (the Earliest TxTime First), a qdisc
meant to use the LaunchTime (or similar) feature of various network
controllers;

In a nutshell, taprio is a root qdisc that can execute a predefined
schedule, etf is a qdisc that provides time based admission control
and "earliest deadline first" dequeue mode, and SO_TXTIME is a socket
option that is used for enabling a socket to be used for time-based
transmission and configuring its parameters.

taprio
==

This scheduler allows the network administrator to configure schedules
for classes of traffic, the configuration interface is similar to what
IEEE 802.1Qbv-2015 defines.

Example configuration:

$ tc qdisc add dev enp2s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
sched-file ~/gates.sched \
base-time 1528743495910289987 \
clockid CLOCK_TAI

This qdisc borrows a few concepts from mqprio and so, most the
parameters are similar to mqprio. The main difference is the
'sched-file' parameter, one example on a schedule file would be:

gates.sched
---
S 01 30
S 02 30
S 04 30

The format of each line is:
  

The only supported  is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). 
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry.  is a time duration
in nanoseconds that specifies for how long that state defined by 
and  should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:
  - base-time: allows that multiple systems can have synchronised
schedules, it specifies the instant when the schedule starts;
  - clockid: specifies the reference clock to be used;

A more complete example can be found here, with instructions of how to
test it:

https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [3]

The basic design of the scheduler is simple, after we calculate the
first expiration of the hrtimer, we set the next expiration to be the
previous plus the current entry's interval. At each time the function
runs, we set the current_entry, which has a gate_mask (that controls
which traffic classes are allowed to "go out" during each interval),
and we reuse this callback to "kick" the qdisc (this is the reason
that the usual qdisc watchdog isn't used).


Known Issues


 - As taprio is a software only implementation, and there's another
   layer of queuing in the network controller, packets can still
   leave the controller outside their "correct" windows. This happens
   mostly for low-priority classes, and is more evident if they are
   'starved' by the higher priority ones;
 - There's no support for changing the schedule during runtime;

This series is also hosted on github and can be found at [4].
The companion iproute2 patches can be found at [5].


Cheers,
--
Vinicius


[1] https://patchwork.ozlabs.org/cover/938991/

[2] https://patchwork.ozlabs.org/cover/808504/

[3] github doesn't make it clear, but the gist can be cloned like this:
$ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f 
taprio-test

[4] https://github.com/vcgomes/linux/tree/taprio-RFC-v1

[5] https://github.com/vcgomes/iproute2/tree/taprio-RFC-v1


Vinicius Costa Gomes (1):
  net/sched: Introduce the taprio scheduler

 include/uapi/linux/pkt_sched.h |  49 ++
 net/sched/Kconfig  |  11 +
 net/sched/Makefile |   1 +
 net/sched/sch_taprio.c | 952 +
 4 files changed, 1013 insertions(+)
 create mode 100644 net/sched/sch_taprio.c

-- 
2.18.0



Re: [PATCH iproute2-next 3/3] treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes

2018-04-13 Thread Vinicius Costa Gomes
Hi,

Serhey Popovych  writes:

[...]

> diff --git a/tc/q_mqprio.c b/tc/q_mqprio.c
> index 89b4600..207d644 100644
> --- a/tc/q_mqprio.c
> +++ b/tc/q_mqprio.c
> @@ -173,8 +173,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int 
> argc,
>   argc--; argv++;
>   }
>  
> - tail = NLMSG_TAIL(n);
> - addattr_l(n, 1024, TCA_OPTIONS, , sizeof(opt));
> + tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, , sizeof(opt));
>  
>   if (flags & TC_MQPRIO_F_MODE)
>   addattr_l(n, 1024, TCA_MQPRIO_MODE,
> @@ -209,7 +208,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int 
> argc,
>   addattr_nest_end(n, start);
>   }
>  
> - tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
> + addattr_nest_compat_end(n, tail);
>  
>   return 0;
>  }

Sorry if I am too late, but this breaks mqprio, i.e. something like
this:

$ tc qdisc replace dev enp2s0 handle 100: parent root mqprio \
   num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
   queues 1@0 1@1 2@2 hw 0

that used to work, now doesn't.

This patch looks right, so I thought that it could be possible that mqprio
(in the kernel side) was making some wrong assumptions about the format
of the messages.

And after some investigation, what seems to be happening is something
like this (not too familiar with netlink protocol internals, I may be
missing something).

In the "wire", after this patch, the mqprio part of message may be
represented as:

/* The message format is [ len | type | payload ] */

[ S | 2 |  ]
[ 0 | 2 | ]

Some notes:
 - S is the aligned value of sizeof(opt);
 - The value of TCA_OPTIONS is 2;

Before this patch, I think it was something like:

[ S | 2 |  ]

The problem is that mqprio defines an internal type with the same value
as TCA_OPTIONS (2), and that finalizing (empty) is interpreted as the
"internal" field instead of indicating the end of TCA_OPTIONS, which
causes a size mismatch with 'mqprio_policy', causing the command to
create a mqprio qdisc to fail.

In short, I think that replacing the "open coded" version with
addattr_nest_compat() is not a functionally equivalent change.


Cheers,
--
Vinicius


[next-queue PATCH v7 05/10] igb: Add support for enabling queue steering in filters

2018-04-10 Thread Vinicius Costa Gomes
On some igb models (82575 and i210) the MAC address filters can
control to which queue the packet will be assigned.

This extends the 'state' with one more state to signify that queue
selection should be enabled for that filter.

As 82575 parts are no longer easily obtained (and this was developed
against i210), only support for the i210 model is enabled.

These functions are exported and will be used in the next patch.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 .../net/ethernet/intel/igb/e1000_defines.h|  1 +
 drivers/net/ethernet/intel/igb/igb.h  |  6 +
 drivers/net/ethernet/intel/igb/igb_main.c | 26 +++
 3 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 5417edbe3125..d3d1d868e7ba 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -492,6 +492,7 @@
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
 #define E1000_RAH_ASEL_SRC_ADDR 0x0001
+#define E1000_RAH_QSEL_ENABLE 0x1000
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index f3ecda46f287..f48ba090fd6a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -475,6 +475,7 @@ struct igb_mac_addr {
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
 #define IGB_MAC_STATE_SRC_ADDR  0x4
+#define IGB_MAC_STATE_QUEUE_STEERING 0x8
 
 /* board specific private data structure */
 struct igb_adapter {
@@ -740,4 +741,9 @@ int igb_add_filter(struct igb_adapter *adapter,
 int igb_erase_filter(struct igb_adapter *adapter,
 struct igb_nfc_filter *input);
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+
 #endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 2033ec3c9b27..e3da35cab786 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6935,6 +6935,28 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return igb_del_mac_filter_flags(adapter, addr, queue, 0);
 }
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   struct e1000_hw *hw = >hw;
+
+   /* In theory, this should be supported on 82575 as well, but
+* that part wasn't easily accessible during development.
+*/
+   if (hw->mac.type != e1000_i210)
+   return -EOPNOTSUPP;
+
+   return igb_add_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   return igb_del_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
 static int igb_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -8784,6 +8806,10 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
switch (hw->mac.type) {
case e1000_82575:
case e1000_i210:
+   if (adapter->mac_table[index].state &
+   IGB_MAC_STATE_QUEUE_STEERING)
+   rar_high |= E1000_RAH_QSEL_ENABLE;
+
rar_high |= E1000_RAH_POOL_1 *
  adapter->mac_table[index].queue;
break;
-- 
2.17.0



[next-queue PATCH v7 04/10] igb: Add support for MAC address filters specifying source addresses

2018-04-10 Thread Vinicius Costa Gomes
Makes it possible to direct packets to queues based on their source
address. Documents the expected usage of the 'flags' parameter.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 .../net/ethernet/intel/igb/e1000_defines.h|  1 +
 drivers/net/ethernet/intel/igb/igb.h  |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c | 40 ---
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 98534f765e0e..5417edbe3125 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -491,6 +491,7 @@
  * manageability enabled, allowing us room for 15 multicast addresses.
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
+#define E1000_RAH_ASEL_SRC_ADDR 0x0001
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 8dbc399b345e..f3ecda46f287 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -474,6 +474,7 @@ struct igb_mac_addr {
 
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
+#define IGB_MAC_STATE_SRC_ADDR  0x4
 
 /* board specific private data structure */
 struct igb_adapter {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 976898d39d6e..2033ec3c9b27 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6844,8 +6844,14 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
-static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
- const u8 queue)
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_add_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6865,7 +6871,7 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
adapter->mac_table[i].queue = queue;
-   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags;
 
igb_rar_set_index(adapter, i);
return i;
@@ -6874,8 +6880,21 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOSPC;
 }
 
-static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
  const u8 queue)
+{
+   return igb_add_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_del_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6892,12 +6911,14 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
for (i = 0; i < rar_entries; i++) {
if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE))
continue;
+   if ((adapter->mac_table[i].state & flags) != flags)
+   continue;
if (adapter->mac_table[i].queue != queue)
continue;
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state = 0;
memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
adapter->mac_table[i].queue = 0;
 
@@ -6908,6 +6929,12 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOENT;
 }
 
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *

[next-queue PATCH v7 02/10] igb: Fix queue selection on MAC filters on i210

2018-04-10 Thread Vinicius Costa Gomes
On the RAH registers there are semantic differences on the meaning of
the "queue" parameter for traffic steering depending on the controller
model: there is the 82575 meaning, which "queue" means a RX Hardware
Queue, and the i350 meaning, where it is a reception pool.

The previous behaviour was having no effect for i210 based controllers
because the QSEL bit of the RAH register wasn't being set.

This patch separates the condition in discrete cases, so the different
handling is clearer.

Fixes: 83c21335c876 ("igb: improve MAC filter handling")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc30a16d..0a79fef3c4fb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8748,12 +8748,17 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
if (is_valid_ether_addr(addr))
rar_high |= E1000_RAH_AV;
 
-   if (hw->mac.type == e1000_82575)
+   switch (hw->mac.type) {
+   case e1000_82575:
+   case e1000_i210:
rar_high |= E1000_RAH_POOL_1 *
-   adapter->mac_table[index].queue;
-   else
+ adapter->mac_table[index].queue;
+   break;
+   default:
rar_high |= E1000_RAH_POOL_1 <<
-   adapter->mac_table[index].queue;
+   adapter->mac_table[index].queue;
+   break;
+   }
}
 
wr32(E1000_RAL(index), rar_low);
-- 
2.17.0



[next-queue PATCH v7 06/10] igb: Allow filters to be added for the local MAC address

2018-04-10 Thread Vinicius Costa Gomes
Users expect that when adding a steering filter for the local MAC
address, that all the traffic directed to that address will go to some
queue.

Currently, it's not possible to configure entries in the "in use"
state, which is the normal state of the local MAC address entry (it is
the default), this patch allows to override the steering configuration
of "in use" entries, if the filter to be added match the address and
address type (source or destination) of an existing entry.

There is a bit of a special handling for entries referring to the
local MAC address, when they are removed, only the steering
configuration is reset.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 40 ---
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index e3da35cab786..1b6fad88107a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6844,6 +6844,27 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
+/* If the filter to be added and an already existing filter express
+ * the same address and address type, it should be possible to only
+ * override the other configurations, for example the queue to steer
+ * traffic.
+ */
+static bool igb_mac_entry_can_be_used(const struct igb_mac_addr *entry,
+ const u8 *addr, const u8 flags)
+{
+   if (!(entry->state & IGB_MAC_STATE_IN_USE))
+   return true;
+
+   if ((entry->state & IGB_MAC_STATE_SRC_ADDR) !=
+   (flags & IGB_MAC_STATE_SRC_ADDR))
+   return false;
+
+   if (!ether_addr_equal(addr, entry->addr))
+   return false;
+
+   return true;
+}
+
 /* Add a MAC filter for 'addr' directing matching traffic to 'queue',
  * 'flags' is used to indicate what kind of match is made, match is by
  * default for the destination address, if matching by source address
@@ -6866,7 +6887,8 @@ static int igb_add_mac_filter_flags(struct igb_adapter 
*adapter,
 * addresses.
 */
for (i = 0; i < rar_entries; i++) {
-   if (adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE)
+   if (!igb_mac_entry_can_be_used(>mac_table[i],
+  addr, flags))
continue;
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
@@ -6918,9 +6940,19 @@ static int igb_del_mac_filter_flags(struct igb_adapter 
*adapter,
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state = 0;
-   memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
-   adapter->mac_table[i].queue = 0;
+   /* When a filter for the default address is "deleted",
+* we return it to its initial configuration
+*/
+   if (adapter->mac_table[i].state & IGB_MAC_STATE_DEFAULT) {
+   adapter->mac_table[i].state =
+   IGB_MAC_STATE_DEFAULT | IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].queue =
+   adapter->vfs_allocated_count;
+   } else {
+   adapter->mac_table[i].state = 0;
+   adapter->mac_table[i].queue = 0;
+   memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
+   }
 
igb_rar_set_index(adapter, i);
return 0;
-- 
2.17.0



[next-queue PATCH v7 00/10] igb: offloading of receive filters

2018-04-10 Thread Vinicius Costa Gomes
Hi,

Known issue:
 - It seems that the the QSEL bits in the RAH registers do not have
 any effect for source address (i.e. steering doesn't work for source
 address filters), everything is pointing to a hardware (or
 documentation) issue;

Changes from v6:
 - Because the i210 doesn't support steering traffic per source
   address (only filtering works), an error is emitted when only an
   source address is specified in a classification rule;

Changes from v5:
 - Filters can now be added for local MAC addresses, when removed,
   they return to their initial configuration (thanks for the testing
   Aaron);

Changes from v4:
 - Added a new bit to the MAC address filters internal
 representation to mean that some filters are steering filters (i.e.
 they direct traffic to queues);
 - And, this is only supported in i210;

Changes from v3:
 - Addressed review comments from Aaron F. Brown and
   Jakub Kicinski;

Changes from v2:
 - Addressed review comments from Jakub Kicinski, mostly about coding
   style adjustments and more consistent error reporting;

Changes from v1:
 - Addressed review comments from Alexander Duyck and Florian
   Fainelli;
 - Adding and removing cls_flower filters are now proposed in the same
   patch;
 - cls_flower filters are kept in a separated list from "ethtool"
   filters (so that section of the original cover letter is no longer
   valid);
 - The patch adding support for ethtool filters is now independent from
   the rest of the series;

Original cover letter:

This series enables some ethtool and tc-flower filters to be offloaded
to igb-based network controllers. This is useful when the system
configurator want to steer kinds of traffic to a specific hardware
queue.

The first two commits are bug fixes.

The basis of this series is to export the internal API used to
configure address filters, so they can be used by ethtool, and
extending the functionality so an source address can be handled.

Then, we enable the tc-flower offloading implementation to re-use the
same infrastructure as ethtool, and storing them in the per-adapter
"nfc" (Network Filter Config?) list. But for consistency, for
destructive access they are separated, i.e. an filter added by
tc-flower can only be removed by tc-flower, but ethtool can read them
all.

Only support for VLAN Prio, Source and Destination MAC Address, and
Ethertype is enabled for now.

Open question:
  - igb is initialized with the number of traffic classes as 1, if we
  want to use multiple traffic classes we need to increase this value,
  the only way I could find is to use mqprio (for example). Should igb
  be initialized with, say, the number of queues as its "num_tc"?

--

Vinicius Costa Gomes (10):
  igb: Fix not adding filter elements to the list
  igb: Fix queue selection on MAC filters on i210
  igb: Enable the hardware traffic class feature bit for igb models
  igb: Add support for MAC address filters specifying source addresses
  igb: Add support for enabling queue steering in filters
  igb: Allow filters to be added for the local MAC address
  igb: Enable nfc filters to specify MAC addresses
  igb: Add MAC address support for ethtool nftuple filters
  igb: Add the skeletons for tc-flower offloading
  igb: Add support for adding offloaded clsflower filters

 .../net/ethernet/intel/igb/e1000_defines.h|   2 +
 drivers/net/ethernet/intel/igb/igb.h  |  13 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c  |  73 +++-
 drivers/net/ethernet/intel/igb/igb_main.c | 370 +-
 4 files changed, 441 insertions(+), 17 deletions(-)

--
2.17.0


[next-queue PATCH v7 03/10] igb: Enable the hardware traffic class feature bit for igb models

2018-04-10 Thread Vinicius Costa Gomes
This will allow functionality depending on the hardware being traffic
class aware to work. In particular the tc-flower offloading checks
verifies that this bit is set.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 0a79fef3c4fb..976898d39d6e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2807,6 +2807,9 @@ static int igb_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
if (hw->mac.type >= e1000_82576)
netdev->features |= NETIF_F_SCTP_CRC;
 
+   if (hw->mac.type >= e1000_i350)
+   netdev->features |= NETIF_F_HW_TC;
+
 #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
  NETIF_F_GSO_GRE_CSUM | \
  NETIF_F_GSO_IPXIP4 | \
-- 
2.17.0



[next-queue PATCH v7 07/10] igb: Enable nfc filters to specify MAC addresses

2018-04-10 Thread Vinicius Costa Gomes
This allows igb_add_filter()/igb_erase_filter() to work on filters
that include MAC addresses (both source and destination).

For now, this only exposes the functionality, the next commit glues
ethtool into this. Later in this series, these APIs are used to allow
offloading of cls_flower filters.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h |  4 +++
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 28 
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index f48ba090fd6a..b9b965921e9f 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -442,6 +442,8 @@ struct hwmon_buff {
 enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
IGB_FILTER_FLAG_VLAN_TCI   = 0x2,
+   IGB_FILTER_FLAG_SRC_MAC_ADDR   = 0x4,
+   IGB_FILTER_FLAG_DST_MAC_ADDR   = 0x8,
 };
 
 #define IGB_MAX_RXNFC_FILTERS 16
@@ -456,6 +458,8 @@ struct igb_nfc_input {
u8 match_flags;
__be16 etype;
__be16 vlan_tci;
+   u8 src_addr[ETH_ALEN];
+   u8 dst_addr[ETH_ALEN];
 };
 
 struct igb_nfc_filter {
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 5975d432836f..31b2960a7869 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2776,6 +2776,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
return err;
}
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.dst_addr,
+ input->action, 0);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.src_addr,
+ input->action,
+ IGB_MAC_STATE_SRC_ADDR);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
err = igb_rxnfc_write_vlan_prio_filter(adapter, input);
 
@@ -2824,6 +2843,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
igb_clear_vlan_prio_filter(adapter,
   ntohs(input->filter.vlan_tci));
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.src_addr,
+   input->action,
+   IGB_MAC_STATE_SRC_ADDR);
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.dst_addr,
+   input->action, 0);
+
return 0;
 }
 
-- 
2.17.0



[next-queue PATCH v7 09/10] igb: Add the skeletons for tc-flower offloading

2018-04-10 Thread Vinicius Costa Gomes
This adds basic functions needed to implement offloading for filters
created by tc-flower.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 1b6fad88107a..e3f33fb8064e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2498,6 +2499,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+static int igb_configure_clsflower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   switch (cls_flower->command) {
+   case TC_CLSFLOWER_REPLACE:
+   return igb_configure_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_DESTROY:
+   return igb_delete_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_STATS:
+   return -EOPNOTSUPP;
+   default:
+   return -EINVAL;
+   }
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
+{
+   struct igb_adapter *adapter = cb_priv;
+
+   if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+   return -EOPNOTSUPP;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return igb_setup_tc_cls_flower(adapter, type_data);
+
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+ struct tc_block_offload *f)
+{
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+adapter, adapter);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+   adapter);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
@@ -2506,6 +2570,8 @@ static int igb_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
+   case TC_SETUP_BLOCK:
+   return igb_setup_tc_block(adapter, type_data);
 
default:
return -EOPNOTSUPP;
-- 
2.17.0



[next-queue PATCH v7 10/10] igb: Add support for adding offloaded clsflower filters

2018-04-10 Thread Vinicius Costa Gomes
This allows filters added by tc-flower and specifying MAC addresses,
Ethernet types, and the VLAN priority field, to be offloaded to the
controller.

This reuses most of the infrastructure used by ethtool, but clsflower
filters are kept in a separated list, so they are invisible to
ethtool.

To setup clsflower offloading:

$ tc qdisc replace dev eth0 handle 100: parent root mqprio \
   num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
   queues 1@0 1@1 2@2 hw 0
(clsflower offloading depends on the netword driver to be configured
with multiple traffic classes, we use mqprio's 'num_tc' parameter to
set it to 3)

$ tc qdisc add dev eth0 ingress

Examples of filters:

$ tc filter add dev eth0 parent : flower \
dst_mac aa:aa:aa:aa:aa:aa \
hw_tc 2 skip_sw
(just a simple filter filtering for the destination MAC address and
steering that traffic to queue 2)

$ tc filter add dev enp2s0 parent : proto 0x22f0 flower \
src_mac cc:cc:cc:cc:cc:cc \
hw_tc 1 skip_sw
(as the i210 doesn't support steering traffic based on the source
address alone, we need to use another steering traffic, in this case
we are using the ethernet type (0x22f0) to steer traffic to queue 1)

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h  |   2 +
 drivers/net/ethernet/intel/igb/igb_main.c | 188 +-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index b9b965921e9f..a413284fada6 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -465,6 +465,7 @@ struct igb_nfc_input {
 struct igb_nfc_filter {
struct hlist_node nfc_node;
struct igb_nfc_input filter;
+   unsigned long cookie;
u16 etype_reg_index;
u16 sw_idx;
u16 action;
@@ -604,6 +605,7 @@ struct igb_adapter {
 
/* RX network flow classification support */
struct hlist_head nfc_filter_list;
+   struct hlist_head cls_flower_list;
unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index e3f33fb8064e..3c2e68dd0902 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2499,16 +2499,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_PRIO_FULL_MASK (0x07)
+
+static int igb_parse_cls_flower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *f,
+   int traffic_class,
+   struct igb_nfc_filter *input)
+{
+   struct netlink_ext_ack *extack = f->common.extack;
+
+   if (f->dissector->used_keys &
+   ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+ BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+ BIT(FLOW_DISSECTOR_KEY_VLAN))) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "Unsupported key used, only BASIC, CONTROL, 
ETH_ADDRS and VLAN are supported");
+   return -EOPNOTSUPP;
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+   struct flow_dissector_key_eth_addrs *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_ETH_ADDRS,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_ETH_ADDRS,
+f->mask);
+
+   if (!is_zero_ether_addr(mask->dst)) {
+   if (!is_broadcast_ether_addr(mask->dst)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for destination MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr, key->dst);
+   }
+
+   if (!is_zero_ether_addr(mask->src)) {
+   if (!is_broadcast_ether_addr(mask->src)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for source MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+  

[next-queue PATCH v7 01/10] igb: Fix not adding filter elements to the list

2018-04-10 Thread Vinicius Costa Gomes
Because the order of the parameters passes to 'hlist_add_behind()' was
inverted, the 'parent' node was added "behind" the 'input', as input
is not in the list, this causes the 'input' node to be lost.

Fixes: 0e71def25281 ("igb: add support of RX network flow classification")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index e77ba0d5866d..5975d432836f 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2865,7 +2865,7 @@ static int igb_update_ethtool_nfc_entry(struct 
igb_adapter *adapter,
 
/* add filter to the list */
if (parent)
-   hlist_add_behind(>nfc_node, >nfc_node);
+   hlist_add_behind(>nfc_node, >nfc_node);
else
hlist_add_head(>nfc_node, >nfc_filter_list);
 
-- 
2.17.0



[next-queue PATCH v7 08/10] igb: Add MAC address support for ethtool nftuple filters

2018-04-10 Thread Vinicius Costa Gomes
This adds the capability of configuring the queue steering of arriving
packets based on their source and destination MAC addresses.

Source address steering (i.e. driving traffic to a specific queue),
for the i210, does not work, but filtering does (i.e. accepting
traffic based on the source address). So, trying to add a filter
specifying only a source address will be an error.

In practical terms this adds support for the following use cases,
characterized by these examples:

$ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
(this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
to the RX queue 0)

$ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 \
  proto 0x22f0 action 3
(this will direct packets with source address "44:44:44:44:44:44" and
ethertype 0x22f0 to the RX queue 3)

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 43 ++--
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 31b2960a7869..6697c273ab59 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2495,6 +2495,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter 
*adapter,
fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
}
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+   rule->filter.dst_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+   }
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_source,
+   rule->filter.src_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+   }
+
return 0;
}
return -EINVAL;
@@ -2768,8 +2785,16 @@ static int igb_rxnfc_write_vlan_prio_filter(struct 
igb_adapter *adapter,
 
 int igb_add_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input)
 {
+   struct e1000_hw *hw = >hw;
int err = -EINVAL;
 
+   if (hw->mac.type == e1000_i210 &&
+   !(input->filter.match_flags & ~IGB_FILTER_FLAG_SRC_MAC_ADDR)) {
+   dev_err(>pdev->dev,
+   "i210 doesn't support flow classification rules 
specifying only source addresses.\n");
+   return -EOPNOTSUPP;
+   }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_ETHER_TYPE) {
err = igb_rxnfc_write_etype_filter(adapter, input);
if (err)
@@ -2933,10 +2958,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
return -EINVAL;
 
-   if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
-   fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
-   return -EINVAL;
-
input = kzalloc(sizeof(*input), GFP_KERNEL);
if (!input)
return -ENOMEM;
@@ -2946,6 +2967,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
}
 
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr,
+   fsp->h_u.ether_spec.h_source);
+   }
+
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr,
+   fsp->h_u.ether_spec.h_dest);
+   }
+
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
err = -EINVAL;
-- 
2.17.0



RE: [Intel-wired-lan] [next-queue PATCH v6 10/10] igb: Add support for adding offloaded clsflower filters

2018-04-09 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F" <aaron.f.br...@intel.com> writes:

>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
>> Behalf Of Vinicius Costa Gomes
>> Sent: Thursday, March 29, 2018 2:08 PM
>> To: intel-wired-...@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus > palen...@intel.com>
>> Subject: [Intel-wired-lan] [next-queue PATCH v6 10/10] igb: Add support for
>> adding offloaded clsflower filters
>> 
>> This allows filters added by tc-flower and specifying MAC addresses,
>> Ethernet types, and the VLAN priority field, to be offloaded to the
>> controller.
>
> Can I get a brief explanation for enabling this?  I'm currently happy
> with this patch series from a regression perspective, but am
> personally a bit, umm, challenged with tc in general but would like to
> run it through the paces a bit.  If it can be done in a one or two
> liner I think it would be a good addition to the patch description.


Of course, my bad for not adding those examples since the begining.



Cheers,
--
Vinicius


RE: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC address support for ethtool nftuple filters

2018-04-09 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F"  writes:

[...]

>
>> 
>> I added that note in the hope that someone else would have an stronger
>> opinion about what to do.
>
> I don't have a strong opinion beyond my preference for an ideal world
> where everything works :)  If the part simply cannot filter on the src
> address as a whole without the protocol I would ideally prefer an
> attempt in ethtool to set the filter on src address as a whole to
> return an error WHILE still allowing the filter to be set on an
> ethertype when the proto keyword is issued.  If ethtool does not allow
> that fine grain of control then I think the way it is now is good, I'd
> rather have the annoyance of being able to set a filter that does
> nothing then not be able to set the more specific filter at all.

We are agreed. The next version of this series implements just that:
specifying the src address alone is an error, but if the classifier has
a src address and any other filter, it works.


Cheers,
--
Vinicius


RE: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC address support for ethtool nftuple filters

2018-04-05 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F" <aaron.f.br...@intel.com> writes:

>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
>> Behalf Of Vinicius Costa Gomes
>> Sent: Thursday, March 29, 2018 2:08 PM
>> To: intel-wired-...@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus > palen...@intel.com>
>> Subject: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC
>> address support for ethtool nftuple filters
>> 
>> This adds the capability of configuring the queue steering of arriving
>> packets based on their source and destination MAC addresses.
>> 
>> In practical terms this adds support for the following use cases,
>> characterized by these examples:
>> 
>> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
>> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
>> to the RX queue 0)
>
> This is now working for me, testing with the dst MAC being the MAC on the 
> i210.  I set the filter and all the traffic to the destination MAC address 
> gets routed to the chosen RX queue.
>
>> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
>> (this will direct packets with source address "44:44:44:44:44:44" to
>> the RX queue 3)
>
> However, I am still not getting the raw ethernet source filter to
> work.  Even back to back with no other system to "confuse" the stream,
> I set the filter so the source MAC is the same as the MAC on the link
> partner, send traffic and the traffic bounces around the queues as if
> the filter is not set.

It seems there is at least a documentation issue in the i210 datasheet,
steering (placing traffic into a specific queue) by source address
doesn't work, filtering (accepting the traffic based on some rule) does
work. I pointed this out in the cover letter of v5 as a known issue, but
forgot to repeat it for v6, sorry about the confusion.

But only the filtering part is useful, I think, it enables cases like
this:

$ ethtool -N enp2s0 flow-type ether src 68:05:ca:4a:c9:73 proto 0x22f0 action 3

I added that note in the hope that someone else would have an stronger
opinion about what to do.

Anyway, my plan for now will be to document this better and turn the
case that only the source address is specified into an error.

>
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 35
>> 
>>  1 file changed, 31 insertions(+), 4 deletions(-)


Cheers,
--
Vinicius


[next-queue PATCH] igb: Fix the transmission mode of queue 0 for Qav mode

2018-03-30 Thread Vinicius Costa Gomes
When Qav mode is enabled, queue 0 should be kept on Stream Reservation
mode. From the i210 datasheet, section 8.12.19:

"Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
Qav." ("QueueMode 1b" represents the Stream Reservation mode)

The solution is to give queue 0 the all the credits it might need, so
it has priority over queue 1.

A situation where this can happen is when cbs is "installed" only on
queue 1, leaving queue 0 alone. For example:

$ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
   hicredit 30 sendslope -98 idleslope 20000 offload 1

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc30a16d..cce7ada89255 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1700,7 +1700,22 @@ static void igb_configure_cbs(struct igb_adapter 
*adapter, int queue,
WARN_ON(hw->mac.type != e1000_i210);
WARN_ON(queue < 0 || queue > 1);
 
-   if (enable) {
+   if (enable || queue == 0) {
+   /* i210 does not allow the queue 0 to be in the Strict
+* Priority mode while the Qav mode is enabled, so,
+* instead of disabling strict priority mode, we give
+* queue 0 the maximum of credits possible.
+*
+* See section 8.12.19 of the i210 datasheet, "Note:
+* Queue0 QueueMode must be set to 1b when
+* TransmitMode is set to Qav."
+*/
+   if (queue == 0 && !enable) {
+   /* max "linkspeed" idleslope in kbps */
+   idleslope = 100;
+   hicredit = ETH_FRAME_LEN;
+   }
+
set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
 
-- 
2.16.3



[next-queue PATCH v6 02/10] igb: Fix queue selection on MAC filters on i210

2018-03-29 Thread Vinicius Costa Gomes
On the RAH registers there are semantic differences on the meaning of
the "queue" parameter for traffic steering depending on the controller
model: there is the 82575 meaning, which "queue" means a RX Hardware
Queue, and the i350 meaning, where it is a reception pool.

The previous behaviour was having no effect for i210 based controllers
because the QSEL bit of the RAH register wasn't being set.

This patch separates the condition in discrete cases, so the different
handling is clearer.

Fixes: 83c21335c876 ("igb: improve MAC filter handling")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc30a16d..0a79fef3c4fb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8748,12 +8748,17 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
if (is_valid_ether_addr(addr))
rar_high |= E1000_RAH_AV;
 
-   if (hw->mac.type == e1000_82575)
+   switch (hw->mac.type) {
+   case e1000_82575:
+   case e1000_i210:
rar_high |= E1000_RAH_POOL_1 *
-   adapter->mac_table[index].queue;
-   else
+ adapter->mac_table[index].queue;
+   break;
+   default:
rar_high |= E1000_RAH_POOL_1 <<
-   adapter->mac_table[index].queue;
+   adapter->mac_table[index].queue;
+   break;
+   }
}
 
wr32(E1000_RAL(index), rar_low);
-- 
2.16.3



[next-queue PATCH v6 03/10] igb: Enable the hardware traffic class feature bit for igb models

2018-03-29 Thread Vinicius Costa Gomes
This will allow functionality depending on the hardware being traffic
class aware to work. In particular the tc-flower offloading checks
verifies that this bit is set.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 0a79fef3c4fb..976898d39d6e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2807,6 +2807,9 @@ static int igb_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
if (hw->mac.type >= e1000_82576)
netdev->features |= NETIF_F_SCTP_CRC;
 
+   if (hw->mac.type >= e1000_i350)
+   netdev->features |= NETIF_F_HW_TC;
+
 #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
  NETIF_F_GSO_GRE_CSUM | \
  NETIF_F_GSO_IPXIP4 | \
-- 
2.16.3



[next-queue PATCH v6 09/10] igb: Add the skeletons for tc-flower offloading

2018-03-29 Thread Vinicius Costa Gomes
This adds basic functions needed to implement offloading for filters
created by tc-flower.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 1b6fad88107a..e3f33fb8064e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2498,6 +2499,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+static int igb_configure_clsflower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   switch (cls_flower->command) {
+   case TC_CLSFLOWER_REPLACE:
+   return igb_configure_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_DESTROY:
+   return igb_delete_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_STATS:
+   return -EOPNOTSUPP;
+   default:
+   return -EINVAL;
+   }
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
+{
+   struct igb_adapter *adapter = cb_priv;
+
+   if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+   return -EOPNOTSUPP;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return igb_setup_tc_cls_flower(adapter, type_data);
+
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+ struct tc_block_offload *f)
+{
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+adapter, adapter);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+   adapter);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
@@ -2506,6 +2570,8 @@ static int igb_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
+   case TC_SETUP_BLOCK:
+   return igb_setup_tc_block(adapter, type_data);
 
default:
return -EOPNOTSUPP;
-- 
2.16.3



[next-queue PATCH v6 04/10] igb: Add support for MAC address filters specifying source addresses

2018-03-29 Thread Vinicius Costa Gomes
Makes it possible to direct packets to queues based on their source
address. Documents the expected usage of the 'flags' parameter.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb.h   |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 40 ++
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 98534f765e0e..5417edbe3125 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -491,6 +491,7 @@
  * manageability enabled, allowing us room for 15 multicast addresses.
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
+#define E1000_RAH_ASEL_SRC_ADDR 0x0001
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 8dbc399b345e..f3ecda46f287 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -474,6 +474,7 @@ struct igb_mac_addr {
 
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
+#define IGB_MAC_STATE_SRC_ADDR  0x4
 
 /* board specific private data structure */
 struct igb_adapter {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 976898d39d6e..2033ec3c9b27 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6844,8 +6844,14 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
-static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
- const u8 queue)
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_add_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6865,7 +6871,7 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
adapter->mac_table[i].queue = queue;
-   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags;
 
igb_rar_set_index(adapter, i);
return i;
@@ -6874,8 +6880,21 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOSPC;
 }
 
-static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
  const u8 queue)
+{
+   return igb_add_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_del_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6892,12 +6911,14 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
for (i = 0; i < rar_entries; i++) {
if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE))
continue;
+   if ((adapter->mac_table[i].state & flags) != flags)
+   continue;
if (adapter->mac_table[i].queue != queue)
continue;
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state = 0;
memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
adapter->mac_table[i].queue = 0;
 
@@ -6908,6 +6929,12 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOENT;
 }
 
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *

[next-queue PATCH v6 06/10] igb: Allow filters to be added for the local MAC address

2018-03-29 Thread Vinicius Costa Gomes
Users expect that when adding a steering filter for the local MAC
address, that all the traffic directed to that address will go to some
queue.

Currently, it's not possible to configure entries in the "in use"
state, which is the normal state of the local MAC address entry (it is
the default), this patch allows to override the steering configuration
of "in use" entries, if the filter to be added match the address and
address type (source or destination) of an existing entry.

There is a bit of a special handling for entries referring to the
local MAC address, when they are removed, only the steering
configuration is reset.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 40 +++
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index e3da35cab786..1b6fad88107a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6844,6 +6844,27 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
+/* If the filter to be added and an already existing filter express
+ * the same address and address type, it should be possible to only
+ * override the other configurations, for example the queue to steer
+ * traffic.
+ */
+static bool igb_mac_entry_can_be_used(const struct igb_mac_addr *entry,
+ const u8 *addr, const u8 flags)
+{
+   if (!(entry->state & IGB_MAC_STATE_IN_USE))
+   return true;
+
+   if ((entry->state & IGB_MAC_STATE_SRC_ADDR) !=
+   (flags & IGB_MAC_STATE_SRC_ADDR))
+   return false;
+
+   if (!ether_addr_equal(addr, entry->addr))
+   return false;
+
+   return true;
+}
+
 /* Add a MAC filter for 'addr' directing matching traffic to 'queue',
  * 'flags' is used to indicate what kind of match is made, match is by
  * default for the destination address, if matching by source address
@@ -6866,7 +6887,8 @@ static int igb_add_mac_filter_flags(struct igb_adapter 
*adapter,
 * addresses.
 */
for (i = 0; i < rar_entries; i++) {
-   if (adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE)
+   if (!igb_mac_entry_can_be_used(>mac_table[i],
+  addr, flags))
continue;
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
@@ -6918,9 +6940,19 @@ static int igb_del_mac_filter_flags(struct igb_adapter 
*adapter,
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state = 0;
-   memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
-   adapter->mac_table[i].queue = 0;
+   /* When a filter for the default address is "deleted",
+* we return it to its initial configuration
+*/
+   if (adapter->mac_table[i].state & IGB_MAC_STATE_DEFAULT) {
+   adapter->mac_table[i].state =
+   IGB_MAC_STATE_DEFAULT | IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].queue =
+   adapter->vfs_allocated_count;
+   } else {
+   adapter->mac_table[i].state = 0;
+   adapter->mac_table[i].queue = 0;
+   memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
+   }
 
igb_rar_set_index(adapter, i);
return 0;
-- 
2.16.3



[next-queue PATCH v6 00/10] igb: offloading of receive filters

2018-03-29 Thread Vinicius Costa Gomes

Hi,

Changes from v5:
 - Filters can now be added for local MAC addresses, when removed,
   they return to their initial configuration (thanks for the testing
   Aaron);

Changes from v4:
 - Added a new bit to the MAC address filters internal
 representation to mean that some filters are steering filters (i.e.
 they direct traffic to queues);
 - And, this is only supported in i210;

Changes from v3:
 - Addressed review comments from Aaron F. Brown and
   Jakub Kicinski;

Changes from v2:
 - Addressed review comments from Jakub Kicinski, mostly about coding
   style adjustments and more consistent error reporting;

Changes from v1:
 - Addressed review comments from Alexander Duyck and Florian
   Fainelli;
 - Adding and removing cls_flower filters are now proposed in the same
   patch;
 - cls_flower filters are kept in a separated list from "ethtool"
   filters (so that section of the original cover letter is no longer
   valid);
 - The patch adding support for ethtool filters is now independent from
   the rest of the series;

Original cover letter:

This series enables some ethtool and tc-flower filters to be offloaded
to igb-based network controllers. This is useful when the system
configurator want to steer kinds of traffic to a specific hardware
queue.

The first two commits are bug fixes.

The basis of this series is to export the internal API used to
configure address filters, so they can be used by ethtool, and
extending the functionality so an source address can be handled.

Then, we enable the tc-flower offloading implementation to re-use the
same infrastructure as ethtool, and storing them in the per-adapter
"nfc" (Network Filter Config?) list. But for consistency, for
destructive access they are separated, i.e. an filter added by
tc-flower can only be removed by tc-flower, but ethtool can read them
all.

Only support for VLAN Prio, Source and Destination MAC Address, and
Ethertype is enabled for now.

Open question:
  - igb is initialized with the number of traffic classes as 1, if we
  want to use multiple traffic classes we need to increase this value,
  the only way I could find is to use mqprio (for example). Should igb
  be initialized with, say, the number of queues as its "num_tc"?


Vinicius Costa Gomes (10):
  igb: Fix not adding filter elements to the list
  igb: Fix queue selection on MAC filters on i210
  igb: Enable the hardware traffic class feature bit for igb models
  igb: Add support for MAC address filters specifying source addresses
  igb: Add support for enabling queue steering in filters
  igb: Allow filters to be added for the local MAC address
  igb: Enable nfc filters to specify MAC addresses
  igb: Add MAC address support for ethtool nftuple filters
  igb: Add the skeletons for tc-flower offloading
  igb: Add support for adding offloaded clsflower filters

 drivers/net/ethernet/intel/igb/e1000_defines.h |   2 +
 drivers/net/ethernet/intel/igb/igb.h   |  13 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c   |  65 -
 drivers/net/ethernet/intel/igb/igb_main.c  | 370 -
 4 files changed, 433 insertions(+), 17 deletions(-)

--
2.16.3


[next-queue PATCH v6 10/10] igb: Add support for adding offloaded clsflower filters

2018-03-29 Thread Vinicius Costa Gomes
This allows filters added by tc-flower and specifying MAC addresses,
Ethernet types, and the VLAN priority field, to be offloaded to the
controller.

This reuses most of the infrastructure used by ethtool, but clsflower
filters are kept in a separated list, so they are invisible to
ethtool.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h  |   2 +
 drivers/net/ethernet/intel/igb/igb_main.c | 188 +-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index b9b965921e9f..a413284fada6 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -465,6 +465,7 @@ struct igb_nfc_input {
 struct igb_nfc_filter {
struct hlist_node nfc_node;
struct igb_nfc_input filter;
+   unsigned long cookie;
u16 etype_reg_index;
u16 sw_idx;
u16 action;
@@ -604,6 +605,7 @@ struct igb_adapter {
 
/* RX network flow classification support */
struct hlist_head nfc_filter_list;
+   struct hlist_head cls_flower_list;
unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index e3f33fb8064e..3c2e68dd0902 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2499,16 +2499,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_PRIO_FULL_MASK (0x07)
+
+static int igb_parse_cls_flower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *f,
+   int traffic_class,
+   struct igb_nfc_filter *input)
+{
+   struct netlink_ext_ack *extack = f->common.extack;
+
+   if (f->dissector->used_keys &
+   ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+ BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+ BIT(FLOW_DISSECTOR_KEY_VLAN))) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "Unsupported key used, only BASIC, CONTROL, 
ETH_ADDRS and VLAN are supported");
+   return -EOPNOTSUPP;
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+   struct flow_dissector_key_eth_addrs *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_ETH_ADDRS,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_ETH_ADDRS,
+f->mask);
+
+   if (!is_zero_ether_addr(mask->dst)) {
+   if (!is_broadcast_ether_addr(mask->dst)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for destination MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr, key->dst);
+   }
+
+   if (!is_zero_ether_addr(mask->src)) {
+   if (!is_broadcast_ether_addr(mask->src)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for source MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr, key->src);
+   }
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+   struct flow_dissector_key_basic *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_BASIC,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_BASIC,
+f->mask);
+
+   if (mask->n_proto) {
+   if (mask->n_proto != ETHER_TYPE_FULL_MASK) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full mask is 
supported for EtherType filter");
+ 

[next-queue PATCH v6 05/10] igb: Add support for enabling queue steering in filters

2018-03-29 Thread Vinicius Costa Gomes
On some igb models (82575 and i210) the MAC address filters can
control to which queue the packet will be assigned.

This extends the 'state' with one more state to signify that queue
selection should be enabled for that filter.

As 82575 parts are no longer easily obtained (and this was developed
against i210), only support for the i210 model is enabled.

These functions are exported and will be used in the next patch.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb.h   |  6 ++
 drivers/net/ethernet/intel/igb/igb_main.c  | 26 ++
 3 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 5417edbe3125..d3d1d868e7ba 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -492,6 +492,7 @@
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
 #define E1000_RAH_ASEL_SRC_ADDR 0x0001
+#define E1000_RAH_QSEL_ENABLE 0x1000
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index f3ecda46f287..f48ba090fd6a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -475,6 +475,7 @@ struct igb_mac_addr {
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
 #define IGB_MAC_STATE_SRC_ADDR  0x4
+#define IGB_MAC_STATE_QUEUE_STEERING 0x8
 
 /* board specific private data structure */
 struct igb_adapter {
@@ -740,4 +741,9 @@ int igb_add_filter(struct igb_adapter *adapter,
 int igb_erase_filter(struct igb_adapter *adapter,
 struct igb_nfc_filter *input);
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+
 #endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 2033ec3c9b27..e3da35cab786 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6935,6 +6935,28 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return igb_del_mac_filter_flags(adapter, addr, queue, 0);
 }
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   struct e1000_hw *hw = >hw;
+
+   /* In theory, this should be supported on 82575 as well, but
+* that part wasn't easily accessible during development.
+*/
+   if (hw->mac.type != e1000_i210)
+   return -EOPNOTSUPP;
+
+   return igb_add_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   return igb_del_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
 static int igb_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -8784,6 +8806,10 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
switch (hw->mac.type) {
case e1000_82575:
case e1000_i210:
+   if (adapter->mac_table[index].state &
+   IGB_MAC_STATE_QUEUE_STEERING)
+   rar_high |= E1000_RAH_QSEL_ENABLE;
+
rar_high |= E1000_RAH_POOL_1 *
  adapter->mac_table[index].queue;
break;
-- 
2.16.3



[next-queue PATCH v6 08/10] igb: Add MAC address support for ethtool nftuple filters

2018-03-29 Thread Vinicius Costa Gomes
This adds the capability of configuring the queue steering of arriving
packets based on their source and destination MAC addresses.

In practical terms this adds support for the following use cases,
characterized by these examples:

$ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
(this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
to the RX queue 0)

$ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
(this will direct packets with source address "44:44:44:44:44:44" to
the RX queue 3)

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 31b2960a7869..491946e09f8e 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2495,6 +2495,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter 
*adapter,
fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
}
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+   rule->filter.dst_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+   }
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_source,
+   rule->filter.src_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+   }
+
return 0;
}
return -EINVAL;
@@ -2933,10 +2950,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
return -EINVAL;
 
-   if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
-   fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
-   return -EINVAL;
-
input = kzalloc(sizeof(*input), GFP_KERNEL);
if (!input)
return -ENOMEM;
@@ -2946,6 +2959,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
}
 
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr,
+   fsp->h_u.ether_spec.h_source);
+   }
+
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr,
+   fsp->h_u.ether_spec.h_dest);
+   }
+
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
err = -EINVAL;
-- 
2.16.3



[next-queue PATCH v6 07/10] igb: Enable nfc filters to specify MAC addresses

2018-03-29 Thread Vinicius Costa Gomes
This allows igb_add_filter()/igb_erase_filter() to work on filters
that include MAC addresses (both source and destination).

For now, this only exposes the functionality, the next commit glues
ethtool into this. Later in this series, these APIs are used to allow
offloading of cls_flower filters.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h |  4 
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 28 
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index f48ba090fd6a..b9b965921e9f 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -442,6 +442,8 @@ struct hwmon_buff {
 enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
IGB_FILTER_FLAG_VLAN_TCI   = 0x2,
+   IGB_FILTER_FLAG_SRC_MAC_ADDR   = 0x4,
+   IGB_FILTER_FLAG_DST_MAC_ADDR   = 0x8,
 };
 
 #define IGB_MAX_RXNFC_FILTERS 16
@@ -456,6 +458,8 @@ struct igb_nfc_input {
u8 match_flags;
__be16 etype;
__be16 vlan_tci;
+   u8 src_addr[ETH_ALEN];
+   u8 dst_addr[ETH_ALEN];
 };
 
 struct igb_nfc_filter {
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 5975d432836f..31b2960a7869 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2776,6 +2776,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
return err;
}
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.dst_addr,
+ input->action, 0);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.src_addr,
+ input->action,
+ IGB_MAC_STATE_SRC_ADDR);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
err = igb_rxnfc_write_vlan_prio_filter(adapter, input);
 
@@ -2824,6 +2843,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
igb_clear_vlan_prio_filter(adapter,
   ntohs(input->filter.vlan_tci));
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.src_addr,
+   input->action,
+   IGB_MAC_STATE_SRC_ADDR);
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.dst_addr,
+   input->action, 0);
+
return 0;
 }
 
-- 
2.16.3



[next-queue PATCH v6 01/10] igb: Fix not adding filter elements to the list

2018-03-29 Thread Vinicius Costa Gomes
Because the order of the parameters passes to 'hlist_add_behind()' was
inverted, the 'parent' node was added "behind" the 'input', as input
is not in the list, this causes the 'input' node to be lost.

Fixes: 0e71def25281 ("igb: add support of RX network flow classification")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index e77ba0d5866d..5975d432836f 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2865,7 +2865,7 @@ static int igb_update_ethtool_nfc_entry(struct 
igb_adapter *adapter,
 
/* add filter to the list */
if (parent)
-   hlist_add_behind(>nfc_node, >nfc_node);
+   hlist_add_behind(>nfc_node, >nfc_node);
else
hlist_add_head(>nfc_node, >nfc_filter_list);
 
-- 
2.16.3



RE: [Intel-wired-lan] [next-queue PATCH v5 7/9] igb: Add MAC address support for ethtool nftuple filters

2018-03-27 Thread Vinicius Costa Gomes
Hi Aaron,

"Brown, Aaron F"  writes:

[...]

> And watching the rx_queue counters continues to be spread across the 
> different queues.  This is with Jeff Kirsher's  next queue, kernel 
> 4.16.0-rc4_next-queue_dev-queue_e31d20a, which has the series of 8 igb 
> patches applied.
>
> When I go back and run the an older build (with an earlier version of
> the series) of the same tree, 4.16.0-rc4_next-queue_dev-queue_84a3942,
> with the same procedure and same systems all the rx traffic is
> relegated to queue 0 (or whichever queue I assign it to) for either
> the src or dst filter.  Here is a sample of my counters after it had
> been running netperf_stress over the weekend:

The difference in behaviour between v4 and v5 is that v4 is configuring
(wrongly) the controller to send all the traffic directed to the
local MAC address to queue 0, v5 allows that filter to be added, but it
does nothing in reality.

I am working on a new version of this series that should work for adding
filters that involve the local MAC address. The initial use cases that I
had in mind all used MAC addresses different from the local one, but I
see that this indeed useful (and less surprising).


Thank you,
--
Vinicius


RE: [Intel-wired-lan] [next-queue PATCH v5 7/9] igb: Add MAC address support for ethtool nftuple filters

2018-03-26 Thread Vinicius Costa Gomes
Hi Aaron,

"Brown, Aaron F"  writes:

>
> Maybe not "this" patch, but this is the one that enables the ethtool 
> commands, so replying here.
> The filters do not seem to take effect with this version (v5) of the
> series.  The commands are accepted for i210 and rejected with
> unsupported messages for the other adapters (as desired) and an
> ethtool -n shows the filter, however, with either the src or dst
> filter set I can run traffic (netperf) that should be caught by the
> filter and rather than being directed to the single queue it is spread
> across queues as would be expected without the filter set.
>
> The test system still has a kernel / driver with the v4 series of this
> patch set and the exact same filter commands / system setup does
> filter the traffic to the specified rx queue with the v4 series.

That's interesting. The only difference is that now non steering filters
(filters added by 'ip (m)addr', PACKET_ADD_MEMBERSHIP and the local MAC
address, for example) do not have the QSEL bit set.

On my tests I cannot detect any change in behaviour between those two
versions of the series, for example. trying to add a filter for the
local MAC address has no visible effect in both versions. (This raises a
question: should this be an error, or should this override the default
entry configuration, or this behaviour is fine?)

Can you share more information about your tests? so I can reproduce it here.


Thank you,
--
Vinicius


[next-queue PATCH v5 1/9] igb: Fix not adding filter elements to the list

2018-03-21 Thread Vinicius Costa Gomes
Because the order of the parameters passes to 'hlist_add_behind()' was
inverted, the 'parent' node was added "behind" the 'input', as input
is not in the list, this causes the 'input' node to be lost.

Fixes: 0e71def25281 ("igb: add support of RX network flow classification")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 606e6761758f..143f0bb34e4d 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2864,7 +2864,7 @@ static int igb_update_ethtool_nfc_entry(struct 
igb_adapter *adapter,
 
/* add filter to the list */
if (parent)
-   hlist_add_behind(>nfc_node, >nfc_node);
+   hlist_add_behind(>nfc_node, >nfc_node);
else
hlist_add_head(>nfc_node, >nfc_filter_list);
 
-- 
2.16.2



[next-queue PATCH v5 3/9] igb: Enable the hardware traffic class feature bit for igb models

2018-03-21 Thread Vinicius Costa Gomes
This will allow functionality depending on the hardware being traffic
class aware to work. In particular the tc-flower offloading checks
verifies that this bit is set.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index d0e8e796c6fa..9ce29b8bb7da 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2806,6 +2806,9 @@ static int igb_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
if (hw->mac.type >= e1000_82576)
netdev->features |= NETIF_F_SCTP_CRC;
 
+   if (hw->mac.type >= e1000_i350)
+   netdev->features |= NETIF_F_HW_TC;
+
 #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
  NETIF_F_GSO_GRE_CSUM | \
  NETIF_F_GSO_IPXIP4 | \
-- 
2.16.2



[next-queue PATCH v5 6/9] igb: Enable nfc filters to specify MAC addresses

2018-03-21 Thread Vinicius Costa Gomes
This allows igb_add_filter()/igb_erase_filter() to work on filters
that include MAC addresses (both source and destination).

For now, this only exposes the functionality, the next commit glues
ethtool into this. Later in this series, these APIs are used to allow
offloading of cls_flower filters.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h |  4 
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 28 
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index dfef1702ba21..66165879f12b 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -441,6 +441,8 @@ struct hwmon_buff {
 enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
IGB_FILTER_FLAG_VLAN_TCI   = 0x2,
+   IGB_FILTER_FLAG_SRC_MAC_ADDR   = 0x4,
+   IGB_FILTER_FLAG_DST_MAC_ADDR   = 0x8,
 };
 
 #define IGB_MAX_RXNFC_FILTERS 16
@@ -455,6 +457,8 @@ struct igb_nfc_input {
u8 match_flags;
__be16 etype;
__be16 vlan_tci;
+   u8 src_addr[ETH_ALEN];
+   u8 dst_addr[ETH_ALEN];
 };
 
 struct igb_nfc_filter {
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 143f0bb34e4d..4c6a1b78c413 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2775,6 +2775,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
return err;
}
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.dst_addr,
+ input->action, 0);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   err = igb_add_mac_steering_filter(adapter,
+ input->filter.src_addr,
+ input->action,
+ IGB_MAC_STATE_SRC_ADDR);
+   err = min_t(int, err, 0);
+   if (err)
+   return err;
+   }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
err = igb_rxnfc_write_vlan_prio_filter(adapter, input);
 
@@ -2823,6 +2842,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct 
igb_nfc_filter *input)
igb_clear_vlan_prio_filter(adapter,
   ntohs(input->filter.vlan_tci));
 
+   if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.src_addr,
+   input->action,
+   IGB_MAC_STATE_SRC_ADDR);
+
+   if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+   igb_del_mac_steering_filter(adapter, input->filter.dst_addr,
+   input->action, 0);
+
return 0;
 }
 
-- 
2.16.2



[next-queue PATCH v5 8/9] igb: Add the skeletons for tc-flower offloading

2018-03-21 Thread Vinicius Costa Gomes
This adds basic functions needed to implement offloading for filters
created by tc-flower.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 52cd891aa579..150231e4db9d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2497,6 +2498,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+static int igb_configure_clsflower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *cls_flower)
+{
+   return -EOPNOTSUPP;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+  struct tc_cls_flower_offload *cls_flower)
+{
+   switch (cls_flower->command) {
+   case TC_CLSFLOWER_REPLACE:
+   return igb_configure_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_DESTROY:
+   return igb_delete_clsflower(adapter, cls_flower);
+   case TC_CLSFLOWER_STATS:
+   return -EOPNOTSUPP;
+   default:
+   return -EINVAL;
+   }
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+void *cb_priv)
+{
+   struct igb_adapter *adapter = cb_priv;
+
+   if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+   return -EOPNOTSUPP;
+
+   switch (type) {
+   case TC_SETUP_CLSFLOWER:
+   return igb_setup_tc_cls_flower(adapter, type_data);
+
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+ struct tc_block_offload *f)
+{
+   if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+   return -EOPNOTSUPP;
+
+   switch (f->command) {
+   case TC_BLOCK_BIND:
+   return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+adapter, adapter);
+   case TC_BLOCK_UNBIND:
+   tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+   adapter);
+   return 0;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
 {
@@ -2505,6 +2569,8 @@ static int igb_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
+   case TC_SETUP_BLOCK:
+   return igb_setup_tc_block(adapter, type_data);
 
default:
return -EOPNOTSUPP;
-- 
2.16.2



[next-queue PATCH v5 0/9] igb: offloading of receive filters

2018-03-21 Thread Vinicius Costa Gomes
Hi,

Changes from v4:
 - Added a new bit to the MAC address filters internal representation
 to mean that some MAC address filters are steering filters (i.e. they
 direct traffic to queues);
 - And, this is only supported in i210;
 - Added a "Known Issue" section;

Changes from v3:
 - Addressed review comments from Aaron F. Brown and
   Jakub Kicinski;

Changes from v2:
 - Addressed review comments from Jakub Kicinski, mostly about coding
   style adjustments and more consistent error reporting;

Changes from v1:
 - Addressed review comments from Alexander Duyck and Florian
   Fainelli;
 - Adding and removing cls_flower filters are now proposed in the same
   patch;
 - cls_flower filters are kept in a separated list from "ethtool"
   filters (so that section of the original cover letter is no longer
   valid);
 - The patch adding support for ethtool filters is now independent from
   the rest of the series;

Known issue:
 - It seems that the the QSEL bits in the RAH registers do not have
 any effect for source address (i.e. steering doesn't work for source
 address filters), everything is pointing to a hardware (or
 documentation) issue;

Original cover letter:

This series enables some ethtool and tc-flower filters to be offloaded
to igb-based network controllers. This is useful when the system
configurator want to steer kinds of traffic to a specific hardware
queue.

The first two commits are bug fixes.

The basis of this series is to export the internal API used to
configure address filters, so they can be used by ethtool, and
extending the functionality so an source address can be handled.

Then, we enable the tc-flower offloading implementation to re-use the
same infrastructure as ethtool, and storing them in the per-adapter
"nfc" (Network Filter Config?) list. But for consistency, for
destructive access they are separated, i.e. an filter added by
tc-flower can only be removed by tc-flower, but ethtool can read them
all.

Only support for VLAN Prio, Source and Destination MAC Address, and
Ethertype is enabled for now.

Open question:
  - igb is initialized with the number of traffic classes as 1, if we
  want to use multiple traffic classes we need to increase this value,
  the only way I could find is to use mqprio (for example). Should igb
  be initialized with, say, the number of queues as its "num_tc"?

Vinicius Costa Gomes (9):
  igb: Fix not adding filter elements to the list
  igb: Fix queue selection on MAC filters on i210
  igb: Enable the hardware traffic class feature bit for igb models
  igb: Add support for MAC address filters specifying source addresses
  igb: Add support for enabling queue steering in filters
  igb: Enable nfc filters to specify MAC addresses
  igb: Add MAC address support for ethtool nftuple filters
  igb: Add the skeletons for tc-flower offloading
  igb: Add support for adding offloaded clsflower filters

 drivers/net/ethernet/intel/igb/e1000_defines.h |   2 +
 drivers/net/ethernet/intel/igb/igb.h   |  13 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c   |  65 -
 drivers/net/ethernet/intel/igb/igb_main.c  | 332 -
 4 files changed, 398 insertions(+), 14 deletions(-)

--
2.16.2


[next-queue PATCH v5 4/9] igb: Add support for MAC address filters specifying source addresses

2018-03-21 Thread Vinicius Costa Gomes
Makes it possible to direct packets to queues based on their source
address. Documents the expected usage of the 'flags' parameter.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb.h   |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 40 ++
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 83cabff1e0ab..a3e5514b044e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -490,6 +490,7 @@
  * manageability enabled, allowing us room for 15 multicast addresses.
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
+#define E1000_RAH_ASEL_SRC_ADDR 0x0001
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 55d6f17d5799..4501b28ff7c5 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -473,6 +473,7 @@ struct igb_mac_addr {
 
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
+#define IGB_MAC_STATE_SRC_ADDR  0x4
 
 /* board specific private data structure */
 struct igb_adapter {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 9ce29b8bb7da..a5a681f7fbb2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6843,8 +6843,14 @@ static void igb_set_default_mac_filter(struct 
igb_adapter *adapter)
igb_rar_set_index(adapter, 0);
 }
 
-static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
- const u8 queue)
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_add_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6864,7 +6870,7 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
 
ether_addr_copy(adapter->mac_table[i].addr, addr);
adapter->mac_table[i].queue = queue;
-   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags;
 
igb_rar_set_index(adapter, i);
return i;
@@ -6873,8 +6879,21 @@ static int igb_add_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOSPC;
 }
 
-static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
  const u8 queue)
+{
+   return igb_add_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_del_mac_filter_flags(struct igb_adapter *adapter,
+   const u8 *addr, const u8 queue,
+   const u8 flags)
 {
struct e1000_hw *hw = >hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6891,12 +6910,14 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
for (i = 0; i < rar_entries; i++) {
if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE))
continue;
+   if ((adapter->mac_table[i].state & flags) != flags)
+   continue;
if (adapter->mac_table[i].queue != queue)
continue;
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
 
-   adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE;
+   adapter->mac_table[i].state = 0;
memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
adapter->mac_table[i].queue = 0;
 
@@ -6907,6 +6928,12 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return -ENOENT;
 }
 
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *

[next-queue PATCH v5 7/9] igb: Add MAC address support for ethtool nftuple filters

2018-03-21 Thread Vinicius Costa Gomes
This adds the capability of configuring the queue steering of arriving
packets based on their source and destination MAC addresses.

In practical terms this adds support for the following use cases,
characterized by these examples:

$ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
(this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
to the RX queue 0)

$ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
(this will direct packets with source address "44:44:44:44:44:44" to
the RX queue 3)

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 4c6a1b78c413..27caa413ade2 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter 
*adapter,
fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
}
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+   rule->filter.dst_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+   }
+   if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+   ether_addr_copy(fsp->h_u.ether_spec.h_source,
+   rule->filter.src_addr);
+   /* As we only support matching by the full
+* mask, return the mask to userspace
+*/
+   eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+   }
+
return 0;
}
return -EINVAL;
@@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
return -EINVAL;
 
-   if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
-   fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
-   return -EINVAL;
-
input = kzalloc(sizeof(*input), GFP_KERNEL);
if (!input)
return -ENOMEM;
@@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter 
*adapter,
input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
}
 
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr,
+   fsp->h_u.ether_spec.h_source);
+   }
+
+   /* Only support matching addresses by the full mask */
+   if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+   input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr,
+   fsp->h_u.ether_spec.h_dest);
+   }
+
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
err = -EINVAL;
-- 
2.16.2



[next-queue PATCH v5 2/9] igb: Fix queue selection on MAC filters on i210

2018-03-21 Thread Vinicius Costa Gomes
On the RAH registers there are semantic differences on the meaning of
the "queue" parameter for traffic steering depending on the controller
model: there is the 82575 meaning, which "queue" means a RX Hardware
Queue, and the i350 meaning, where it is a reception pool.

The previous behaviour was having no effect for i210 based controllers
because the QSEL bit of the RAH register wasn't being set.

This patch separates the condition in discrete cases, so the different
handling is clearer.

Fixes: 83c21335c876 ("igb: improve MAC filter handling")
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 715bb32e6901..d0e8e796c6fa 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8747,12 +8747,17 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
if (is_valid_ether_addr(addr))
rar_high |= E1000_RAH_AV;
 
-   if (hw->mac.type == e1000_82575)
+   switch (hw->mac.type) {
+   case e1000_82575:
+   case e1000_i210:
rar_high |= E1000_RAH_POOL_1 *
-   adapter->mac_table[index].queue;
-   else
+ adapter->mac_table[index].queue;
+   break;
+   default:
rar_high |= E1000_RAH_POOL_1 <<
-   adapter->mac_table[index].queue;
+   adapter->mac_table[index].queue;
+   break;
+   }
}
 
wr32(E1000_RAL(index), rar_low);
-- 
2.16.2



[next-queue PATCH v5 5/9] igb: Add support for enabling queue steering in filters

2018-03-21 Thread Vinicius Costa Gomes
On some igb models (82575 and i210) the MAC address filters can
control to which queue the packet will be assigned.

This extends the 'state' with one more state to signify that queue
selection should be enabled for that filter.

As 82575 parts are no longer easily obtained (and this was developed
against i210), only support for the i210 model is enabled.

These functions are exported and will be used in the next patch.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  1 +
 drivers/net/ethernet/intel/igb/igb.h   |  6 ++
 drivers/net/ethernet/intel/igb/igb_main.c  | 26 ++
 3 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index a3e5514b044e..c6f552de30dd 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -491,6 +491,7 @@
  */
 #define E1000_RAH_AV  0x8000/* Receive descriptor valid */
 #define E1000_RAH_ASEL_SRC_ADDR 0x0001
+#define E1000_RAH_QSEL_ENABLE 0x1000
 #define E1000_RAL_MAC_ADDR_LEN 4
 #define E1000_RAH_MAC_ADDR_LEN 2
 #define E1000_RAH_POOL_MASK 0x03FC
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 4501b28ff7c5..dfef1702ba21 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -474,6 +474,7 @@ struct igb_mac_addr {
 #define IGB_MAC_STATE_DEFAULT  0x1
 #define IGB_MAC_STATE_IN_USE   0x2
 #define IGB_MAC_STATE_SRC_ADDR  0x4
+#define IGB_MAC_STATE_QUEUE_STEERING 0x8
 
 /* board specific private data structure */
 struct igb_adapter {
@@ -739,4 +740,9 @@ int igb_add_filter(struct igb_adapter *adapter,
 int igb_erase_filter(struct igb_adapter *adapter,
 struct igb_nfc_filter *input);
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags);
+
 #endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index a5a681f7fbb2..52cd891aa579 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6934,6 +6934,28 @@ static int igb_del_mac_filter(struct igb_adapter 
*adapter, const u8 *addr,
return igb_del_mac_filter_flags(adapter, addr, queue, 0);
 }
 
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   struct e1000_hw *hw = >hw;
+
+   /* In theory, this should be supported on 82575 as well, but
+* that part wasn't easily accessible during development.
+*/
+   if (hw->mac.type != e1000_i210)
+   return -EOPNOTSUPP;
+
+   return igb_add_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+   const u8 *addr, u8 queue, u8 flags)
+{
+   return igb_del_mac_filter_flags(adapter, addr, queue,
+   IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
 static int igb_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -8783,6 +8805,10 @@ static void igb_rar_set_index(struct igb_adapter 
*adapter, u32 index)
switch (hw->mac.type) {
case e1000_82575:
case e1000_i210:
+   if (adapter->mac_table[index].state &
+   IGB_MAC_STATE_QUEUE_STEERING)
+   rar_high |= E1000_RAH_QSEL_ENABLE;
+
rar_high |= E1000_RAH_POOL_1 *
  adapter->mac_table[index].queue;
break;
-- 
2.16.2



[next-queue PATCH v5 9/9] igb: Add support for adding offloaded clsflower filters

2018-03-21 Thread Vinicius Costa Gomes
This allows filters added by tc-flower and specifying MAC addresses,
Ethernet types, and the VLAN priority field, to be offloaded to the
controller.

This reuses most of the infrastructure used by ethtool, but clsflower
filters are kept in a separated list, so they are invisible to
ethtool.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h  |   2 +
 drivers/net/ethernet/intel/igb/igb_main.c | 188 +-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 66165879f12b..adfef068e866 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -464,6 +464,7 @@ struct igb_nfc_input {
 struct igb_nfc_filter {
struct hlist_node nfc_node;
struct igb_nfc_input filter;
+   unsigned long cookie;
u16 etype_reg_index;
u16 sw_idx;
u16 action;
@@ -603,6 +604,7 @@ struct igb_adapter {
 
/* RX network flow classification support */
struct hlist_head nfc_filter_list;
+   struct hlist_head cls_flower_list;
unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 150231e4db9d..cc580b17dab3 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2498,16 +2498,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
return 0;
 }
 
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_PRIO_FULL_MASK (0x07)
+
+static int igb_parse_cls_flower(struct igb_adapter *adapter,
+   struct tc_cls_flower_offload *f,
+   int traffic_class,
+   struct igb_nfc_filter *input)
+{
+   struct netlink_ext_ack *extack = f->common.extack;
+
+   if (f->dissector->used_keys &
+   ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+ BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+ BIT(FLOW_DISSECTOR_KEY_VLAN))) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "Unsupported key used, only BASIC, CONTROL, 
ETH_ADDRS and VLAN are supported");
+   return -EOPNOTSUPP;
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+   struct flow_dissector_key_eth_addrs *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_ETH_ADDRS,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_ETH_ADDRS,
+f->mask);
+
+   if (!is_zero_ether_addr(mask->dst)) {
+   if (!is_broadcast_ether_addr(mask->dst)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for destination MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_DST_MAC_ADDR;
+   ether_addr_copy(input->filter.dst_addr, key->dst);
+   }
+
+   if (!is_zero_ether_addr(mask->src)) {
+   if (!is_broadcast_ether_addr(mask->src)) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full masks are 
supported for source MAC address");
+   return -EINVAL;
+   }
+
+   input->filter.match_flags |=
+   IGB_FILTER_FLAG_SRC_MAC_ADDR;
+   ether_addr_copy(input->filter.src_addr, key->src);
+   }
+   }
+
+   if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+   struct flow_dissector_key_basic *key, *mask;
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_BASIC,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_BASIC,
+f->mask);
+
+   if (mask->n_proto) {
+   if (mask->n_proto != ETHER_TYPE_FULL_MASK) {
+   NL_SET_ERR_MSG_MOD(extack, "Only full mask is 
supported for EtherType filter");
+ 

Re: [Intel-wired-lan] [next-queue PATCH v4 6/8] igb: Add MAC address support for ethtool nftuple filters

2018-03-16 Thread Vinicius Costa Gomes
Hi,

Alexander Duyck <alexander.du...@gmail.com> writes:

> On Tue, Mar 13, 2018 at 8:04 PM, Brown, Aaron F <aaron.f.br...@intel.com> 
> wrote:
>>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
>>> Behalf Of Vinicius Costa Gomes
>>> Sent: Wednesday, March 7, 2018 4:37 PM
>>> To: intel-wired-...@lists.osuosl.org
>>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus >> palen...@intel.com>
>>> Subject: [Intel-wired-lan] [next-queue PATCH v4 6/8] igb: Add MAC address
>>> support for ethtool nftuple filters
>>>
>>> This adds the capability of configuring the queue steering of arriving
>>> packets based on their source and destination MAC addresses.
>>>
>>> In practical terms this adds support for the following use cases,
>>> characterized by these examples:
>>>
>>> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
>>> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
>>> to the RX queue 0)
>>>
>>> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
>>> (this will direct packets with source address "44:44:44:44:44:44" to
>>> the RX queue 3)
>>
>> This seems to work fine on i210, and the patch series allows me to set the 
>> rx filters on the i350, i354 and i211, but it is not directing the packets 
>> to the queue I request.
>>
>> With the exception of i210 the rx_queues number does not seem to be effected 
>> by setting the filter.  In the case of i211 the rx packets stay on rx_queue 
>> 0 with or without an ether src or dst filter.  The first example one seems 
>> to work at first since it's directing to queue 0, but changing the filter to 
>> "action 1" does not change the behavior.  With the i350 and i354 ports the 
>> packets are spread across the rx_queues with or without the filter set.
>
> Do any of the other parts actually support this functionality? I don't
> think they do.

>From what I can see, the only other part that supports queue steering (by MAC
addresses) is the 82575. But as I don't have any of those handy, making
it work only for the i210 seems more reasonable, to avoid getting into
this situation again.

>
> What we might look at doing instead of trying to add support for other
> parts would be to explicitly limit this functionality to the i210
> since if I am not mistaken this may be a feature only available in
> that hardware.

Sounds good to me.

>
> Thanks.
>
> - Alex


Cheers,
--
Vinicius


[PATCH net-next v3 0/1] skbuff: Fix applications not being woken for errors

2018-03-16 Thread Vinicius Costa Gomes
Hi,

Changes from v2:
 - As the skbuff fix got applied into the net tree, removing from this
 series (didn't change the subject to avoid causing any more confusion);

Changes from v1:
 - Fixed comments from Willem de Bruijn, about the order of the
 options passed to getopt();
 - Added Reviewed-by and Fixes tags to patch (2);

Changes from the RFC:
 - tweaked commit messages;

Original cover letter:

This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.

We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.

After some investigation, it turned out the problem wasn't only
exclusive to hardware timestamping, and could be reproduced with
software timestamping.

Applying patch (1), and running txtimestamp like this, for example:

$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F

('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)

will cause the application to become stuck in the poll() call in most
of the times. (Note: I couldn't reproduce the issue running against an
address that is routed through loopback.)

Another interesting fact is that if the POLLIN event is added to the
poll() .events, poll() no longer becomes stuck, and more interestingly
the returned event in .revents is only POLLERR.

After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()', fixes the issue for hardware and
software timestamping. That is patch (2).

The "solution" proposed in patch (2) looks like too big a hammer, if
it's not, then it seems that this problem existed since a long time
ago (pre git) and was uncommon for folks to reach the necessary
conditions to trigger it (my hypothesis is that only triggers when the
error is reported from a different task context than the application).

Am I missing something here?

Cheers,


Vinicius Costa Gomes (1):
  selftests/txtimestamp: Add more configurable parameters

 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

--
2.16.2


[PATCH net-next v3 1/1] selftests/txtimestamp: Add more configurable parameters

2018-03-16 Thread Vinicius Costa Gomes
Add a way to configure if poll() should wait forever for an event, the
number of packets that should be sent for each and if there should be
any delay between packets.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/networking/timestamping/txtimestamp.c 
b/tools/testing/selftests/networking/timestamping/txtimestamp.c
index 5df07047ca86..81a98a240456 100644
--- a/tools/testing/selftests/networking/timestamping/txtimestamp.c
+++ b/tools/testing/selftests/networking/timestamping/txtimestamp.c
@@ -68,9 +68,11 @@ static int cfg_num_pkts = 4;
 static int do_ipv4 = 1;
 static int do_ipv6 = 1;
 static int cfg_payload_len = 10;
+static int cfg_poll_timeout = 100;
 static bool cfg_show_payload;
 static bool cfg_do_pktinfo;
 static bool cfg_loop_nodata;
+static bool cfg_no_delay;
 static uint16_t dest_port = 9000;
 
 static struct sockaddr_in daddr;
@@ -171,7 +173,7 @@ static void __poll(int fd)
 
memset(, 0, sizeof(pollfd));
pollfd.fd = fd;
-   ret = poll(, 1, 100);
+   ret = poll(, 1, cfg_poll_timeout);
if (ret != 1)
error(1, errno, "poll");
 }
@@ -371,7 +373,8 @@ static void do_test(int family, unsigned int opt)
error(1, errno, "send");
 
/* wait for all errors to be queued, else ACKs arrive OOO */
-   usleep(50 * 1000);
+   if (!cfg_no_delay)
+   usleep(50 * 1000);
 
__poll(fd);
 
@@ -392,6 +395,9 @@ static void __attribute__((noreturn)) usage(const char 
*filepath)
"  -4:   only IPv4\n"
"  -6:   only IPv6\n"
"  -h:   show this message\n"
+   "  -c N: number of packets for each test\n"
+   "  -D:   no delay between packets\n"
+   "  -F:   poll() waits forever for an event\n"
"  -I:   request PKTINFO\n"
"  -l N: send N bytes at a time\n"
"  -n:   set no-payload option\n"
@@ -409,7 +415,7 @@ static void parse_opt(int argc, char **argv)
int proto_count = 0;
char c;
 
-   while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
+   while ((c = getopt(argc, argv, "46c:DFhIl:np:rRux")) != -1) {
switch (c) {
case '4':
do_ipv6 = 0;
@@ -417,6 +423,15 @@ static void parse_opt(int argc, char **argv)
case '6':
do_ipv4 = 0;
break;
+   case 'c':
+   cfg_num_pkts = strtoul(optarg, NULL, 10);
+   break;
+   case 'D':
+   cfg_no_delay = true;
+   break;
+   case 'F':
+   cfg_poll_timeout = -1;
+   break;
case 'I':
cfg_do_pktinfo = true;
break;
-- 
2.16.2



[PATCH net-next v2 2/2] skbuff: Fix not waking applications when errors are enqueued

2018-03-14 Thread Vinicius Costa Gomes
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.

Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.w...@intel.com>
Reviewed-by: Eric Dumazet <eduma...@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c13495ba6..6def3534f509 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4181,7 +4181,7 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff 
*skb)
 
skb_queue_tail(>sk_error_queue, skb);
if (!sock_flag(sk, SOCK_DEAD))
-   sk->sk_data_ready(sk);
+   sk->sk_error_report(sk);
return 0;
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
-- 
2.16.2



[PATCH net-next v2 1/2] selftests/txtimestamp: Add more configurable parameters

2018-03-14 Thread Vinicius Costa Gomes
Add a way to configure if poll() should wait forever for an event, the
number of packets that should be sent for each and if there should be
any delay between packets.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/networking/timestamping/txtimestamp.c 
b/tools/testing/selftests/networking/timestamping/txtimestamp.c
index 5df07047ca86..81a98a240456 100644
--- a/tools/testing/selftests/networking/timestamping/txtimestamp.c
+++ b/tools/testing/selftests/networking/timestamping/txtimestamp.c
@@ -68,9 +68,11 @@ static int cfg_num_pkts = 4;
 static int do_ipv4 = 1;
 static int do_ipv6 = 1;
 static int cfg_payload_len = 10;
+static int cfg_poll_timeout = 100;
 static bool cfg_show_payload;
 static bool cfg_do_pktinfo;
 static bool cfg_loop_nodata;
+static bool cfg_no_delay;
 static uint16_t dest_port = 9000;
 
 static struct sockaddr_in daddr;
@@ -171,7 +173,7 @@ static void __poll(int fd)
 
memset(, 0, sizeof(pollfd));
pollfd.fd = fd;
-   ret = poll(, 1, 100);
+   ret = poll(, 1, cfg_poll_timeout);
if (ret != 1)
error(1, errno, "poll");
 }
@@ -371,7 +373,8 @@ static void do_test(int family, unsigned int opt)
error(1, errno, "send");
 
/* wait for all errors to be queued, else ACKs arrive OOO */
-   usleep(50 * 1000);
+   if (!cfg_no_delay)
+   usleep(50 * 1000);
 
__poll(fd);
 
@@ -392,6 +395,9 @@ static void __attribute__((noreturn)) usage(const char 
*filepath)
"  -4:   only IPv4\n"
"  -6:   only IPv6\n"
"  -h:   show this message\n"
+   "  -c N: number of packets for each test\n"
+   "  -D:   no delay between packets\n"
+   "  -F:   poll() waits forever for an event\n"
"  -I:   request PKTINFO\n"
"  -l N: send N bytes at a time\n"
"  -n:   set no-payload option\n"
@@ -409,7 +415,7 @@ static void parse_opt(int argc, char **argv)
int proto_count = 0;
char c;
 
-   while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
+   while ((c = getopt(argc, argv, "46c:DFhIl:np:rRux")) != -1) {
switch (c) {
case '4':
do_ipv6 = 0;
@@ -417,6 +423,15 @@ static void parse_opt(int argc, char **argv)
case '6':
do_ipv4 = 0;
break;
+   case 'c':
+   cfg_num_pkts = strtoul(optarg, NULL, 10);
+   break;
+   case 'D':
+   cfg_no_delay = true;
+   break;
+   case 'F':
+   cfg_poll_timeout = -1;
+   break;
case 'I':
cfg_do_pktinfo = true;
break;
-- 
2.16.2



[PATCH net-next v2 0/2] skbuff: Fix applications not being woken for errors

2018-03-14 Thread Vinicius Costa Gomes
Hi,

Changes from v1:
 - Fixed comments from Willem de Bruijn, about the order of the
 options passed to getopt();
 - Added Reviewed-by and Fixes tags to patch (2);

Changes from the RFC:
 - tweaked commit messages;

Original cover letter:

This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.

We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.

After some investigation, it turned out the problem wasn't only
exclusive to hardware timestamping, and could be reproduced with
software timestamping.

Applying patch (1), and running txtimestamp like this, for example:

$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F

('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)

will cause the application to become stuck in the poll() call in most
of the times. (Note: I couldn't reproduce the issue running against an
address that is routed through loopback.)

Another interesting fact is that if the POLLIN event is added to the
poll() .events, poll() no longer becomes stuck, and more interestingly
the returned event in .revents is only POLLERR.

After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()', fixes the issue for hardware and
software timestamping. That is patch (2).

The "solution" proposed in patch (2) looks like too big a hammer, if
it's not, then it seems that this problem existed since a long time
ago (pre git) and was uncommon for folks to reach the necessary
conditions to trigger it (my hypothesis is that only triggers when the
error is reported from a different task context than the application).

Am I missing something here?

Cheers,
--

Vinicius Costa Gomes (2):
  selftests/txtimestamp: Add more configurable parameters
  skbuff: Fix not waking applications when errors are enqueued

 net/core/skbuff.c   |  2 +-
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 2 files changed, 19 insertions(+), 4 deletions(-)

--
2.16.2


[PATCH net] skbuff: Fix not waking applications when errors are enqueued

2018-03-14 Thread Vinicius Costa Gomes
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.

Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.w...@intel.com>
Reviewed-by: Eric Dumazet <eduma...@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index baf990528943..1ea4be79c0a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4179,7 +4179,7 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff 
*skb)
 
skb_queue_tail(>sk_error_queue, skb);
if (!sock_flag(sk, SOCK_DEAD))
-   sk->sk_data_ready(sk);
+   sk->sk_error_report(sk);
return 0;
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
-- 
2.16.2



RE: [Intel-wired-lan] [next-queue PATCH v4 6/8] igb: Add MAC address support for ethtool nftuple filters

2018-03-14 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F" <aaron.f.br...@intel.com> writes:

>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
>> Behalf Of Vinicius Costa Gomes
>> Sent: Wednesday, March 7, 2018 4:37 PM
>> To: intel-wired-...@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus > palen...@intel.com>
>> Subject: [Intel-wired-lan] [next-queue PATCH v4 6/8] igb: Add MAC address
>> support for ethtool nftuple filters
>> 
>> This adds the capability of configuring the queue steering of arriving
>> packets based on their source and destination MAC addresses.
>> 
>> In practical terms this adds support for the following use cases,
>> characterized by these examples:
>> 
>> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
>> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
>> to the RX queue 0)
>> 
>> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
>> (this will direct packets with source address "44:44:44:44:44:44" to
>> the RX queue 3)
>
> This seems to work fine on i210, and the patch series allows me to set
> the rx filters on the i350, i354 and i211, but it is not directing the
> packets to the queue I request.
>

For the i211, it seems that the datasheet is slightly misleading: it has
the QSEL bit documented on the RAH registers, but the queue selection
bits are not mentioned, so it really seems that queue selection won't
work for this controller.

For the other cases (in a quick search I couldn't find the i354
datasheet), the semantics changes, it's more about pool selection than
queue selection, and it depends on vfs_allocated_count (>= 1) and the
number of rss_queues (<= 1) to get to the state where setting the queue
via filters would have the expected effect.

> With the exception of i210 the rx_queues number does not seem to be
> effected by setting the filter. In the case of i211 the rx packets
> stay on rx_queue 0 with or without an ether src or dst filter. The
> first example one seems to work at first since it's directing to queue
> 0, but changing the filter to "action 1" does not change the behavior.
> With the i350 and i354 ports the packets are spread across the
> rx_queues with or without the filter set.
>

So, what I am thinking is: make it an error selecting any queue
different than zero (this is expected to work for all controllers, and
it's what will be called when the user does something like 'ip maddr'),
for controller different than the i210. Later, if/when this feature is
needed for other controllers we can extend the checks.

Does this make sense?


Thanks,
--
Vinicius


Re: [PATCH net-next 0/2] skbuff: Fix applications not being woken for errors

2018-03-14 Thread Vinicius Costa Gomes
Hi,

Willem de Bruijn  writes:

>> Another interesting fact is that if the POLLIN event is added to the
>> poll() .events, poll() no longer becomes stuck,
>
> The process has registered interest only in POLLIN, which the call to
> sk_data_read (sock_def_readable) will trigger.
>
>> and more interestingly
>> the returned event in .revents is only POLLERR.
>
> datagram_poll will set (E)POLLERR based on non-empty sk_error_queue.
>
>> After a few debugging sessions, we got to 'sock_queue_err_skb()' and
>> how it notifies applications of the error just enqueued. Changing it
>> to use 'sk->sk_error_report()', fixes the issue for hardware and
>> software timestamping. That is patch (2).
>>
>> The "solution" proposed in patch (2) looks like too big a hammer,
>
> It looks fine to me. POLLERR is returned regardless of the mask a
> process sets up in pollfd.events. So waking with sk_error_report
> will fix this while still waking callers waiting on POLLIN.
>
> Note that on sock_dequeue_err_skb, if another notification (of the
> right kind) is waiting, sk_error_report is already called instead of
> sk_data_ready.

Thank you all who did confirm that this was the right fix.

>
> This should perhaps go to net, instead of net-next (though not the
> test).

Will propose it to net, what I am thinking is that now that a few TSN
features are upstream, applications that use hardware timestamping (the
easiest way to trigger this bug) could be more common.

>
> If resending, a small nit in the test: please keep the alphabetical
> order in getopt. The filepath also looks a bit fishy, but git am applied
> the mbox from patchwork without issue.

Will send a second version.


Thanks,
--
Vinicius


RE: [Intel-wired-lan] [next-queue PATCH v4 2/8] igb: Fix queue selection on MAC filters on i210 and i211

2018-03-14 Thread Vinicius Costa Gomes
Hi,

"Brown, Aaron F"  writes:

>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -8747,12 +8747,19 @@ static void igb_rar_set_index(struct igb_adapter
>> *adapter, u32 index)
>>  if (is_valid_ether_addr(addr))
>>  rar_high |= E1000_RAH_AV;
>> 
>> -if (hw->mac.type == e1000_82575)
>> +switch (hw->mac.type) {
>> +case e1000_82575:
>> +case e1000_i210:
>> +case e1000_i211:
>> +rar_high |= E1000_RAH_QSEL_ENABLE;
>>  rar_high |= E1000_RAH_POOL_1 *
>> -adapter->mac_table[index].queue;
>> -else
>> +  adapter->mac_table[index].queue;
>> +break;
>> +default:
>>  rar_high |= E1000_RAH_POOL_1 <<
>> -adapter->mac_table[index].queue;
>> +adapter->mac_table[index].queue;
>
> Small nit.  Shouldn't this line be indented more to be a few spaces
> past the "|=" operator as above?

I don't know why my editor seemed to disagree, I will fix.


Thank you,
--
Vinicius


[PATCH net-next 1/2] selftests/txtimestamp: Add more configurable parameters

2018-03-13 Thread Vinicius Costa Gomes
Add a way to configure if poll() should wait forever for an event, the
number of packets that should be sent for each and if there should be
any delay between packets.

Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/networking/timestamping/txtimestamp.c 
b/tools/testing/selftests/networking/timestamping/txtimestamp.c
index 5df07047ca86..5190b1dd78b1 100644
--- a/tools/testing/selftests/networking/timestamping/txtimestamp.c
+++ b/tools/testing/selftests/networking/timestamping/txtimestamp.c
@@ -68,9 +68,11 @@ static int cfg_num_pkts = 4;
 static int do_ipv4 = 1;
 static int do_ipv6 = 1;
 static int cfg_payload_len = 10;
+static int cfg_poll_timeout = 100;
 static bool cfg_show_payload;
 static bool cfg_do_pktinfo;
 static bool cfg_loop_nodata;
+static bool cfg_no_delay;
 static uint16_t dest_port = 9000;
 
 static struct sockaddr_in daddr;
@@ -171,7 +173,7 @@ static void __poll(int fd)
 
memset(, 0, sizeof(pollfd));
pollfd.fd = fd;
-   ret = poll(, 1, 100);
+   ret = poll(, 1, cfg_poll_timeout);
if (ret != 1)
error(1, errno, "poll");
 }
@@ -371,7 +373,8 @@ static void do_test(int family, unsigned int opt)
error(1, errno, "send");
 
/* wait for all errors to be queued, else ACKs arrive OOO */
-   usleep(50 * 1000);
+   if (!cfg_no_delay)
+   usleep(50 * 1000);
 
__poll(fd);
 
@@ -397,6 +400,9 @@ static void __attribute__((noreturn)) usage(const char 
*filepath)
"  -n:   set no-payload option\n"
"  -r:   use raw\n"
"  -R:   use raw (IP_HDRINCL)\n"
+   "  -D:   no delay between packets\n"
+   "  -F:   poll() waits forever for an event\n"
+   "  -c N: number of packets for each test\n"
"  -p N: connect to port N\n"
"  -u:   use udp\n"
"  -x:   show payload (up to 70 bytes)\n",
@@ -409,7 +415,7 @@ static void parse_opt(int argc, char **argv)
int proto_count = 0;
char c;
 
-   while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
+   while ((c = getopt(argc, argv, "46hIl:np:rRuxc:DF")) != -1) {
switch (c) {
case '4':
do_ipv6 = 0;
@@ -447,6 +453,15 @@ static void parse_opt(int argc, char **argv)
case 'x':
cfg_show_payload = true;
break;
+   case 'c':
+   cfg_num_pkts = strtoul(optarg, NULL, 10);
+   break;
+   case 'D':
+   cfg_no_delay = true;
+   break;
+   case 'F':
+   cfg_poll_timeout = -1;
+   break;
case 'h':
default:
usage(argv[0]);
-- 
2.16.2



[PATCH net-next 0/2] skbuff: Fix applications not being woken for errors

2018-03-13 Thread Vinicius Costa Gomes
Hi,

Changes from the RFC:
 - tweaked commit messages;

Original cover letter:

This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.

We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.

After some investigation, it turned out the problem wasn't only
exclusive to hardware timestamping, and could be reproduced with
software timestamping.

Applying patch (1), and running txtimestamp like this, for example:

$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F

('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)

will cause the application to become stuck in the poll() call in most
of the times. (Note: I couldn't reproduce the issue running against an
address that is routed through loopback.)

Another interesting fact is that if the POLLIN event is added to the
poll() .events, poll() no longer becomes stuck, and more interestingly
the returned event in .revents is only POLLERR.

After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()', fixes the issue for hardware and
software timestamping. That is patch (2).

The "solution" proposed in patch (2) looks like too big a hammer, if
it's not, then it seems that this problem existed since a long time
ago (pre git) and was uncommon for folks to reach the necessary
conditions to trigger it (my hypothesis is that only triggers when the
error is reported from a different task context than the application).

Am I missing something here?


Cheers,
--

Vinicius Costa Gomes (2):
  selftests/txtimestamp: Add more configurable parameters
  skbuff: Fix not waking applications when errors are enqueued

 net/core/skbuff.c   |  2 +-
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++---
 2 files changed, 19 insertions(+), 4 deletions(-)

--
2.16.2


[PATCH net-next 2/2] skbuff: Fix not waking applications when errors are enqueued

2018-03-13 Thread Vinicius Costa Gomes
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.

Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).

Reported-by: Randy E. Witt <randy.e.w...@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c13495ba6..6def3534f509 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4181,7 +4181,7 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff 
*skb)
 
skb_queue_tail(>sk_error_queue, skb);
if (!sock_flag(sk, SOCK_DEAD))
-   sk->sk_data_ready(sk);
+   sk->sk_error_report(sk);
return 0;
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
-- 
2.16.2



Re: [RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report()

2018-03-13 Thread Vinicius Costa Gomes
Hi

Eric Dumazet <eric.duma...@gmail.com> writes:

> On 03/12/2018 04:10 PM, Vinicius Costa Gomes wrote:
>> When errors are enqueued to the error queue via sock_queue_err_skb()
>> function, it is possible that the correct application is not notified.
>
> Your patch makes sense, thanks.

Cool. Will send a non-RFC version then.

Would the changes to txtimestamp selftest be helpful?


Cheers,
--
Vinicius


[RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report()

2018-03-12 Thread Vinicius Costa Gomes
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the correct application is not notified.

Reported-by: Randy E. Witt <randy.e.w...@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.go...@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c13495ba6..6def3534f509 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4181,7 +4181,7 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff 
*skb)
 
skb_queue_tail(>sk_error_queue, skb);
if (!sock_flag(sk, SOCK_DEAD))
-   sk->sk_data_ready(sk);
+   sk->sk_error_report(sk);
return 0;
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
-- 
2.16.2



  1   2   3   >