[dpdk-dev] ipv4 fragmentation bug?

2016-09-15 Thread Александр Киселев
I am sorry for the late reply.

I am not sure anymore whether what I found is a bug or whether the author of
rte_ipv4_fragment_packet() really
wanted to constrain the MTU size by writing these lines:

frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));
/* Fragment size should be a multiply of 8. */
assert((frag_size & IPV4_HDR_FO_MASK) == 0);

So, if we assume that any MTU size is valid, then it's a bug and the
function must be rewritten.
Otherwise, since mtu_size is an input parameter of the function,
the validation should be stronger than
just an assertion, or it should be noted in the documentation that
not all values of the mtu_size parameter are valid.

I can write a patch; I just need confirmation, since
I am not sure about the networking background regarding MTU.
I tried to find anything about MTU restrictions in the RFCs, but so far nothing.
According to the RFCs, all MTU sizes are valid.
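
For illustration, here is a minimal sketch of the fix I have in mind: round
the fragment payload down to a multiple of 8 instead of asserting. It reuses
the names from the v2.2.0 source quoted above (where IPV4_HDR_FO_MASK is 0x07);
this is my assumption of a fix, not code that is in the library:

/* Round the per-fragment payload down to a multiple of 8 so that
 * any mtu_size is acceptable, e.g. 1504 - 20 = 1484 -> 1480. */
frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));
frag_size &= ~((uint16_t)IPV4_HDR_FO_MASK);

With that change the assert becomes unnecessary, and odd MTUs like 1504 simply
waste a few bytes per fragment instead of producing corrupted fragments.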



>2016-08-22 14:31 GMT+03:00 Thomas Monjalon :
>Hi,

> 2016-08-15 20:30 GMT+03:00 Александр Киселев :
> > While playing with the function rte_ipv4_fragment_packet I found that it
> > incorrectly fragments packets.
> > For example, if the function takes a 1200-byte packet and an MTU size of 1000,
> > it will produce two fragments. And when those fragments are reassembled,
> > the resulting packet is 4 bytes shorter than it should be.
> >
> > I played with the Linux ping program, and it reports that the reply is truncated:
> > 1204 bytes from 192.168.125.1: icmp_seq=1 ttl=64 (truncated)
> >
> > Looking at the source of rte_ipv4_fragment_packet I discovered the cause
> > of the above behavior.
> >
> > The function makes the following assumption, and all of its calculations
> > are based on it.
> >
> > /* Fragment size should be a multiply of 8. */
> > IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);
> >
> > The problem is that this assert doesn't make sense here. It's true that
> > the fragment size should be a multiple of 8, but what this line really
> > checks is that
> > the MTU size minus 20 bytes is a multiple of 8. In other words,
> > it constrains the size of the MTU. So, if I take a valid MTU value, say
> > 1504,
> > the function will produce incorrect fragments when asserts are off.

>Thanks for reporting.
>
>2016-08-15 20:48, Александр Киселев:
>> I'm sorry. It looks like having an MTU value that is a multiple of 8 is good practice.
>>
>> But the MTU value 1504 is also widely used on QinQ Linux interfaces.

>Please, would you like to write a patch for the master branch?
>Or do you prefer to delegate it to someone reading this thread?



--
Alexander Kiselev


[dpdk-dev] ipv4 fragmentation bug?

2016-08-15 Thread Александр Киселев
I'm sorry. It looks like having an MTU value that is a multiple of 8 is good practice.

But the MTU value 1504 is also widely used on QinQ Linux interfaces.

2016-08-15 20:30 GMT+03:00 Александр Киселев :

> While playing with the function rte_ipv4_fragment_packet I found that it
> incorrectly fragments packets.
> For example, if the function takes a 1200-byte packet and an MTU size of 1000,
> it will produce two fragments. And when those fragments are reassembled,
> the resulting packet is 4 bytes shorter than it should be.
>
> I played with the Linux ping program, and it reports that the reply is truncated:
> 1204 bytes from 192.168.125.1: icmp_seq=1 ttl=64 (truncated)
>
> Looking at the source of rte_ipv4_fragment_packet I discovered the cause
> of the above behavior.
>
> The function makes the following assumption, and all of its calculations
> are based on it.
>
> /* Fragment size should be a multiply of 8. */
> IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);
>
> The problem is that this assert doesn't make sense here. It's true that
> the fragment size should be a multiple of 8, but what this line really
> checks is that
> the MTU size minus 20 bytes is a multiple of 8. In other words,
> it constrains the size of the MTU. So, if I take a valid MTU value, say
> 1504,
> the function will produce incorrect fragments when asserts are off.
>
> P.S.
> I am using DPDK v2.2.0.
>
> --
> --
> Kiselev Alexander
>



-- 
--
Kiselev Alexander


[dpdk-dev] ipv4 fragmentation bug?

2016-08-15 Thread Александр Киселев
While playing with the function rte_ipv4_fragment_packet I found that it
incorrectly fragments packets.
For example, if the function takes a 1200-byte packet and an MTU size of 1000,
it will produce two fragments. And when those fragments are reassembled,
the resulting packet is 4 bytes shorter than it should be.

I played with the Linux ping program, and it reports that the reply is truncated:
1204 bytes from 192.168.125.1: icmp_seq=1 ttl=64 (truncated)

Looking at the source of rte_ipv4_fragment_packet I discovered the cause of
the above behavior.

The function makes the following assumption, and all of its calculations
are based on it.

/* Fragment size should be a multiply of 8. */
IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);

The problem is that this assert doesn't make sense here. It's true that
the fragment size should be a multiple of 8, but what this line really
checks is that
the MTU size minus 20 bytes is a multiple of 8. In other words,
it constrains the size of the MTU. So, if I take a valid MTU value, say 1504,
the function will produce incorrect fragments when asserts are off.
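
To spell out where the 4 bytes go (my arithmetic, assuming the v2.2.0 logic
of the function; a sketch, not the exact library code):

frag_size = mtu_size - 20 = 1000 - 20 = 980   /* not a multiple of 8 */

/* The first fragment carries 980 payload bytes, but the IPv4 fragment
 * offset field counts 8-byte units, so the second fragment's offset is
 * stored as 980 >> 3 = 122, i.e. 122 * 8 = 976 bytes. A 1200-byte packet
 * has 1180 payload bytes, so on reassembly the result is
 * 976 + (1180 - 980) = 1176 bytes instead of 1180: 4 bytes short. */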

P.S.
I am using DPDK v2.2.0.

-- 
--
Kiselev Alexander


[dpdk-dev] performance of rte_lpm rule subsystem

2016-05-02 Thread Александр Киселев
Stephen, what was the main reason you used a red-black tree instead of DIR-24-8?
Did you switch to trees because the memory working set of the
DIR-24-8 algorithm is too big?


2016-04-19 18:46 GMT+03:00 Stephen Hemminger :

> On Tue, 19 Apr 2016 14:11:11 +0300
> Александр Киселев wrote:
>
> > Hi.
> >
> > Doing some tests with rte_lpm (adding/deleting full BGP table rules) I
> > noticed that
> > the rule subsystem is very slow, even considering that it was probably never
> > designed for use
> > in a data forwarding plane. So I want to propose some changes to the
> > "rule" subsystem.
> >
> > I reimplemented the rule part of the lib using rte_hash, and the performance
> > of adding/deleting routes has increased dramatically.
> > If increasing the speed of adding/deleting routes makes sense for anybody
> > else,
> > I would like to discuss my patch.
> > The patch also includes changes that make next_hop 64-bit, so please just
> > ignore them. The rule changes are in the following
> > functions only:
> >
> > rte_lpm2_create
> >
> > rule_find
> > rule_add
> > rule_delete
> > find_previous_rule
> > delete_depth_small
> > delete_depth_big
> >
> > rte_lpm2_add
> > rte_lpm2_delete
> > rte_lpm2_is_rule_present
> > rte_lpm2_delete_all
> >
>
> We forked LPM several versions ago.
> I sent patches to use a BSD red-black tree for the rules, but the patches were
> ignored, mostly because they broke the ABI.
>



-- 
--
Kiselev Alexander


[dpdk-dev] lpm rule subsystem's performance

2016-04-19 Thread Александр Киселев
Hi.

Doing some tests with rte_lpm (adding/deleting full BGP table rules) I
noticed that
the rule subsystem is very slow, even considering that it was probably never
designed for use
in a data forwarding plane. So I want to propose some changes to it.

I reimplemented the "rule" part of the lib using rte_hash, and the performance
of adding/deleting routes has increased dramatically.
If increasing the speed of adding/deleting routes makes sense for anybody else,
I would like to discuss my patch.
The patch also includes changes that make next_hop 64-bit, so please just
ignore them. The rule changes are in the following
functions only:

rte_lpm2_create

rule_find
rule_add
rule_delete
find_previous_rule
delete_depth_small
delete_depth_big

rte_lpm2_add
rte_lpm2_delete
rte_lpm2_is_rule_present
rte_lpm2_delete_all

P.S. The patch was made for version 2.2.0.
P.P.S. Would it be more convenient to include the full source file instead of
a patch?

Alex Kiselev.
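
Before the patch itself, a minimal sketch of the data structures it assumes.
lpm_rule_key_t is referenced below but its definition did not make it into
this excerpt, so the layout here is a reconstruction that matches the 8-byte
CRC hash (rte_hash_crc_8byte) used by lpm_prefix_hash_crc(); the setup
function and its names are illustrative, not the author's exact code:

#include <rte_hash.h>
#include <rte_hash_crc.h>

/* Assumed 8-byte key: the masked prefix plus its depth. */
typedef struct {
	uint32_t net;  /* network address, already masked to depth */
	uint32_t mask; /* prefix depth, widened to keep the key 8 bytes */
} lpm_rule_key_t;

/* Hypothetical creation of the rules hash table. */
static struct rte_hash *
create_rules_table(unsigned max_rules, int socket_id)
{
	struct rte_hash_parameters params = {
		.name = "lpm2_rules",
		.entries = max_rules,              /* e.g. sized for a full BGP table */
		.key_len = sizeof(lpm_rule_key_t), /* 8 bytes */
		.hash_func = lpm_prefix_hash_crc,  /* defined in the patch below */
		.hash_func_init_val = 0,
		.socket_id = socket_id,
	};
	return rte_hash_create(&params);
}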

Patch

--- ./rte_lpm.c 2016-04-19 13:58:44.670649240 +0300
+++ ./rte_lpm2.c 2016-04-19 13:58:44.686649240 +0300
@@ -45,6 +45,9 @@
 #include <rte_memory.h> /* for definition of RTE_CACHE_LINE_SIZE */
 #include <rte_malloc.h>
 #include <rte_memzone.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_hash_crc.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_per_lcore.h>
@@ -53,14 +56,14 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>

-#include "rte_lpm.h"
+#include "rte_lpm2.h"

-TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);
+TAILQ_HEAD(RTE_LPM2_list, rte_tailq_entry);

-static struct rte_tailq_elem rte_lpm_tailq = {
- .name = "RTE_LPM",
+static struct rte_tailq_elem RTE_LPM2_tailq = {
+ .name = "RTE_LPM2",
 };
-EAL_REGISTER_TAILQ(rte_lpm_tailq)
+EAL_REGISTER_TAILQ(RTE_LPM2_tailq)

 #define MAX_DEPTH_TBL24 24

@@ -70,10 +73,10 @@
 };

 /* Macro to enable/disable run-time checks. */
-#if defined(RTE_LIBRTE_LPM_DEBUG)
+#if defined(RTE_LIBRTE_LPM2_DEBUG)
 #include <rte_debug.h>
 #define VERIFY_DEPTH(depth) do {\
- if ((depth == 0) || (depth > RTE_LPM_MAX_DEPTH))\
+ if ((depth == 0) || (depth > RTE_LPM2_MAX_DEPTH))\
  rte_panic("LPM: Invalid depth (%u) at line %d", \
  (unsigned)(depth), __LINE__);   \
 } while (0)
@@ -113,25 +116,25 @@
  return 1 << (MAX_DEPTH_TBL24 - depth);

  /* Else if depth is greater than 24 */
- return (1 << (RTE_LPM_MAX_DEPTH - depth));
+ return (1 << (RTE_LPM2_MAX_DEPTH - depth));
 }

 /*
  * Find an existing lpm table and return a pointer to it.
  */
-struct rte_lpm *
-rte_lpm_find_existing(const char *name)
+struct rte_lpm2 *
+rte_lpm2_find_existing(const char *name)
 {
- struct rte_lpm *l = NULL;
+ struct rte_lpm2 *l = NULL;
  struct rte_tailq_entry *te;
- struct rte_lpm_list *lpm_list;
+ struct RTE_LPM2_list *lpm_list;

- lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);
+ lpm_list = RTE_TAILQ_CAST(RTE_LPM2_tailq.head, RTE_LPM2_list);

  rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
  TAILQ_FOREACH(te, lpm_list, next) {
- l = (struct rte_lpm *) te->data;
- if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
+ l = (struct rte_lpm2 *) te->data;
+ if (strncmp(name, l->name, RTE_LPM2_NAMESIZE) == 0)
  break;
  }
  rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -145,22 +148,148 @@
 }

 /*
+ * Finds a rule in rule table.
+ * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
+ */
+static inline struct RTE_LPM2_rule*
+rule_find(struct rte_lpm2 *lpm, uint32_t ip_masked, uint8_t depth)
+{
+ VERIFY_DEPTH(depth);
+
+ lpm_rule_key_t rule_key;
+ rule_key.net = ip_masked;
+ rule_key.mask = (uint32_t) depth;
+
+ struct RTE_LPM2_rule* rule;
+
+ int ret = rte_hash_lookup_data(lpm->rules_hash_tbl, (void*) &rule_key,
+ (void**) &rule);
+
+   if (ret >= 0) {
+  return rule;
+   }
+
+ /* rule is not found */
+ return NULL;
+}
+
+/*
+ * Finds a rule in rule table.
+ * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
+ */
+static inline struct RTE_LPM2_rule*
+rule_find_by_key(struct rte_lpm2 *lpm, lpm_rule_key_t* rule_key)
+{
+ struct RTE_LPM2_rule* rule;
+
+ int ret = rte_hash_lookup_data(lpm->rules_hash_tbl, (void*) rule_key,
+ (void**) &rule);
+
+   if (ret >= 0) {
+  return rule;
+   }
+
+ /* rule is not found */
+ return NULL;
+}
+
+/*
+ * Finds a rule in rule table.
+ * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
+ */
+static inline struct RTE_LPM2_rule*
+rule_find_by_key_with_hash(struct rte_lpm2 *lpm, lpm_rule_key_t* rule_key,
+  hash_sig_t sig)
+{
+ struct RTE_LPM2_rule* rule;
+
+ int ret = rte_hash_lookup_with_hash_data(lpm->rules_hash_tbl,
+ (void*) rule_key, sig, (void**) &rule);
+
+   if (ret >= 0) {
+  return rule;
+   }
+
+ /* rule is not found */
+ return NULL;
+}
+
+/*
+ * hash function for rules hash table
+ */
+static inline uint32_t lpm_prefix_hash_crc(const void *data,
+  __rte_unused uint32_t data_len, uint32_t init_val)
+{
+   // printf("lpm_prefix_hash_crc init: %ul\n", init_val);
+
+ init_val = rte_hash_crc_8byte(*((uint64_t*) data), init_val);
+
+   // printf("lpm_prefix_hash_crc result: %ul\n", init_val);
+
+ return init_val;
+}
+
+/*
  * Allocates memory for LPM object
  */

[dpdk-dev] performance of rte_lpm rule subsystem

2016-04-19 Thread Александр Киселев
Hi.

Doing some tests with rte_lpm (adding/deleting full BGP table rules) I
noticed that
the rule subsystem is very slow, even considering that it was probably never
designed for use
in a data forwarding plane. So I want to propose some changes to the "rule"
subsystem.

I reimplemented the rule part of the lib using rte_hash, and the performance
of adding/deleting routes has increased dramatically.
If increasing the speed of adding/deleting routes makes sense for anybody else,
I would like to discuss my patch.
The patch also includes changes that make next_hop 64-bit, so please just
ignore them. The rule changes are in the following
functions only:

rte_lpm2_create

rule_find
rule_add
rule_delete
find_previous_rule
delete_depth_small
delete_depth_big

rte_lpm2_add
rte_lpm2_delete
rte_lpm2_is_rule_present
rte_lpm2_delete_all


P.S. The patch was made for version 2.2.0.
P.P.S. Would it be more convenient to include the full source file instead of
a patch?