Re: [dpdk-dev] [PATCH v4 06/29] graph: populate fastpath memory for graph reel

Kiran Kumar Kokkilagadda Wed, 08 Apr 2020 19:45:26 -0700


> -----Original Message-----
> From: dev <[email protected]> On Behalf Of Andrzej Ostruszka
> Sent: Wednesday, April 8, 2020 11:00 PM
> To: [email protected]
> Subject: Re: [dpdk-dev] [PATCH v4 06/29] graph: populate fastpath memory for
> graph reel
> 
> On 4/5/20 10:55 AM, [email protected] wrote:
> > From: Jerin Jacob <[email protected]>
> [...]
> > diff --git a/lib/librte_graph/graph_populate.c
> > b/lib/librte_graph/graph_populate.c
> > new file mode 100644
> > index 000000000..093512efa
> > --- /dev/null
> > +++ b/lib/librte_graph/graph_populate.c
> > @@ -0,0 +1,234 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2020 Marvell International Ltd.
> > + */
> > +
> > +#include <fnmatch.h>
> > +#include <stdbool.h>
> > +
> > +#include <rte_common.h>
> > +#include <rte_errno.h>
> > +#include <rte_malloc.h>
> > +#include <rte_memzone.h>
> > +
> > +#include "graph_private.h"
> > +
> > +static size_t
> > +graph_fp_mem_calc_size(struct graph *graph) {
> > +   struct graph_node *graph_node;
> > +   rte_node_t val;
> > +   size_t sz;
> > +
> > +   /* Graph header */
> > +   sz = sizeof(struct rte_graph);
> > +   /* Source nodes list */
> > +   sz += sizeof(rte_graph_off_t) * graph->src_node_count;
> > +   /* Circular buffer for pending streams of size number of nodes */
> > +   val = rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t));
> > +   sz = RTE_ALIGN(sz, val);
> > +   graph->cir_start = sz;
> > +   graph->cir_mask = rte_align32pow2(graph->node_count) - 1;
> > +   sz += val;
> 
> Aren't here source nodes counted twice?  I'm trying now to wrap my head
> around how this all is structured and laid out in memory (thus the slowdown in
> review) so I am most probably missing something here.
>


Yes, we are counting source nodes offset, 2 times in the circular buffer. In 
fact intentionally we are allocating the circular buffer more than the required 
size (rte_align32pow2).
By allocating circular buffer with more size, at least in some cases we can 
avoid wraparound.
Let me try to explain how this memory reel and graph walk works.
This is how memory reel looks like.

1. Graph_header---> 2. FENCE ---> 3. [Graph walk always starts from here] 
memory for source node object offsets ---> 4. [circular buffer starts] enqueued 
node object offset [ circular buffer end] --> 5. FENCE ---> 6. Memory for Node 
objects

3 and 4 will have the offset of their corresponding node object in the 6.

Initially before graph walk start we will populate the 3 (see 
graph_src_nodes_populate) and when the graph walk start first we will go over 3 
and based on the enqueues , we will populate the 4 and this is where we are 
creating circle (we will be keep walking in 4 till there are no more enqueues). 
So, circular buffer is actually walk the source nodes first then will create 
circle for enqueued nodes (4).

  
> > +   /* Fence */
> > +   sz += sizeof(RTE_GRAPH_FENCE);
> > +   sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > +   graph->nodes_start = sz;
> > +   /* For 0..N node objects with fence */
> > +   STAILQ_FOREACH(graph_node, &graph->node_list, next) {
> > +           sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > +           sz += sizeof(struct rte_node);
> > +           /* Pointer to next nodes(edges) */
> > +           sz += sizeof(struct rte_node *) * graph_node->node->nb_edges;
> > +   }
> > +
> > +   graph->mem_sz = sz;
> > +   return sz;
> > +}
> > +
> > +static void
> > +graph_header_popluate(struct graph *_graph) {
> > +   struct rte_graph *graph = _graph->graph;
> > +
> > +   graph->tail = 0;
> > +   graph->head = (int32_t)-_graph->src_node_count;
> > +   graph->cir_mask = _graph->cir_mask;
> > +   graph->nb_nodes = _graph->node_count;
> > +   graph->cir_start = RTE_PTR_ADD(graph, _graph->cir_start);
> > +   graph->nodes_start = _graph->nodes_start;
> > +   graph->socket = _graph->socket;
> > +   graph->id = _graph->id;
> > +   memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE);
> 
> As I've mentioned above I'm learning the structure of the lib/memory so quick
> question here.  My understanding is that rte_graph is a "view of the 'struct
> graph' sufficient for worker" so does it need both id & name?  Both of them
> seems to be used in error or dump/debug paths.  It probably doesn't matter 
> (e.g.
> for performance) - just asking because 'id' seems to be used only in one place
> (where name could replace it probably).
> 

User will have access to the node info both ways using either name or ID. These 
are used in slow path. 
It is up to the user how he wants to use it.  


> > +   graph->fence = RTE_GRAPH_FENCE;
> > +}
> > +
> > +static void
> > +graph_nodes_populate(struct graph *_graph) {
> > +   rte_graph_off_t off = _graph->nodes_start;
> > +   struct rte_graph *graph = _graph->graph;
> > +   struct graph_node *graph_node;
> > +   rte_edge_t count, nb_edges;
> > +   const char *parent;
> > +   rte_node_t pid;
> > +
> > +   STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
> > +           struct rte_node *node = RTE_PTR_ADD(graph, off);
> > +           memset(node, 0, sizeof(*node));
> > +           node->fence = RTE_GRAPH_FENCE;
> > +           node->off = off;
> > +           node->process = graph_node->node->process;
> > +           memcpy(node->name, graph_node->node->name,
> RTE_GRAPH_NAMESIZE);
> > +           pid = graph_node->node->parent_id;
> > +           if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
> > +                   parent = rte_node_id_to_name(pid);
> > +                   memcpy(node->parent, parent,
> RTE_GRAPH_NAMESIZE);
> > +           }
> > +           node->id = graph_node->node->id;
> > +           node->parent_id = pid;
> > +           nb_edges = graph_node->node->nb_edges;
> > +           node->nb_edges = nb_edges;
> > +           off += sizeof(struct rte_node);
> > +           /* Copy the name in first pass to replace with rte_node* later*/
> > +           for (count = 0; count < nb_edges; count++)
> > +                   node->nodes[count] = (struct rte_node *)&graph_node
> > +                                                ->adjacency_list[count]
> > +                                                ->node->name[0];
> 
> I'm not sure I understand what is going here.  Please see below ...


See below.

> 
> > +
> > +           off += sizeof(struct rte_node *) * nb_edges;
> > +           off = RTE_ALIGN(off, RTE_CACHE_LINE_SIZE);
> > +           node->next = off;
> > +           __rte_node_stream_alloc(graph, node);
> > +   }
> > +}
> [...]
> > +static int
> > +graph_node_nexts_populate(struct graph *_graph) {
> > +   rte_node_t count, val;
> > +   rte_graph_off_t off;
> > +   struct rte_node *node;
> > +   const struct rte_graph *graph = _graph->graph;
> > +   const char *name;
> > +
> > +   rte_graph_foreach_node(count, off, graph, node) {
> > +           for (val = 0; val < node->nb_edges; val++) {
> > +                   name = (const char *)node->nodes[val];
> > +                   node->nodes[val] = graph_node_name_to_ptr(graph,
> name);
> 
> ... Is it so that during node the first loop above some node might refer (by 
> name)
> to other node that is not yet "registered" so instead of storing rte_node 
> pointer
> you stored actually pointer to name which you now update to proper rte_node?

Exactly, it is because next nodes are based on name not based on ID. All we 
need is user has to create all the nodes before graph create. So, that at the 
time of graph create we will take care of linking the actual nodes based on 
name.

> 
> > +                   if (node->nodes[val] == NULL)
> > +                           SET_ERR_JMP(EINVAL, fail, "%s not found",
> name);
> > +           }
> > +   }
> > +
> > +   return 0;
> > +fail:
> > +   return -rte_errno;
> > +}
> [...]
> 
> With regards
> Andrzej Ostruszka

Re: [dpdk-dev] [PATCH v4 06/29] graph: populate fastpath memory for graph reel

Reply via email to