On 4 February 2015 at 00:48, Bill Fischofer <bill.fischo...@linaro.org> wrote:
> Passing in an application-supplied shm is guesswork since there is no
> architected way for an application to know how much storage an
> implementation needs to honor any given request. The way this currently
> works is the call succeeds if the supplied shm is large enough and fails
> otherwise. I see no reason to change that behavior.
Because the size of the shm region allocated for packet storage might be
determined by other means (e.g. because you are partitioning your total
memory map and allocating areas for different purposes), and you will get as
many packets as the implementation can fit into that space. Basically, if you
specify a shm region (of some size), it overrides any requested number of
packets (or segments). But it could still be useful to know how many packets
will actually fit into that space.
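To make the partitioning case concrete, here is a rough sketch (written from
memory of the current odp_shm_reserve()/odp_pool_create() calls and using the
pkt fields proposed in this patch; the capacity query at the end is purely
hypothetical and does not exist today):

#include <string.h>
#include <odp.h>

/* Sketch only: the shm size comes from the application's overall memory
 * map, not from a packet count computed up front. */
static odp_pool_t create_rx_pool(void)
{
        odp_shm_t shm;
        odp_pool_t pool;
        odp_pool_param_t params;

        shm = odp_shm_reserve("rx_pkt_shm", 16 * 1024 * 1024,
                              ODP_CACHE_LINE_SIZE, 0);
        if (shm == ODP_SHM_INVALID)
                return ODP_POOL_INVALID;

        memset(&params, 0, sizeof(params));
        params.type        = ODP_POOL_PACKET;
        params.pkt.seg_len = 0; /* default segment length */
        params.pkt.seg_num = 0; /* no particular count requested; the shm
                                 * size decides how many segments fit */

        pool = odp_pool_create("rx_pkt_pool", shm, &params);

        /* Hypothetical query: how many packets actually fit in the supplied
         * region, e.g. to size application lookup tables. No such call
         * exists today. */
        /* uint32_t pkts = odp_pool_packet_count(pool); */

        return pool;
}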
>
> We originally capped the requested size to ODP_CONFIG_PACKET_BUF_LEN_MAX but
> that immediately broke a number of the crypto tests which need 32K buffers.
> That's why the code currently treats requests for large sizes as a request
> for an unsegmented pool.
Isn't this behavior something that should be part of the API specification?
From your description, it seems to be specific to the linux-generic
implementation and could be interpreted differently by other implementations.
Personally, however, I think unsegmented pools should be requested explicitly
and not be a characteristic inferred from other parameters. The default is to
get whatever segmentation the HW happens to use (at least 256 bytes per our
ODP prerequisite). When the application needs an exception, that request
should be made explicit.
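What I have in mind is something along these lines, where unsegmented
operation is an explicit part of the request rather than inferred from a
large seg_len. The 'unsegmented' field is hypothetical, purely to illustrate
the shape of the request; the other names are from the current API and this
patch, as far as I remember them:

#include <string.h>
#include <odp.h>

/* Hypothetical sketch: explicitly request an unsegmented packet pool
 * (e.g. for the 32K crypto buffers) instead of relying on a large seg_len
 * being reinterpreted as such by the implementation. */
static odp_pool_t create_crypto_pool(void)
{
        odp_pool_param_t params;

        memset(&params, 0, sizeof(params));
        params.type            = ODP_POOL_PACKET;
        params.pkt.seg_len     = 32 * 1024; /* whole packet in one segment */
        params.pkt.seg_num     = 256;
        params.pkt.unsegmented = 1; /* hypothetical field: packets must never
                                     * appear segmented; the implementation
                                     * may trade performance for this */

        return odp_pool_create("crypto_pool", ODP_SHM_NULL, &params);
}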
>
> I don't understand the reluctance to recognize unsegmented pools as a useful
> construct. All of this attempted configurability is easily solved in a fully
> portable manner by saying that if an application has some reason for wanting
> things to be of a specific size, it can simply request an unsegmented pool
> and be assured that packets will never appear segmented to it. There was
> universal agreement last Summer that this was useful, so I'm not sure what
> changed in the interim.
>
> Performance may suffer on some platforms when using unsegmented pools, but
> presumably that's a trade-off the application has consciously chosen.
> Otherwise the application should be content to use whatever segment sizes
> the implementation chooses, since the relevant ODP APIs always return a
> seglen to enable it to navigate them as needed, and we've already stipulated
> that the implementation-chosen segment size will never be less than 256
> bytes.
>
> A bit of a diversion into HW design may explain my obstinacy here. The
> scaling problems with network processing at higher speeds all have to do
> with memory processing speeds. The rule of thumb is that in the time that
> network speeds grow by a factor of 10, corresponding memory speeds grow by a
> factor of 3. This means that at each step in the progression from 1Gb to
> 10Gb to 100Gb and beyond, memory bottlenecks become more and more the gating
> factor on the ability to handle traffic. Between 10Gb and 100Gb you have to
> go to NUMA-type architectures because byte-addressable SMP DRAM simply
> cannot keep pace. This has nothing to do with how clever your SW is or how
> fast your processors are. It's the memory latencies and access times that
> kill you. Playing around with cache alignments and such buys you nothing.
>
> So what HW designers do is create segregated DRAMs designed and optimized
> for packet processing. Multiple interleaved memories, in fact, since they
> need to be multiplexed to overcome the I/O-memory performance disparity.
> These memories have two properties that are relevant to ODP. First, they
> are not directly addressable by processing cores. This is why there is
> always some sort of implicit or explicit mapping function needed to make the
> contents of these memories addressable to cores. Second, they are
> block-addressable, not byte-addressable. The reason for this is that address
> decode lines take silicon die area and power and have a real cost associated
> with them. By eliminating byte addressability you make the memories both
> faster and cheaper, since address decode logic is one of the more expensive
> components of any memory subsystem.
>
> Cores need byte addressability to run SW, but HW components use whatever
> addressability makes sense for them. This is where HW packet segmentation
> arises. It's not done for arbitrary reasons; it's "baked into" the design of
> the memory system and is not subject to change by SW.
>
> So this is why the implementation chooses packet segment sizes: they are
> matched to the block sizes used by the HW packet memory subsystem. It's also
> why SW should process packets on a per-segment basis, because otherwise it
> incurs additional latency until all of the segments containing a packet can
> be made available to it. HW will typically prefetch the first segment, start
> the core processing it as soon as it's available, and let the SW request
> additional segments as needed, since by design it's expected that any
> information the SW may need (i.e., the packet headers) will reside in the
> first segment.
>
> I hope that makes sense. The point is that scalable designs can scale down
> to lower speeds easily, while those designed around concepts that only work
> at lower speeds break completely as you try to scale them up to higher
> levels. So if we design ODP for shared-memory SMP systems, we need to
> understand that these APIs will need to be completely reworked to enable
> applications to work with high-speed devices. So best to get them right in
> the first place.
>
> Bill
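For reference, per-segment processing with the existing segment accessors
would look roughly like this (names as I remember them from the current API,
so they may not match exactly):

#include <odp.h>

/* Sketch: walk a packet segment by segment instead of assuming its data
 * is contiguous. Headers are expected to reside in the first segment. */
static uint32_t count_payload_bytes(odp_packet_t pkt)
{
        uint32_t total = 0;
        odp_packet_seg_t seg = odp_packet_first_seg(pkt);

        while (seg != ODP_PACKET_SEG_INVALID) {
                uint8_t *data = odp_packet_seg_data(pkt, seg);
                uint32_t len  = odp_packet_seg_data_len(pkt, seg);

                (void)data; /* real code would parse or copy from here */
                total += len;

                seg = odp_packet_next_seg(pkt, seg);
        }

        return total;
}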
>
> On Tue, Feb 3, 2015 at 3:04 PM, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>
>> My alternative approach should achieve the same goals as Petri's but give
>> more freedom to implementations. You don't have to approve of it, I just
>> want to show that given a defined and understood problem, many potential
>> solutions exist and the first alternative might not be the best. Let's work
>> together.
>>
>> * init_seg_len:
>>       On input: user's required (desired) minimal length of the initial
>>       segment (including headroom).
>>       On output: implementation's best effort to match the user's request.
>>       Purpose: ensure that the packet headers the application will normally
>>       process are stored in a consecutive memory area. Applications do not
>>       have to check any limits first in order to initialize a configuration
>>       which the implementation has to check anyway. Applications should
>>       check the output values to see whether their desired values were
>>       matched. The application decides whether a failure to match is a
>>       fatal error or a situation it can handle anyway (e.g. with degraded
>>       performance because it has to do some workarounds in SW).
>>
>> * seg_len:
>>       On input: user's desired length of other segments.
>>       On output: implementation's best effort to match the user's request.
>>       Purpose: a hint from the user on how to partition the pool into
>>       segments for the best trade-off between memory utilization and SW
>>       processing performance.
>>       Note: I know some HW can support multiple segment sizes, so I think
>>       it is useful for the API to allow for this. Targets which only
>>       support one segment size (per packet pool) could use e.g.
>>       max(init_seg_len, seg_len). Some targets may not allow user-defined
>>       segment sizes at all; the ODP implementation will just return the
>>       actual values and the application can check whether those are
>>       acceptable.
>>
>> * init_seg_num:
>>       On input: number of initial segments.
>>       On output: updated with the actual number of segments if a shared
>>       memory region was specified?
>>
>> * seg_num:
>>       On input: number of other segments.
>>       On output: updated with the actual number of segments if a shared
>>       memory region was specified?
>>
>> I dislike the defines because they will make a future portable ABI (binary
>> interface) definition impossible. We will need a portable ABI to support
>> e.g. shared library implementations. So all ODP_CONFIG's should only be
>> used internally (by the implementation in question) and not be accessible
>> as compile-time constants to applications, i.e. they should not be part of
>> the ODP API. Are there any other instances where the application is
>> supposed to use these constants?
>>
>> -- Ola
>>
>> On 3 February 2015 at 14:31, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>> > On 3 February 2015 at 13:59, Petri Savolainen
>> > <petri.savolai...@linaro.org> wrote:
>> >> Completed odp_pool_param_t definition with packet pool parameters.
>> >> Parameter definition is close to what we are using already.
>> >>
>> >> * seg_len: Defines minimum segment buffer length.
>> >>   With this parameter the user can:
>> >>     * trade off between pool memory usage and SW performance
>> >>       (linear memory access)
>> >>     * avoid segmentation in the packet head (e.g. if a legacy app cannot
>> >>       handle segmentation in the middle of the packet headers)
>> > We have already defined a minimum segment size for conforming ODP
>> > implementations. Isn't that enough?
>> >
>> > I can see value in specifying the minimum size of the first segment of a
>> > packet (which would contain all headers the application is going to
>> > process). But this proposal goes much further than that.
>> >
>> >> * seg_len < ODP_CONFIG_PACKET_SEG_LEN_MIN is rounded up to
>> >>   ODP_CONFIG_PACKET_SEG_LEN_MIN
>> >> * seg_len > ODP_CONFIG_PACKET_SEG_LEN_MAX is not valid
>> >>
>> >> * seg_align: Defines minimum segment buffer alignment. With this
>> >>   parameter, the user can force buffer alignment to match e.g. the
>> >>   alignment requirements of data structures stored in, or algorithms
>> >>   accessing, the packet headroom. When the user doesn't have a specific
>> >>   alignment requirement, 0 should be used for the default.
>> > Can you give a practical example of when this configuration is useful?
>> > To my knowledge, most data structures have quite small alignment
>> > requirements, e.g. based on the alignment requirements of individual
>> > fields. But I assume that we would specify alignment in multiples of
>> > cache lines here (because the minimum segment alignment would be the
>> > cache line size).
>> >
>> >> * seg_num: Number of segments. This is also the maximum number of
>> >>   packets.
>> > I think these configurations could be hints but not strict requirements.
>> > They do not change the *functionality*, so an application should not fail
>> > if these configurations cannot be obeyed (except for the legacy situation
>> > you describe above). The hints enable more optimal utilization of e.g.
>> > packet memory and may decrease SW overhead during packet processing, but
>> > they do not change the functionality.
>> >
>> > To enable different hardware implementations, ODP apps should not enforce
>> > unnecessary (non-functional) requirements on ODP implementations and
>> > thereby limit the number of targets ODP can be implemented on. ODP is not
>> > DPDK.
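To make the flow I am arguing for concrete: the application states what it
wants, the create call returns the values actually used, and the application
alone decides whether a mismatch is fatal. The pkt.init_seg_len/seg_len/
seg_num fields below are from my proposal above and are hypothetical; the
rest is roughly the current API:

#include <string.h>
#include <odp.h>

/* Hypothetical sketch of the "desired values in, actual values out" flow.
 * None of the proposed pkt fields exist in the API today. */
static odp_pool_t create_pool_with_hints(void)
{
        odp_pool_param_t params;
        odp_pool_t pool;

        memset(&params, 0, sizeof(params));
        params.type             = ODP_POOL_PACKET;
        params.pkt.init_seg_len = 128;  /* headers in one consecutive area */
        params.pkt.seg_len      = 1664; /* hint for the other segments */
        params.pkt.seg_num      = 2048; /* desired number of segments */

        pool = odp_pool_create("pkt_pool", ODP_SHM_NULL, &params);

        /* The implementation has updated params with the values it will
         * actually use; check only what the application truly depends on. */
        if (pool != ODP_POOL_INVALID && params.pkt.init_seg_len < 128) {
                /* Headers may span segments: abort, or fall back to a
                 * slower SW path that can handle that. */
        }

        return pool;
}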
>> >
>> > Applications should also not have to first check the limits of the
>> > specific ODP implementation (as you suggested yesterday), adapt their
>> > configuration to those, and then send the resulting requirements back to
>> > the ODP implementation (which still has to check the parameters to verify
>> > that they are valid). This is too complicated and will likely lead to
>> > code that cheats and thus is not portable. It is better for applications
>> > just to specify their requested configuration to ODP and then get back
>> > the results (i.e. the actual values that will be used). The application
>> > can then, if necessary, check that the configuration was honored. This
>> > follows the normal programming flow.
>> >
>> >>
>> >> Signed-off-by: Petri Savolainen <petri.savolai...@linaro.org>
>> >> ---
>> >>  include/odp/api/pool.h | 26 +++++++++++++++++++++-----
>> >>  1 file changed, 21 insertions(+), 5 deletions(-)
>> >>
>> >> diff --git a/include/odp/api/pool.h b/include/odp/api/pool.h
>> >> index d09d92e..a1d7494 100644
>> >> --- a/include/odp/api/pool.h
>> >> +++ b/include/odp/api/pool.h
>> >> @@ -61,13 +61,29 @@ typedef struct odp_pool_param_t {
>> >>                                          of 8. */
>> >>                  uint32_t num;       /**< Number of buffers in the pool */
>> >>          } buf;
>> >> -/* Reserved for packet and timeout specific params
>> >>          struct {
>> >> -                uint32_t seg_size;
>> >> -                uint32_t seg_align;
>> >> -                uint32_t num;
>> >> +                uint32_t seg_len;   /**< Minimum packet segment buffer
>> >> +                                         length in bytes. It includes
>> >> +                                         possible head-/tailroom bytes.
>> >> +                                         The maximum value is defined by
>> >> +                                         ODP_CONFIG_PACKET_SEG_LEN_MAX.
>> >> +                                         Use 0 for default length. */
>> >> +                uint32_t seg_align; /**< Minimum packet segment buffer
>> >> +                                         alignment in bytes. Valid
>> >> +                                         values are powers of two. The
>> >> +                                         maximum value is defined by
>> >> +                                         ODP_CONFIG_PACKET_SEG_ALIGN_MAX.
>> >> +                                         Use 0 for default alignment.
>> >> +                                         Default will always be a
>> >> +                                         multiple of 8.
>> >> +                                         */
>> >> +                uint32_t seg_num;   /**< Number of packet segments in
>> >> +                                         the pool. This is also the
>> >> +                                         maximum number of packets,
>> >> +                                         since each packet consist of
>> >> +                                         at least one segment.
>> >> +                                         */
>> > What if both seg_num and a shared memory region are specified in the
>> > odp_pool_create call? Which takes precedence?
>> >
>> >>          } pkt;
>> >> -*/
>> >>          struct {
>> >>                  uint32_t __res1;    /* Keep struct identical to buf, */
>> >>                  uint32_t __res2;    /* until pool implementation is fixed*/
>> >> --
>> >> 2.2.2

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/lng-odp