Given the assumptions that VPP makes on packet / pool organization,
perhaps another approach would be to add a request bit to the
odp_pool_param_t struct to request that this pool support linear
organization.
Something like:

typedef struct odp_pool_param_t {
        /** Pool type */
        int type;

        union {
                uint32_t all_bits;

                struct {
                        uint32_t linear:1;
                };
        } opts;

        /** Variant parameters for different pool types */
        union {
                ...
        };
} odp_pool_param_t;

Setting param.opts.linear would request that this pool be organized
such that applications like VPP can keep doing the math they are
currently doing. We could also provide "official" APIs to create/expand
compact handles and encourage apps to make use of those, since not
every implementation may be able to offer linear pools. Specifying the
linear option on such an implementation would result in the
odp_pool_create() call failing.
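
To make the "math" concrete, here is a minimal sketch of what a linear
organization would let an application do. It assumes a pool is a single
contiguous region of fixed-size buffers; linear_pool_t and both helper
functions are hypothetical names for illustration, not ODP or VPP APIs,
and the exact arithmetic VPP uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a linear pool: one contiguous region of
 * fixed-size buffers. Not an ODP type. */
typedef struct {
        uint8_t *base;      /* start of the contiguous region */
        uint32_t buf_size;  /* bytes per buffer */
        uint32_t num_bufs;  /* number of buffers in the pool */
} linear_pool_t;

/* Expand a compact 32-bit index into a full address. */
static inline void *linear_index_to_ptr(const linear_pool_t *p,
                                        uint32_t idx)
{
        assert(idx < p->num_bufs);
        return p->base + (size_t)idx * p->buf_size;
}

/* Compress an address of a buffer in the pool into a 32-bit index. */
static inline uint32_t linear_ptr_to_index(const linear_pool_t *p,
                                           const void *ptr)
{
        size_t off = (size_t)((const uint8_t *)ptr - p->base);

        assert(off % p->buf_size == 0);          /* buffer-aligned */
        assert(off / p->buf_size < p->num_bufs); /* inside the pool */
        return (uint32_t)(off / p->buf_size);
}
```

An application would try the linear option first and, if pool creation
fails, fall back to whatever conversion APIs the implementation offers.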

On Wed, May 24, 2017 at 7:21 AM, Bill Fischofer
<bill.fischo...@linaro.org> wrote:
> I've added this topic to the agenda for today's ARCH call. I agree
> with Petri that any changes should be based on measurements, and
> preferably real application measurements rather than microbenchmarks.
>
> On Wed, May 24, 2017 at 6:46 AM, Savolainen, Petri (Nokia - FI/Espoo)
> <petri.savolai...@nokia.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of
>>> Sachin Saxena
>>> Sent: Wednesday, May 24, 2017 12:43 PM
>>> To: lng-odp@lists.linaro.org
>>> Subject: Re: [lng-odp] APIs for dealing with compact handles
>>>
>>> Thanks Bill for initiating the thread.
>>>
>>> Please check out some more details (inline) on the requirements &
>>> proposal.
>>>
>>>
>>> On 5/17/2017 8:50 PM, Bill Fischofer wrote:
>>> > This thread is to discuss ideas and proposed solutions to two issues
>>> > that have been raised by Sachin relating to VPP needs, as well as
>>> > Honnappa relating to the scalable scheduler.
>>> >
>>> > Background
>>> > ==========
>>> >
>>> > ODP handles are abstract types that implementations may define to be
>>> > of arbitrary bit width in size. However, for a number of reasons
>>> > (e.g., provision of strong typing support, ABI compatibility,
>>> > efficiency of internal manipulation, etc.) these are typically
>>> > represented as 64-bit quantities. Some applications that store
>>> > handles in their own structures wish to minimize the cache footprint
>>> > consumed by these structures and so would like an option to store
>>> > handles in a more compact format that uses a smaller number of bits.
>>> > To date 32-bits seems sufficient for application need, however in
>>> > theory 16 or even 8 bits might be desirable in some circumstances.
>>> > We already have an example of 8-bit handles in the odp_packet_seg_t
>>> > type, where odp-linux uses an 8-bit representation of this type as a
>>> > segment index when ODP is configured with --enable-abi-compat=no
>>> > while using a 64-bit size when configured with
>>> > --enable-abi-compat=yes.
>>> >
>>> > Considerations
>>> > ==============
>>> >
>>> > In choosing the bit width to use in representing handles there are
>>> > two main considerations that implementations must take into account.
>>> > First, to achieve strong typing in C, handles need to be of pointer
>>> > width. For development this is a very valuable feature, which is why
>>> > implementations are encouraged to provide strong typing for ODP
>>> > abstract types. Second, for ABI compatibility it is required that
>>> > all implementations use the same width for types that are to be ABI
>>> > compatible across different implementations. Implementations may
>>> > interpret the bits of a handle very differently, but all must agree
>>> > that handles are of the same bit width if they wish to be binary
>>> > compatible with each other.
>>> >
>>> > Stated Needs
>>> > ============
>>> >
>>> > VPP currently packages its metadata into a vlib_mbuf struct that is
>>> > used pervasively to reference packets that are being processed by
>>> > VPP nodes. The address of this struct is desired to be held in
>>> > compressed (32-bit) format. Today the vlib_mbuf is implemented as a
>>> > user area associated with an odp_packet_t. As such the
>>> > odp_packet_user_area() API returns a (64-bit) pointer. What is
>>> > desired is a compact representation of this address.
>>> VPP collects a bunch of packets from the ODP/DPDK input node and looks
>>> for the inline "struct vlib_buffer" address in each packet.
>>> Then it creates a VPP Library Frame, which is a collection of the
>>> vlib_buffers (vectors). For this, VPP converts the 64-bit address of
>>> each vlib_buffer to a 32-bit index, saves it in the VLib frame, and
>>> passes this frame to the next node.
>>> In each processing node in the data path where packet contents are
>>> accessed, VPP converts this 32-bit index back to the actual 64-bit
>>> address to get the packet data pointer.
>>> In the current implementation, VPP converts the 32-bit index to an
>>> address in ~900 places in the overall code via the API:
>>>              vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
>>>
>>>
>>> *Code reference*:
>>> GIT: https://git.fd.io/odp4vpp/log/
>>> Files:              vlib/vlib/buffer_funcs.h
>>> vlib/vlib/buffer.h
>>
>>
>> Have you considered / tested the performance impact of changing
>> buffer_index to u64? Sure, you can now pack more u32 indexes per cache
>> line, but you need to do the conversion many times (900 places hints
>> that it's done multiple times per forwarded packet), which also
>> consumes CPU cycles. I'd like to get some numbers on how much better
>> off you are with 32-bit vs 64-bit handles. 64-bit handles would remove
>> the need for conversions on both levels: neither the application nor
>> ODP would need to convert, as the application would store odp_packet_t,
>> which would be a direct pointer to the packet structure. Also, HW data
>> prefetchers may have improved since the time when VPP was originally
>> designed.
>>
>> If <64-bit indexes are needed, the only viable option to me is a packet
>> index conversion API (no user area indexes). The downside is that
>> every implementation must be able to do those conversions
>> (efficiently). Also, I'd say that the index size would need to be 32
>> bits, so memory savings would be only 2x on a 64-bit system.
>>
>> Odp-linux used to define odp_packet_t as a 32-bit index, but this was
>> changed to a pointer since it improved performance with the l2fwd app
>> by about 10%. L2fwd is kind of a worst-case app since it has very few
>> cycles per packet.
>>
>> Another option for VPP is to maintain a packet context table. You'd
>> save the packet handle into the table and use the table index
>> internally as "buffer_index".
>>
>>
>> -Petri
>>
>>
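
For reference, the packet context table Petri describes could look
roughly like the sketch below. It is only an illustration: every name
here is hypothetical, pkt_handle_t is a plain 64-bit stand-in for
odp_packet_t, the table size is arbitrary, and the code is
single-threaded (a real data path would need a per-thread table or
locking).

```c
#include <assert.h>
#include <stdint.h>

#define CTX_TABLE_SIZE 1024u
#define CTX_INVALID    UINT32_MAX

typedef uint64_t pkt_handle_t; /* stand-in for a 64-bit odp_packet_t */

typedef struct {
        pkt_handle_t handle[CTX_TABLE_SIZE];    /* stored handles */
        uint32_t     next_free[CTX_TABLE_SIZE]; /* free-list links */
        uint32_t     free_head;                 /* first free slot */
} ctx_table_t;

/* Chain every slot into the free list. */
static void ctx_table_init(ctx_table_t *t)
{
        for (uint32_t i = 0; i < CTX_TABLE_SIZE; i++)
                t->next_free[i] =
                        (i + 1 < CTX_TABLE_SIZE) ? i + 1 : CTX_INVALID;
        t->free_head = 0;
}

/* Store a handle; the returned slot number is the 32-bit
 * "buffer_index". Returns CTX_INVALID when the table is full. */
static uint32_t ctx_table_put(ctx_table_t *t, pkt_handle_t h)
{
        uint32_t idx = t->free_head;

        if (idx == CTX_INVALID)
                return CTX_INVALID;
        t->free_head = t->next_free[idx];
        t->handle[idx] = h;
        return idx;
}

/* Expand a 32-bit index back into the full handle. */
static pkt_handle_t ctx_table_get(const ctx_table_t *t, uint32_t idx)
{
        assert(idx < CTX_TABLE_SIZE);
        return t->handle[idx];
}

/* Return a slot to the free list once the packet is done. */
static void ctx_table_release(ctx_table_t *t, uint32_t idx)
{
        assert(idx < CTX_TABLE_SIZE);
        t->next_free[idx] = t->free_head;
        t->free_head = idx;
}
```

This keeps the compact index entirely inside the application and needs
nothing new from ODP, at the cost of one extra memory access per
lookup.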
