Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page: platform/linux-generic/include/odp_ring_st_internal.h line 46 @@ -0,0 +1,118 @@ +/* Copyright (c) 2018, Linaro Limited + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ + +#ifndef ODP_RING_ST_INTERNAL_H_ +#define ODP_RING_ST_INTERNAL_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <odp/api/hints.h> +#include <odp_align_internal.h> + +/* Basic ring for single thread usage. Operations must be synchronized by using + * locks (or other means), when multiple threads use the same ring. */ +typedef struct { + uint32_t head; + uint32_t tail; + uint32_t mask; + uint32_t *data; + +} ring_st_t; + +/* Initialize ring. Ring size must be a power of two. */ +static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size) +{ + ring->head = 0; + ring->tail = 0; + ring->mask = size - 1; + ring->data = data; +} + +/* Dequeue data from the ring head. Max_num is smaller than ring size.*/ +static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[], + uint32_t max_num) +{ + uint32_t head, tail, mask, idx; + uint32_t num, i; + + head = ring->head; + tail = ring->tail; + mask = ring->mask; + num = tail - head;
Comment: Not true. See the comment below where it's clear that `head` and `tail` are free running and thus will eventually wrap as described above. If these were `uint64_t` variables you could argue that the wraps would take centuries and hence are nothing to worry about, but being `uint32_t` variables this should be expected to happen regularly on heavily used queues. `abs()` adds no overhead since compilers treat it as an intrinsic. > Bill Fischofer(Bill-Fischofer-Linaro) wrote: > Note that neither `head` nor `tail` get masked when doing enqueues or > dequeues, so they will traverse the entire 32-bit range of the variable over > time. While the ring itself will never hold more than `size` entries, `head` > and `tail` will wrap around. When you index into the ring you do so via a > mask (`idx = head & mask;`), but calculating the number of elements in the > ring doesn't do this, which is why `abs()` is needed. >> Petri Savolainen(psavol) wrote: >> Plus this is done only once - in pool create phase. >>> Petri Savolainen(psavol) wrote: >>> No since ring size will be much smaller than 4 billion. >>>> Petri Savolainen(psavol) wrote: >>>> The point is that ring size will never be close to 4 billion entries. E.g. >>>> currently tail is always max 4096 larger than head. Your example above is >>>> based assumption of 4 billion entry ring. Overflow is avoided when ring >>>> size if <2 billion, as 32 bit indexes can be still used to calculate >>>> number of items correctly. >>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>> Agreed. >>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>> Agreed, this is good for now. Later we may wish to honor the >>>>>> user-requested queue `size` parameter. >>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>> Agreed. >>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>> Correct, as noted earlier. I withdraw that comment. >>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>> Presumably the compiler will see the overlap and optimize away the >>>>>>>>> redundancy, so I assume the performance impact will be nil here. >>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire >>>>>>>>>> 32-bit range, you're correct that you can completely fill the ring. >>>>>>>>>> I did miss that point. However, as shown above you still need this >>>>>>>>>> to be: >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> num = size - abs(tail - head); >>>>>>>>>> ``` >>>>>>>>>> To avoid problems at the 32-bit wrap boundary. >>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is less >>>>>>>>>>> than 2^32 elements. Consider the following test program: >>>>>>>>>>> ``` >>>>>>>>>>> #include <stdlib.h> >>>>>>>>>>> #include <stdio.h> >>>>>>>>>>> #include <stdint.h> >>>>>>>>>>> #include <inttypes.h> >>>>>>>>>>> >>>>>>>>>>> void main() { >>>>>>>>>>> uint32_t head[4] = {0, 1, 2, 3}; >>>>>>>>>>> uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd}; >>>>>>>>>>> uint32_t mask = 4095; >>>>>>>>>>> uint32_t result; >>>>>>>>>>> int i; >>>>>>>>>>> >>>>>>>>>>> for (i = 0; i < 4; i++) { >>>>>>>>>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>>>>>>>>> head[i], tail[i]); >>>>>>>>>>> >>>>>>>>>>> result = tail[i] - head[i]; >>>>>>>>>>> printf("tail - head = %" PRIu32 "\n", result); >>>>>>>>>>> >>>>>>>>>>> result = (tail[i] - head[i]) & mask; >>>>>>>>>>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>>>>>>>>>> >>>>>>>>>>> result = abs(tail[i] - head[i]); >>>>>>>>>>> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> ``` >>>>>>>>>>> in theory `tail - head` should be the number of elements in the >>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test program >>>>>>>>>>> gives the following output: >>>>>>>>>>> ``` >>>>>>>>>>> head = 0 tail = 0: >>>>>>>>>>> tail - head = 0 >>>>>>>>>>> (tail - head) & mask = 0 >>>>>>>>>>> abs(tail - head) = 0 >>>>>>>>>>> >>>>>>>>>>> head = 1 tail = 4294967295: >>>>>>>>>>> tail - head = 4294967294 >>>>>>>>>>> (tail - head) & mask = 4094 >>>>>>>>>>> abs(tail - head) = 2 >>>>>>>>>>> >>>>>>>>>>> head = 2 tail = 4294967294: >>>>>>>>>>> tail - head = 4294967292 >>>>>>>>>>> (tail - head) & mask = 4092 >>>>>>>>>>> abs(tail - head) = 4 >>>>>>>>>>> >>>>>>>>>>> head = 3 tail = 4294967293: >>>>>>>>>>> tail - head = 4294967290 >>>>>>>>>>> (tail - head) & mask = 4090 >>>>>>>>>>> abs(tail - head) = 6 >>>>>>>>>>> ``` >>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of the >>>>>>>>>>> variable, when the 32-bits rolls over you'll get a large positive >>>>>>>>>>> number, not the small one you need to stay within the ring bounds. >>>>>>>>>>> The alternative is to mask `head` and `tail` as you increment them, >>>>>>>>>>> but then you run into the effective range issue. >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> OK >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> That functionality is not obvious from the name. It either >>>>>>>>>>>>>> implies that one of the input arguments is written (not true >>>>>>>>>>>>>> here) or the reader might assume that it is an expression >>>>>>>>>>>>>> without side-effect and should be deleted (what I originally >>>>>>>>>>>>>> thought when reading it). You should pick a routine name that >>>>>>>>>>>>>> makes it clear it's actually doing something real, in this case >>>>>>>>>>>>>> performing prefetch processing. >>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS >>>>>>>>>>>>>>>> point of view. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so this >>>>>>>>>>>>>>>> hard coding may change in that phase. >>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not (currently) >>>>>>>>>>>>>>>>>> used. Queue size is always 4k. >>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>> OK, added checks in v2. >>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in >>>>>>>>>>>>>>>>>>>>>> v2. >>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do >>>>>>>>>>>>>>>>>>>>>>> not wrap around when the ring is full. I think you >>>>>>>>>>>>>>>>>>>>>>> assume that the store index is 0...size-1, while it's >>>>>>>>>>>>>>>>>>>>>>> full uint32_t which is then masked to get the actual >>>>>>>>>>>>>>>>>>>>>>> index. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> For example: >>>>>>>>>>>>>>>>>>>>>>> size = 100; >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Empty: >>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>> tail = 100 >>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Full: >>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>> tail = 200 >>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full: >>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C >>>>>>>>>>>>>>>>>>>>>>> tail = 0 >>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of >>>>>>>>>>>>>>>>>>>>>>> 4095. >>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. >>>>>>>>>>>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, >>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size) >>>>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to >>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is >>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side >>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be changed/moved >>>>>>>>>>>>>>>>>>>>>>>>> any time if it's found out that there's a place for >>>>>>>>>>>>>>>>>>>>>>>>> prefetching. I actually plan to test if number of >>>>>>>>>>>>>>>>>>>>>>>>> prefetches should be limited as e.g. 32 consecutive >>>>>>>>>>>>>>>>>>>>>>>>> prefetches may be too much for some CPU architectures. >>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. >>>>>>>>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this >>>>>>>>>>>>>>>>>>>>>>>>>> and there's danger for reader to miss the '!' sign. >>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also >>>>>>>>>>>>>>>>>>>>>>>>>>> when someone would modify the bitfield from two to >>>>>>>>>>>>>>>>>>>>>>>>>>> three fields later on. Similarly to memset() zero >>>>>>>>>>>>>>>>>>>>>>>>>>> is used for struct inits. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t >>>>>>>>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already. >>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits: >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used >>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see >>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi(). >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a >>>>>>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type name >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (as it's not mixed into struct field names). >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Checkpatch passes so this style should be OK. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> added performance benefit? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied queue size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this to something like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the effective >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue capacity. The user-supplied `size` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should then be set to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> masking to work properly. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to ensure that `CONFIG_QUEUE_SIZE` is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> itself a power of 2. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and `index.buffer` there's no need to set >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to that here? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `head == tail` therefore the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue is full when `abs(tail - head) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> == mask`, so the correct calculation >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hold `size - 1` elements, otherwise >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you cannot distinguish between a full >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and an empty queue without another bit >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of metadata, which is a cost you're >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> trying to avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caller is trying to be "optimal" by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying a power of two in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size` parameter of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we may wish to return a `max_size` of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2. Should be documented as such, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since this is an internal routine. In >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case an `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugging aid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to deal with wrap arounds, otherwise >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be misinterpreted as overly >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> large since it's `uint32_t`. Note >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that GCC and clang recognize `abs()` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (nit). https://github.com/Linaro/odp/pull/492#discussion_r170277089 updated_at 2018-02-23 15:19:39