Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c @@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle) ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name); return -1; } - if (queue->s.head != NULL) { + if (ring_st_is_empty(&queue->s.ring_st) == 0) {
Comment: ``` if (ring_st_is_empty(&queue->s.ring_st)) if (ring_st_is_empty(&queue->s.ring_st)) if (ring_st_is_empty(&queue->s.ring_st) == 0) if (ring_st_is_empty(&queue->s.ring_st)) if (ring_st_is_empty(&queue->s.ring_st)) if (!ring_st_is_empty(&queue->s.ring_st)) if (ring_st_is_empty(&queue->s.ring_st)) if (ring_st_is_empty(&queue->s.ring_st)) ``` Find if clause for "ring is not empty" from picture above. > Petri Savolainen(psavol) wrote: > It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail is > always ahead of head, but no more than queue size (e.g. 4k), So, e.g. when > tail is 0, and head is 0xffffffff the number of items in ring is 0 - > 0xffffffff = 1 > > ``` > head tail > hex [index], hex [index], num, num_free > 0xfffffffe [6], 0xffffffff [7], 1, 7 > 0xffffffff [7], 0x00000000 [0], 1, 7 > 0x00000000 [0], 0x00000001 [1], 1, 7 > ``` >> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >> Please see above discussion. The comment stands. >>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>> Not true. See the comment below where it's clear that `head` and `tail` are >>> free running and thus will eventually wrap as described above. If these >>> were `uint64_t` variables you could argue that the wraps would take >>> centuries and hence are nothing to worry about, but being `uint32_t` >>> variables this should be expected to happen regularly on heavily used >>> queues. >>> >>> `abs()` adds no overhead since compilers treat it as an intrinsic. >>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>> Note that neither `head` nor `tail` get masked when doing enqueues or >>>> dequeues, so they will traverse the entire 32-bit range of the variable >>>> over time. While the ring itself will never hold more than `size` entries, >>>> `head` and `tail` will wrap around. When you index into the ring you do so >>>> via a mask (`idx = head & mask;`), but calculating the number of elements >>>> in the ring doesn't do this, which is why `abs()` is needed. >>>>> Petri Savolainen(psavol) wrote: >>>>> Plus this is done only once - in pool create phase. >>>>>> Petri Savolainen(psavol) wrote: >>>>>> No since ring size will be much smaller than 4 billion. >>>>>>> Petri Savolainen(psavol) wrote: >>>>>>> The point is that ring size will never be close to 4 billion entries. >>>>>>> E.g. currently tail is always max 4096 larger than head. Your example >>>>>>> above is based assumption of 4 billion entry ring. Overflow is avoided >>>>>>> when ring size if <2 billion, as 32 bit indexes can be still used to >>>>>>> calculate number of items correctly. >>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>> Agreed. >>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>> Agreed, this is good for now. Later we may wish to honor the >>>>>>>>> user-requested queue `size` parameter. >>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>> Agreed. >>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>> Correct, as noted earlier. I withdraw that comment. >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> Presumably the compiler will see the overlap and optimize away the >>>>>>>>>>>> redundancy, so I assume the performance impact will be nil here. >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire >>>>>>>>>>>>> 32-bit range, you're correct that you can completely fill the >>>>>>>>>>>>> ring. I did miss that point. However, as shown above you still >>>>>>>>>>>>> need this to be: >>>>>>>>>>>>> >>>>>>>>>>>>> ``` >>>>>>>>>>>>> num = size - abs(tail - head); >>>>>>>>>>>>> ``` >>>>>>>>>>>>> To avoid problems at the 32-bit wrap boundary. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is >>>>>>>>>>>>>> less than 2^32 elements. Consider the following test program: >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> #include <stdlib.h> >>>>>>>>>>>>>> #include <stdio.h> >>>>>>>>>>>>>> #include <stdint.h> >>>>>>>>>>>>>> #include <inttypes.h> >>>>>>>>>>>>>> >>>>>>>>>>>>>> void main() { >>>>>>>>>>>>>> uint32_t head[4] = {0, 1, 2, 3}; >>>>>>>>>>>>>> uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd}; >>>>>>>>>>>>>> uint32_t mask = 4095; >>>>>>>>>>>>>> uint32_t result; >>>>>>>>>>>>>> int i; >>>>>>>>>>>>>> >>>>>>>>>>>>>> for (i = 0; i < 4; i++) { >>>>>>>>>>>>>> printf("head = %" PRIu32 " tail = %" PRIu32 ":\n", >>>>>>>>>>>>>> head[i], tail[i]); >>>>>>>>>>>>>> >>>>>>>>>>>>>> result = tail[i] - head[i]; >>>>>>>>>>>>>> printf("tail - head = %" PRIu32 "\n", result); >>>>>>>>>>>>>> >>>>>>>>>>>>>> result = (tail[i] - head[i]) & mask; >>>>>>>>>>>>>> printf("(tail - head) & mask = %" PRIu32 "\n", result); >>>>>>>>>>>>>> >>>>>>>>>>>>>> result = abs(tail[i] - head[i]); >>>>>>>>>>>>>> printf("abs(tail - head) = %" PRIu32 "\n\n", result); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> in theory `tail - head` should be the number of elements in the >>>>>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test program >>>>>>>>>>>>>> gives the following output: >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> head = 0 tail = 0: >>>>>>>>>>>>>> tail - head = 0 >>>>>>>>>>>>>> (tail - head) & mask = 0 >>>>>>>>>>>>>> abs(tail - head) = 0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> head = 1 tail = 4294967295: >>>>>>>>>>>>>> tail - head = 4294967294 >>>>>>>>>>>>>> (tail - head) & mask = 4094 >>>>>>>>>>>>>> abs(tail - head) = 2 >>>>>>>>>>>>>> >>>>>>>>>>>>>> head = 2 tail = 4294967294: >>>>>>>>>>>>>> tail - head = 4294967292 >>>>>>>>>>>>>> (tail - head) & mask = 4092 >>>>>>>>>>>>>> abs(tail - head) = 4 >>>>>>>>>>>>>> >>>>>>>>>>>>>> head = 3 tail = 4294967293: >>>>>>>>>>>>>> tail - head = 4294967290 >>>>>>>>>>>>>> (tail - head) & mask = 4090 >>>>>>>>>>>>>> abs(tail - head) = 6 >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of >>>>>>>>>>>>>> the variable, when the 32-bits rolls over you'll get a large >>>>>>>>>>>>>> positive number, not the small one you need to stay within the >>>>>>>>>>>>>> ring bounds. The alternative is to mask `head` and `tail` as you >>>>>>>>>>>>>> increment them, but then you run into the effective range issue. >>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>> OK >>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging. >>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>> That functionality is not obvious from the name. It either >>>>>>>>>>>>>>>>> implies that one of the input arguments is written (not true >>>>>>>>>>>>>>>>> here) or the reader might assume that it is an expression >>>>>>>>>>>>>>>>> without side-effect and should be deleted (what I originally >>>>>>>>>>>>>>>>> thought when reading it). You should pick a routine name that >>>>>>>>>>>>>>>>> makes it clear it's actually doing something real, in this >>>>>>>>>>>>>>>>> case performing prefetch processing. >>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if >>>>>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. >>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from >>>>>>>>>>>>>>>>>>> QoS point of view. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so >>>>>>>>>>>>>>>>>>> this hard coding may change in that phase. >>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not >>>>>>>>>>>>>>>>>>>>> (currently) used. Queue size is always 4k. >>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>> One entry is not lost. >>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>> OK, added checks in v2. >>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed >>>>>>>>>>>>>>>>>>>>>>>>> in v2. >>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and >>>>>>>>>>>>>>>>>>>>>>>>>> do not wrap around when the ring is full. I think >>>>>>>>>>>>>>>>>>>>>>>>>> you assume that the store index is 0...size-1, while >>>>>>>>>>>>>>>>>>>>>>>>>> it's full uint32_t which is then masked to get the >>>>>>>>>>>>>>>>>>>>>>>>>> actual index. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> For example: >>>>>>>>>>>>>>>>>>>>>>>>>> size = 100; >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Empty: >>>>>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>>>>> tail = 100 >>>>>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Full: >>>>>>>>>>>>>>>>>>>>>>>>>> head = 100 >>>>>>>>>>>>>>>>>>>>>>>>>> tail = 200 >>>>>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full: >>>>>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C >>>>>>>>>>>>>>>>>>>>>>>>>> tail = 0 >>>>>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead >>>>>>>>>>>>>>>>>>>>>>>>>> of 4095. >>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of >>>>>>>>>>>>>>>>>>>>>>>>>>> two. */ >>>>>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, >>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size) >>>>>>>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to >>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is >>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side >>>>>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be changed/moved >>>>>>>>>>>>>>>>>>>>>>>>>>>> any time if it's found out that there's a place >>>>>>>>>>>>>>>>>>>>>>>>>>>> for prefetching. I actually plan to test if number >>>>>>>>>>>>>>>>>>>>>>>>>>>> of prefetches should be limited as e.g. 32 >>>>>>>>>>>>>>>>>>>>>>>>>>>> consecutive prefetches may be too much for some >>>>>>>>>>>>>>>>>>>>>>>>>>>> CPU architectures. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of >>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!'. Especially, when the if clause is as complex >>>>>>>>>>>>>>>>>>>>>>>>>>>>> as this and there's danger for reader to miss the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!' sign. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when someone would modify the bitfield from two >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to three fields later on. Similarly to memset() >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zero is used for struct inits. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t variables, wrap a round is handled >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi(). >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name (as it's not mixed into struct field >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> names). Checkpatch passes so this style should >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be OK. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> no added performance benefit? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> noted earlier, if we're not going to use the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied queue size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extraneous statement that should be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deleted. Renaming this to something like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (!ring_st_is_empty(&queue->s.ring_st))` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effective queue capacity. The >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to `ROUNDUP_POWER2_U32(size) - 1` for the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> masking to work properly. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that `CONFIG_QUEUE_SIZE` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is itself a power of 2. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and `index.buffer` there's no need to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set `index.u32` here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> empty when `head == tail` therefore >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the queue is full when `abs(tail - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calculation here is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hold `size - 1` elements, otherwise >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you cannot distinguish between a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full and an empty queue without >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another bit of metadata, which is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cost you're trying to avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caller is trying to be "optimal" by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying a power of two in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size` parameter of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason we may wish to return a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of 2. Should be documented >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as such, since this is an internal >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> routine. In this case an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this requirement would be a useful >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugging aid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head);` to deal with wrap >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r170572592 updated_at 2018-02-26 12:13:33