Petri Savolainen(psavol) replied on github web page:

platform/linux-generic/odp_queue.c
@@ -263,7 +275,7 @@ static int queue_destroy(odp_queue_t handle)
                ODP_ERR("queue \"%s\" already destroyed\n", queue->s.name);
                return -1;
        }
-       if (queue->s.head != NULL) {
+       if (ring_st_is_empty(&queue->s.ring_st) == 0) {


Comment:
```
if (ring_st_is_empty(&queue->s.ring_st))

if (ring_st_is_empty(&queue->s.ring_st))

if (ring_st_is_empty(&queue->s.ring_st) == 0)

if (ring_st_is_empty(&queue->s.ring_st))

if (ring_st_is_empty(&queue->s.ring_st))

if (!ring_st_is_empty(&queue->s.ring_st))

if (ring_st_is_empty(&queue->s.ring_st))

if (ring_st_is_empty(&queue->s.ring_st))
```

Find if clause for "ring is not empty" from picture above.

> Petri Savolainen(psavol) wrote:
> It's unsigned 32 bit arithmetic, not signed integer arithmetic. Also, tail is 
> always ahead of head, but no more than queue size (e.g. 4k), So, e.g. when 
> tail is 0, and head is 0xffffffff the number of items in ring is 0 - 
> 0xffffffff = 1
> 
> ```
> head            tail
> hex    [index], hex    [index], num, num_free
> 0xfffffffe [6], 0xffffffff [7], 1,   7
> 0xffffffff [7], 0x00000000 [0], 1,   7
> 0x00000000 [0], 0x00000001 [1], 1,   7
> ```


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> Please see above discussion. The comment stands.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Not true. See the comment below where it's clear that `head` and `tail` are 
>>> free running and thus will eventually wrap as described above. If these 
>>> were `uint64_t` variables you could argue that the wraps would take 
>>> centuries and hence are nothing to worry about, but being `uint32_t` 
>>> variables this should be expected to happen regularly on heavily used 
>>> queues. 
>>> 
>>> `abs()` adds no overhead since compilers treat it as an intrinsic.


>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>> Note that neither `head` nor `tail` get masked when doing enqueues or 
>>>> dequeues, so they will traverse the entire 32-bit range of the variable 
>>>> over time. While the ring itself will never hold more than `size` entries, 
>>>> `head` and `tail` will wrap around. When you index into the ring you do so 
>>>> via a mask (`idx = head & mask;`), but calculating the number of elements 
>>>> in the ring doesn't do this, which is why `abs()` is needed.


>>>>> Petri Savolainen(psavol) wrote:
>>>>> Plus this is done only once - in pool create phase.


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> No since ring size will be much smaller than 4 billion.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> The point is that ring size will never be close to 4 billion entries. 
>>>>>>> E.g. currently tail is always max 4096 larger than head. Your example 
>>>>>>> above is based assumption of 4 billion entry ring. Overflow is avoided 
>>>>>>> when ring size if <2 billion, as 32 bit indexes can be still used to 
>>>>>>> calculate number of items correctly.


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> Agreed.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Agreed, this is good for now. Later we may wish to honor the 
>>>>>>>>> user-requested queue `size` parameter.


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Agreed.


>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>> Correct, as noted earlier. I withdraw that comment.


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> Presumably the compiler will see the overlap and optimize away the 
>>>>>>>>>>>> redundancy, so I assume the performance impact will be nil here.


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire 
>>>>>>>>>>>>> 32-bit range, you're correct that you can completely fill the 
>>>>>>>>>>>>> ring. I did miss that point. However, as shown above you still 
>>>>>>>>>>>>> need this to be:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> num = size - abs(tail - head);
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> To avoid problems at the 32-bit wrap boundary.


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is 
>>>>>>>>>>>>>> less than 2^32 elements. Consider the following test program:
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>> #include <stdint.h>
>>>>>>>>>>>>>> #include <inttypes.h>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> void main() {
>>>>>>>>>>>>>>  uint32_t head[4] = {0, 1, 2, 3};
>>>>>>>>>>>>>>  uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd};
>>>>>>>>>>>>>>  uint32_t mask = 4095;
>>>>>>>>>>>>>>  uint32_t result;
>>>>>>>>>>>>>>  int i;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  for (i = 0; i < 4; i++) {
>>>>>>>>>>>>>>          printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>>>>>>>>>>>>                 head[i], tail[i]);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>          result = tail[i] - head[i];
>>>>>>>>>>>>>>          printf("tail - head = %" PRIu32 "\n", result);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>          result = (tail[i] - head[i]) & mask;
>>>>>>>>>>>>>>          printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>          result = abs(tail[i] - head[i]);
>>>>>>>>>>>>>>          printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> in theory `tail - head` should be the number of elements in the 
>>>>>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test program 
>>>>>>>>>>>>>> gives the following output:
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> head = 0 tail = 0:
>>>>>>>>>>>>>> tail - head = 0
>>>>>>>>>>>>>> (tail - head) & mask = 0
>>>>>>>>>>>>>> abs(tail - head) = 0
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> head = 1 tail = 4294967295:
>>>>>>>>>>>>>> tail - head = 4294967294
>>>>>>>>>>>>>> (tail - head) & mask = 4094
>>>>>>>>>>>>>> abs(tail - head) = 2
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> head = 2 tail = 4294967294:
>>>>>>>>>>>>>> tail - head = 4294967292
>>>>>>>>>>>>>> (tail - head) & mask = 4092
>>>>>>>>>>>>>> abs(tail - head) = 4
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> head = 3 tail = 4294967293:
>>>>>>>>>>>>>> tail - head = 4294967290
>>>>>>>>>>>>>> (tail - head) & mask = 4090
>>>>>>>>>>>>>> abs(tail - head) = 6
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of 
>>>>>>>>>>>>>> the variable, when the 32-bits rolls over you'll get a large 
>>>>>>>>>>>>>> positive number, not the small one you need to stay within the 
>>>>>>>>>>>>>> ring bounds. The alternative is to mask `head` and `tail` as you 
>>>>>>>>>>>>>> increment them, but then you run into the effective range issue.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> OK


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> That functionality is not obvious from the name. It either 
>>>>>>>>>>>>>>>>> implies that one of the input arguments is written (not true 
>>>>>>>>>>>>>>>>> here) or the reader might assume that it is an expression 
>>>>>>>>>>>>>>>>> without side-effect and should be deleted (what I originally 
>>>>>>>>>>>>>>>>> thought when reading it). You should pick a routine name that 
>>>>>>>>>>>>>>>>> makes it clear it's actually doing something real, in this 
>>>>>>>>>>>>>>>>> case performing prefetch processing.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from 
>>>>>>>>>>>>>>>>>>> QoS point of view. 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so 
>>>>>>>>>>>>>>>>>>> this hard coding may change in that phase.


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not 
>>>>>>>>>>>>>>>>>>>>> (currently) used. Queue size is always 4k.


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed 
>>>>>>>>>>>>>>>>>>>>>>>>> in v2.


>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and 
>>>>>>>>>>>>>>>>>>>>>>>>>> do not wrap around when the ring is full. I think 
>>>>>>>>>>>>>>>>>>>>>>>>>> you assume that the store index is 0...size-1, while 
>>>>>>>>>>>>>>>>>>>>>>>>>> it's full uint32_t which is then masked to get the 
>>>>>>>>>>>>>>>>>>>>>>>>>> actual index.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead 
>>>>>>>>>>>>>>>>>>>>>>>>>> of 4095.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of 
>>>>>>>>>>>>>>>>>>>>>>>>>>> two. */
>>>>>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size)
>>>>>>>>>>>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be changed/moved 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> any time if it's found out that there's a place 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> for prefetching. I actually plan to test if number 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> of prefetches should be limited as e.g. 32 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> consecutive prefetches may be too much for some 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> CPU architectures.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!'. Especially, when the if clause is as complex 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as this and there's danger for reader to miss the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '!' sign. 


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when someone would modify the bitfield from two 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to three fields later on. Similarly to memset() 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zero is used for struct inits.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uint32_t variables, wrap a round is handled 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name (as it's not mixed into struct field 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> names). Checkpatch passes so this style should 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be OK.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> no added performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> noted earlier, if we're not going to use the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied queue size.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extraneous statement that should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deleted. Renaming this to something like 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (!ring_st_is_empty(&queue->s.ring_st))` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effective queue capacity. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied `size` should then be set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to `ROUNDUP_POWER2_U32(size) - 1` for the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> masking to work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to ensure that `CONFIG_QUEUE_SIZE` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and `index.buffer` there's no need to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return to that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> empty when `head == tail` therefore 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the queue is full when `abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hold `size - 1` elements, otherwise 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you cannot distinguish between a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full and an empty queue without 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another bit of metadata, which is a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cost you're trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caller is trying to be "optimal" by 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying a power of two in the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason we may wish to return a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> power of 2. Should be documented 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as such, since this is an internal 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> routine. In this case an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this requirement would be a useful 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head);` to deal with wrap 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misinterpreted as overly large 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since it's `uint32_t`. Note that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170572592
updated_at 2018-02-26 12:13:33

Reply via email to