Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 46
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier:     BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <odp/api/hints.h>
+#include <odp_align_internal.h>
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+       uint32_t head;
+       uint32_t tail;
+       uint32_t mask;
+       uint32_t *data;
+
+} ring_st_t;
+
+/* Initialize ring. Ring size must be a power of two. */
+static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t size)
+{
+       ring->head = 0;
+       ring->tail = 0;
+       ring->mask = size - 1;
+       ring->data = data;
+}
+
+/* Dequeue data from the ring head. Max_num is smaller than ring size.*/
+static inline uint32_t ring_st_deq_multi(ring_st_t *ring, uint32_t data[],
+                                        uint32_t max_num)
+{
+       uint32_t head, tail, mask, idx;
+       uint32_t num, i;
+
+       head = ring->head;
+       tail = ring->tail;
+       mask = ring->mask;
+       num  = tail - head;


Comment:
Not true. See the comment below where it's clear that `head` and `tail` are 
free running and thus will eventually wrap as described above. If these were 
`uint64_t` variables you could argue that the wraps would take centuries and 
hence are nothing to worry about, but being `uint32_t` variables this should be 
expected to happen regularly on heavily used queues. 

`abs()` adds no overhead since compilers treat it as an intrinsic.

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> Note that neither `head` nor `tail` get masked when doing enqueues or 
> dequeues, so they will traverse the entire 32-bit range of the variable over 
> time. While the ring itself will never hold more than `size` entries, `head` 
> and `tail` will wrap around. When you index into the ring you do so via a 
> mask (`idx = head & mask;`), but calculating the number of elements in the 
> ring doesn't do this, which is why `abs()` is needed.


>> Petri Savolainen(psavol) wrote:
>> Plus this is done only once - in pool create phase.


>>> Petri Savolainen(psavol) wrote:
>>> No since ring size will be much smaller than 4 billion.


>>>> Petri Savolainen(psavol) wrote:
>>>> The point is that ring size will never be close to 4 billion entries. E.g. 
>>>> currently tail is always max 4096 larger than head. Your example above is 
>>>> based assumption of 4 billion entry ring. Overflow is avoided when ring 
>>>> size if <2 billion, as 32 bit indexes can be still used to calculate 
>>>> number of items correctly.


>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>> Agreed.


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> Agreed, this is good for now. Later we may wish to honor the 
>>>>>> user-requested queue `size` parameter.


>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>> Agreed.


>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>> Correct, as noted earlier. I withdraw that comment.


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Presumably the compiler will see the overlap and optimize away the 
>>>>>>>>> redundancy, so I assume the performance impact will be nil here.


>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>> Since you're allowing `head` and `tail` to run over the entire 
>>>>>>>>>> 32-bit range, you're correct that you can completely fill the ring. 
>>>>>>>>>> I did miss that point. However, as shown above you still need this 
>>>>>>>>>> to be:
>>>>>>>>>> 
>>>>>>>>>> ```
>>>>>>>>>> num = size - abs(tail - head);
>>>>>>>>>> ```
>>>>>>>>>> To avoid problems at the 32-bit wrap boundary.


>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>> You're computing in 32 bits not 8 bits, and your ring size is less 
>>>>>>>>>>> than 2^32 elements. Consider the following test program:
>>>>>>>>>>> ```
>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>> #include <stdint.h>
>>>>>>>>>>> #include <inttypes.h>
>>>>>>>>>>> 
>>>>>>>>>>> void main() {
>>>>>>>>>>>     uint32_t head[4] = {0, 1, 2, 3};
>>>>>>>>>>>     uint32_t tail[4] = {0, 0xffffffff, 0xfffffffe, 0xfffffffd};
>>>>>>>>>>>     uint32_t mask = 4095;
>>>>>>>>>>>     uint32_t result;
>>>>>>>>>>>     int i;
>>>>>>>>>>> 
>>>>>>>>>>>     for (i = 0; i < 4; i++) {
>>>>>>>>>>>             printf("head = %" PRIu32 " tail = %" PRIu32 ":\n",
>>>>>>>>>>>                    head[i], tail[i]);
>>>>>>>>>>> 
>>>>>>>>>>>             result = tail[i] - head[i];
>>>>>>>>>>>             printf("tail - head = %" PRIu32 "\n", result);
>>>>>>>>>>> 
>>>>>>>>>>>             result = (tail[i] - head[i]) & mask;
>>>>>>>>>>>             printf("(tail - head) & mask = %" PRIu32 "\n", result);
>>>>>>>>>>> 
>>>>>>>>>>>             result = abs(tail[i] - head[i]);
>>>>>>>>>>>             printf("abs(tail - head) = %" PRIu32 "\n\n", result);
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>> ```
>>>>>>>>>>> in theory `tail - head` should be the number of elements in the 
>>>>>>>>>>> ring, in this case 0, 2, 4, and 6. But running this test program 
>>>>>>>>>>> gives the following output:
>>>>>>>>>>> ```
>>>>>>>>>>> head = 0 tail = 0:
>>>>>>>>>>> tail - head = 0
>>>>>>>>>>> (tail - head) & mask = 0
>>>>>>>>>>> abs(tail - head) = 0
>>>>>>>>>>> 
>>>>>>>>>>> head = 1 tail = 4294967295:
>>>>>>>>>>> tail - head = 4294967294
>>>>>>>>>>> (tail - head) & mask = 4094
>>>>>>>>>>> abs(tail - head) = 2
>>>>>>>>>>> 
>>>>>>>>>>> head = 2 tail = 4294967294:
>>>>>>>>>>> tail - head = 4294967292
>>>>>>>>>>> (tail - head) & mask = 4092
>>>>>>>>>>> abs(tail - head) = 4
>>>>>>>>>>> 
>>>>>>>>>>> head = 3 tail = 4294967293:
>>>>>>>>>>> tail - head = 4294967290
>>>>>>>>>>> (tail - head) & mask = 4090
>>>>>>>>>>> abs(tail - head) = 6
>>>>>>>>>>> ```
>>>>>>>>>>> Since you're allowing head to run free over the 32-bit range of the 
>>>>>>>>>>> variable, when the 32-bits rolls over you'll get a large positive 
>>>>>>>>>>> number, not the small one you need to stay within the ring bounds. 
>>>>>>>>>>> The alternative is to mask `head` and `tail` as you increment them, 
>>>>>>>>>>> but then you run into the effective range issue.


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> OK


>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>> OK, the `ODP_ASSERT()` would still be useful for debugging.


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> That functionality is not obvious from the name. It either 
>>>>>>>>>>>>>> implies that one of the input arguments is written (not true 
>>>>>>>>>>>>>> here) or the reader might assume that it is an expression 
>>>>>>>>>>>>>> without side-effect and should be deleted (what I originally 
>>>>>>>>>>>>>> thought when reading it). You should pick a routine name that 
>>>>>>>>>>>>>> makes it clear it's actually doing something real, in this case 
>>>>>>>>>>>>>> performing prefetch processing.


>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>>>>>>>>>>>>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>> Didn't try larger than 32. 32 is already quite large from QoS 
>>>>>>>>>>>>>>>> point of view. 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm planning to use config file for run time tunning, so this 
>>>>>>>>>>>>>>>> hard coding may change in that phase.


>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>> One entry is not lost. User provided size if not (currently) 
>>>>>>>>>>>>>>>>>> used. Queue size is always 4k.


>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>> One entry is not lost.


>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>> OK, added checks in v2.


>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>> OK. Compiler probably did that already, but changed in 
>>>>>>>>>>>>>>>>>>>>>> v2.


>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do 
>>>>>>>>>>>>>>>>>>>>>>> not wrap around when the ring is full. I think you 
>>>>>>>>>>>>>>>>>>>>>>> assume that the store index is 0...size-1, while it's 
>>>>>>>>>>>>>>>>>>>>>>> full uint32_t which is then masked to get the actual 
>>>>>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> For example:
>>>>>>>>>>>>>>>>>>>>>>> size = 100;
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Empty:
>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>> tail = 100
>>>>>>>>>>>>>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Full:
>>>>>>>>>>>>>>>>>>>>>>> head = 100
>>>>>>>>>>>>>>>>>>>>>>> tail = 200
>>>>>>>>>>>>>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>>>>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>>>>>>>>>>>>>> tail = 0
>>>>>>>>>>>>>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 
>>>>>>>>>>>>>>>>>>>>>>> 4095.


>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. 
>>>>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, 
>>>>>>>>>>>>>>>>>>>>>>>> uint32_t *data, uint32_t size)
>>>>>>>>>>>>>>>>>>>>>>>> {


>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> This function converts 32 bit buffer indexes to 
>>>>>>>>>>>>>>>>>>>>>>>>> buffer header pointers. The counter operation is 
>>>>>>>>>>>>>>>>>>>>>>>>> buffer_index_from_buf(). The prefetch is a side 
>>>>>>>>>>>>>>>>>>>>>>>>> effect of the function, which may be changed/moved 
>>>>>>>>>>>>>>>>>>>>>>>>> any time if it's found out that there's a place for 
>>>>>>>>>>>>>>>>>>>>>>>>> prefetching. I actually plan to test if number of 
>>>>>>>>>>>>>>>>>>>>>>>>> prefetches should be limited as e.g. 32 consecutive 
>>>>>>>>>>>>>>>>>>>>>>>>> prefetches may be too much for some CPU architectures.


>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. 
>>>>>>>>>>>>>>>>>>>>>>>>>> Especially, when the if clause is as complex as this 
>>>>>>>>>>>>>>>>>>>>>>>>>> and there's danger for reader to miss the '!' sign. 


>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also 
>>>>>>>>>>>>>>>>>>>>>>>>>>> when someone would modify the bitfield from two to 
>>>>>>>>>>>>>>>>>>>>>>>>>>> three fields later on. Similarly to memset() zero 
>>>>>>>>>>>>>>>>>>>>>>>>>>> is used for struct inits.


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> variables, wrap a round is handled already.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the other ring implementation see 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ring_deq_multi().


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> typedef, since it's easier to spot the type name 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (as it's not mixed into struct field names). 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Checkpatch passes so this style should be OK.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> added performance benefit?


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> earlier, if we're not going to use the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user-supplied queue size.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> statement that should be deleted. Renaming 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this to something like 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intent clearer here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems more natural here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONFIG_QUEUE_SIZE)` to handle the effective 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue capacity. The user-supplied `size` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should then be set to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ROUNDUP_POWER2_U32(size) - 1` for the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> masking to work properly.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entry to distinguish queue empty/full, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this should be returned as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to ensure that `CONFIG_QUEUE_SIZE` is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and `index.buffer` there's no need to set 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `index.u32` here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitioning based on 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to that here?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_STATIC_ASSERT()` to ensure that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS < 256` or else bad 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when `head == tail` therefore the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> queue is full when `abs(tail - head) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> == mask`, so the correct calculation 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hold `size - 1` elements, otherwise 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you cannot distinguish between a full 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and an empty queue without another bit 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of metadata, which is a cost you're 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> trying to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caller is trying to be "optimal" by 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specifying a power of two in the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we may wish to return a `max_size` of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this patch series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of 2. Should be documented as such, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> since this is an internal routine. In 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case an `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requirement would be a useful 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to deal with wrap arounds, otherwise 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be misinterpreted as overly 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> large since it's `uint32_t`. Note 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that GCC and clang recognize `abs()` 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and treat it as a builtin, so 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's no actual `stdlib.h` call 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fischofer(Bill-Fischofer-Linaro) 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (nit).


https://github.com/Linaro/odp/pull/492#discussion_r170277089
updated_at 2018-02-23 15:19:39

Reply via email to