Bill Fischofer(Bill-Fischofer-Linaro) replied on github web page:

platform/linux-generic/include/odp_ring_st_internal.h
line 24
@@ -0,0 +1,118 @@
+/* Copyright (c) 2018, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier:     BSD-3-Clause
+ */
+
+#ifndef ODP_RING_ST_INTERNAL_H_
+#define ODP_RING_ST_INTERNAL_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <odp/api/hints.h>
+#include <odp_align_internal.h>
+
+/* Basic ring for single thread usage. Operations must be synchronized by using
+ * locks (or other means), when multiple threads use the same ring. */
+typedef struct {
+       uint32_t head;
+       uint32_t tail;
+       uint32_t mask;
+       uint32_t *data;
+


Comment:
OK

> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
> OK, the `ODP_ASSERT()` would still be useful for debugging.


>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>> That functionality is not obvious from the name. It either implies that one 
>> of the input arguments is written (not true here) or the reader might assume 
>> that it is an expression without side-effect and should be deleted (what I 
>> originally thought when reading it). You should pick a routine name that 
>> makes it clear it's actually doing something real, in this case performing 
>> prefetch processing.


>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>> Elsewhere you write `if (ring_st_is_empty(...))`, not `if 
>>> (ring_st_is_empty(...) == 1)` so this is inconsistent. 


>>>> Petri Savolainen(psavol) wrote:
>>>> Didn't try larger than 32. 32 is already quite large from QoS point of 
>>>> view. 
>>>> 
>>>> I'm planning to use config file for run time tunning, so this hard coding 
>>>> may change in that phase.


>>>>> Petri Savolainen(psavol) wrote:
>>>>> One entry is not lost.


>>>>>> Petri Savolainen(psavol) wrote:
>>>>>> One entry is not lost. User provided size if not (currently) used. Queue 
>>>>>> size is always 4k.


>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>> One entry is not lost.


>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>> One entry is not lost.


>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>> OK, added checks in v2.


>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>> OK. Compiler probably did that already, but changed in v2.


>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>> Tail and head indexes are (masked from) uint32_t and do not wrap 
>>>>>>>>>>> around when the ring is full. I think you assume that the store 
>>>>>>>>>>> index is 0...size-1, while it's full uint32_t which is then masked 
>>>>>>>>>>> to get the actual index.
>>>>>>>>>>> 
>>>>>>>>>>> For example:
>>>>>>>>>>> size = 100;
>>>>>>>>>>> 
>>>>>>>>>>> Empty:
>>>>>>>>>>> head = 100
>>>>>>>>>>> tail = 100
>>>>>>>>>>> num = 100 - 100 = 0
>>>>>>>>>>> 
>>>>>>>>>>> Full:
>>>>>>>>>>> head = 100
>>>>>>>>>>> tail = 200
>>>>>>>>>>> num = 200 - 100 = 100
>>>>>>>>>>> 
>>>>>>>>>>> Wrap uint32_t + full:
>>>>>>>>>>> head = 0xFFFFFF9C
>>>>>>>>>>> tail = 0
>>>>>>>>>>> num = 0 - 0xFFFFFF9C = 0x64 = 100
>>>>>>>>>>> 
>>>>>>>>>>> So, no abs() needed. Ring size can be 4096, instead of 4095.


>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>> It's already documented 5 lines above:
>>>>>>>>>>>> 
>>>>>>>>>>>> /* Initialize ring. Ring size must be a power of two. */
>>>>>>>>>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, 
>>>>>>>>>>>> uint32_t size)
>>>>>>>>>>>> {


>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>> This function converts 32 bit buffer indexes to buffer header 
>>>>>>>>>>>>> pointers. The counter operation is buffer_index_from_buf(). The 
>>>>>>>>>>>>> prefetch is a side effect of the function, which may be 
>>>>>>>>>>>>> changed/moved any time if it's found out that there's a place for 
>>>>>>>>>>>>> prefetching. I actually plan to test if number of prefetches 
>>>>>>>>>>>>> should be limited as e.g. 32 consecutive prefetches may be too 
>>>>>>>>>>>>> much for some CPU architectures.


>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, 
>>>>>>>>>>>>>> when the if clause is as complex as this and there's danger for 
>>>>>>>>>>>>>> reader to miss the '!' sign. 


>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>> It's there to ensure that all bits are zero also when someone 
>>>>>>>>>>>>>>> would modify the bitfield from two to three fields later on. 
>>>>>>>>>>>>>>> Similarly to memset() zero is used for struct inits.


>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, 
>>>>>>>>>>>>>>>> wrap a round is handled already.
>>>>>>>>>>>>>>>> An example in 8bits:
>>>>>>>>>>>>>>>> 0xff - 0xfd = 0x02
>>>>>>>>>>>>>>>> 0x00 - 0xfe = 0x02
>>>>>>>>>>>>>>>> 0x01 - 0xff = 0x02
>>>>>>>>>>>>>>>> 0x02 - 0x00 = 0x02
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This passes both gcc and clang, and is used already in the 
>>>>>>>>>>>>>>>> other ring implementation see ring_deq_multi().


>>>>>>>>>>>>>>>>> Petri Savolainen(psavol) wrote:
>>>>>>>>>>>>>>>>> I prefer style with blank line in the end of a typedef, since 
>>>>>>>>>>>>>>>>> it's easier to spot the type name (as it's not mixed into 
>>>>>>>>>>>>>>>>> struct field names). Checkpatch passes so this style should 
>>>>>>>>>>>>>>>>> be OK.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> Does this mean that sizes larger than 32 have no added 
>>>>>>>>>>>>>>>>>> performance benefit?


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if 
>>>>>>>>>>>>>>>>>>> we're not going to use the user-supplied queue size.


>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>> Given its name, this looks like an extraneous statement 
>>>>>>>>>>>>>>>>>>>> that should be deleted. Renaming this to something like 
>>>>>>>>>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer 
>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more 
>>>>>>>>>>>>>>>>>>>>> natural here.


>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to 
>>>>>>>>>>>>>>>>>>>>>> handle the effective queue capacity. The user-supplied 
>>>>>>>>>>>>>>>>>>>>>> `size` should then be set to `ROUNDUP_POWER2_U32(size) - 
>>>>>>>>>>>>>>>>>>>>>> 1` for the masking to work properly.


>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>> Same comment here as for plain queues.


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to 
>>>>>>>>>>>>>>>>>>>>>>>> distinguish queue empty/full, this should be returned 
>>>>>>>>>>>>>>>>>>>>>>>> as `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure 
>>>>>>>>>>>>>>>>>>>>>>>> that `CONFIG_QUEUE_SIZE` is itself a power of 2.


>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and 
>>>>>>>>>>>>>>>>>>>>>>>>> `index.buffer` there's no need to set `index.u32` 
>>>>>>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on 
>>>>>>>>>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that 
>>>>>>>>>>>>>>>>>>>>>>>>>> here?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to 
>>>>>>>>>>>>>>>>>>>>>>>>>> ensure that `ODP_CONFIG_POOLS < 256` or else bad 
>>>>>>>>>>>>>>>>>>>>>>>>>> things will happen here.


>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> This routine can be optimized to:
>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail;
>>>>>>>>>>>>>>>>>>>>>>>>>>> ```


>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tail` therefore the queue is full when `abs(tail - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> head) == mask`, so the correct calculation here is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);`
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1` elements, otherwise you cannot distinguish 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> between a full and an empty queue without another 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of metadata, which is a cost you're trying to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> trying to be "optimal" by specifying a power of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> two in the `size` parameter of the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_param_t` passed to 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to return a `max_size` of a power of 2 - 1 in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> series.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be documented as such, since this is an internal 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> routine. In this case an `ODP_ASSERT(size == 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be a useful debugging aid.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrap arounds, otherwise may be misinterpreted as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overly large since it's `uint32_t`. Note that 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GCC and clang recognize `abs()` and treat it as 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a builtin, so there's no actual `stdlib.h` call 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit).


https://github.com/Linaro/odp/pull/492#discussion_r169951888
updated_at 2018-02-22 13:22:36

Reply via email to