Petri Savolainen(psavol) replied on github web page: platform/linux-generic/odp_queue.c line 95 @@ -143,8 +150,10 @@ static int queue_capability(odp_queue_capability_t *capa) capa->max_sched_groups = sched_fn->num_grps(); capa->sched_prios = odp_schedule_num_prio(); capa->plain.max_num = capa->max_queues; + capa->plain.max_size = CONFIG_QUEUE_SIZE; capa->plain.nonblocking = ODP_BLOCKING; capa->sched.max_num = capa->max_queues; + capa->sched.max_size = CONFIG_QUEUE_SIZE;
Comment: One entry is not lost. > Petri Savolainen(psavol) wrote: > One entry is not lost. >> Petri Savolainen(psavol) wrote: >> OK, added checks in v2. >>> Petri Savolainen(psavol) wrote: >>> OK. Compiler probably did that already, but changed in v2. >>>> Petri Savolainen(psavol) wrote: >>>> Tail and head indexes are (masked from) uint32_t and do not wrap around >>>> when the ring is full. I think you assume that the store index is >>>> 0...size-1, while it's full uint32_t which is then masked to get the >>>> actual index. >>>> >>>> For example: >>>> size = 100; >>>> >>>> Empty: >>>> head = 100 >>>> tail = 100 >>>> num = 100 - 100 = 0 >>>> >>>> Full: >>>> head = 100 >>>> tail = 200 >>>> num = 200 - 100 = 100 >>>> >>>> Wrap uint32_t + full: >>>> head = 0xFFFFFF9C >>>> tail = 0 >>>> num = 0 - 0xFFFFFF9C = 0x64 = 100 >>>> >>>> So, no abs() needed. Ring size can be 4096, instead of 4095. >>>>> Petri Savolainen(psavol) wrote: >>>>> It's already documented 5 lines above: >>>>> >>>>> /* Initialize ring. Ring size must be a power of two. */ >>>>> static inline void ring_st_init(ring_st_t *ring, uint32_t *data, uint32_t >>>>> size) >>>>> { >>>>>> Petri Savolainen(psavol) wrote: >>>>>> This function converts 32 bit buffer indexes to buffer header pointers. >>>>>> The counter operation is buffer_index_from_buf(). The prefetch is a side >>>>>> effect of the function, which may be changed/moved any time if it's >>>>>> found out that there's a place for prefetching. I actually plan to test >>>>>> if number of prefetches should be limited as e.g. 32 consecutive >>>>>> prefetches may be too much for some CPU architectures. >>>>>>> Petri Savolainen(psavol) wrote: >>>>>>> I prefer style where '== 0' is used instead of '!'. Especially, when >>>>>>> the if clause is as complex as this and there's danger for reader to >>>>>>> miss the '!' sign. >>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>> It's there to ensure that all bits are zero also when someone would >>>>>>>> modify the bitfield from two to three fields later on. Similarly to >>>>>>>> memset() zero is used for struct inits. >>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>> There's no need for abs(). Since it's all uint32_t variables, wrap a >>>>>>>>> round is handled already. >>>>>>>>> An example in 8bits: >>>>>>>>> 0xff - 0xfd = 0x02 >>>>>>>>> 0x00 - 0xfe = 0x02 >>>>>>>>> 0x01 - 0xff = 0x02 >>>>>>>>> 0x02 - 0x00 = 0x02 >>>>>>>>> >>>>>>>>> This passes both gcc and clang, and is used already in the other ring >>>>>>>>> implementation see ring_deq_multi(). >>>>>>>>>> Petri Savolainen(psavol) wrote: >>>>>>>>>> I prefer style with blank line in the end of a typedef, since it's >>>>>>>>>> easier to spot the type name (as it's not mixed into struct field >>>>>>>>>> names). Checkpatch passes so this style should be OK. >>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>> Does this mean that sizes larger than 32 have no added performance >>>>>>>>>>> benefit? >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> Must use `CONFIG_QUEUE_SIZE - 1` here, as noted earlier, if we're >>>>>>>>>>>> not going to use the user-supplied queue size. >>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>> Given its name, this looks like an extraneous statement that >>>>>>>>>>>>> should be deleted. Renaming this to something like >>>>>>>>>>>>> `prefetch_dequeued_bufs()` would make the intent clearer here. >>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>> `if (!ring_st_is_empty(&queue->s.ring_st))` seems more natural >>>>>>>>>>>>>> here. >>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>> Change to `if (param->size >= CONFIG_QUEUE_SIZE)` to handle the >>>>>>>>>>>>>>> effective queue capacity. The user-supplied `size` should then >>>>>>>>>>>>>>> be set to `ROUNDUP_POWER2_U32(size) - 1` for the masking to >>>>>>>>>>>>>>> work properly. >>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>> Same comment here as for plain queues. >>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>> As noted earlier, due to "losing" one entry to distinguish >>>>>>>>>>>>>>>>> queue empty/full, this should be returned as >>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE - 1`, and we also need to ensure that >>>>>>>>>>>>>>>>> `CONFIG_QUEUE_SIZE` is itself a power of 2. >>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>> Since you're initializing `index.pool` and `index.buffer` >>>>>>>>>>>>>>>>>> there's no need to set `index.u32` here. >>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>> We originally had this index partitioning based on >>>>>>>>>>>>>>>>>>> `ODP_CONFIG_POOLS`. Do we want to return to that here? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If not, we at least need an `ODP_STATIC_ASSERT()` to ensure >>>>>>>>>>>>>>>>>>> that `ODP_CONFIG_POOLS < 256` or else bad things will >>>>>>>>>>>>>>>>>>> happen here. >>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>> This routine can be optimized to: >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>> return ring->head == ring->tail; >>>>>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>> Your invariant is the queue is empty when `head == tail` >>>>>>>>>>>>>>>>>>>>> therefore the queue is full when `abs(tail - head) == >>>>>>>>>>>>>>>>>>>>> mask`, so the correct calculation here is: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> `num = mask - abs(tail - head);` >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The effect is that a queue can only hold `size - 1` >>>>>>>>>>>>>>>>>>>>> elements, otherwise you cannot distinguish between a full >>>>>>>>>>>>>>>>>>>>> and an empty queue without another bit of metadata, which >>>>>>>>>>>>>>>>>>>>> is a cost you're trying to avoid. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This is somewhat problematic if the caller is trying to >>>>>>>>>>>>>>>>>>>>> be "optimal" by specifying a power of two in the `size` >>>>>>>>>>>>>>>>>>>>> parameter of the `odp_queue_param_t` passed to >>>>>>>>>>>>>>>>>>>>> `odp_queue_create()`. For this reason we may wish to >>>>>>>>>>>>>>>>>>>>> return a `max_size` of a power of 2 - 1 in >>>>>>>>>>>>>>>>>>>>> `odp_queue_capability()` as part of this patch series. >>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>> This only works if `size` is a power of 2. Should be >>>>>>>>>>>>>>>>>>>>>> documented as such, since this is an internal routine. >>>>>>>>>>>>>>>>>>>>>> In this case an `ODP_ASSERT(size == >>>>>>>>>>>>>>>>>>>>>> ROUNDUP_POWER2_U32(size))` for this requirement would be >>>>>>>>>>>>>>>>>>>>>> a useful debugging aid. >>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>> should be `num = abs(tail - head);` to deal with wrap >>>>>>>>>>>>>>>>>>>>>>> arounds, otherwise may be misinterpreted as overly >>>>>>>>>>>>>>>>>>>>>>> large since it's `uint32_t`. Note that GCC and clang >>>>>>>>>>>>>>>>>>>>>>> recognize `abs()` and treat it as a builtin, so there's >>>>>>>>>>>>>>>>>>>>>>> no actual `stdlib.h` call here. >>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>> Extra blank line should be removed (nit). https://github.com/Linaro/odp/pull/492#discussion_r169891663 updated_at 2018-02-22 09:32:40