[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-17 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887192#comment-16887192
 ] 

ASF subversion and git services commented on DISPATCH-1372:
---

Commit 94180b80d603ffa3a3ce33fad85eda3659357b8f in qpid-dispatch's branch 
refs/heads/master from Francesco Nigro
[ https://gitbox.apache.org/repos/asf?p=qpid-dispatch.git;h=94180b8 ]

DISPATCH-1372 alloc_pool intrusive linked list can be replaced by a linked stack

Signed-off-by: Kenneth Giusti 

This closes #525


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887193#comment-16887193
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

asfgit commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887010#comment-16887010
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@bf3a1aa`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `93.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master #525   +/-   ##
   =
 Coverage  ?   86.87%   
   =
 Files ?   87   
 Lines ?19599   
 Branches  ?0   
   =
 Hits  ?17026   
 Misses? 2573   
 Partials  ?0
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `95.96% <93.33%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[bf3a1aa...c8cb95b](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886403#comment-16886403
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r304081272
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,145 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev; //do not use DEQ_LINKS here: 
field position could affect access cost
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
+};
+
+static inline void init_stack(qd_alloc_linked_stack_t *stack)
+{
+stack->top_chunk = &stack->base_chunk;
+stack->top_chunk->next = NULL;
+stack->top = 0;
+stack->size = 0;
+}
+
+static inline void prev_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+const uint32_t chunk_size = CHUNK_SIZE;
+assert(stack->top == 0);
+assert(stack->size != 0);
+assert(stack->top_chunk != &stack->base_chunk);
+qd_alloc_chunk_t *prev = stack->top_chunk->prev;
+//TODO(franz):  stack->top_chunk could be passed externally and walked its 
nexts
+//  to recycle the last chunk.
+//  Just need to pay attention to null out 
released_chunk->prev->next
+//  to make it unreachable from the stack
+stack->top_chunk = prev;
+stack->top = chunk_size;
+}
+
+static inline qd_alloc_item_t *pop_stack(qd_alloc_linked_stack_t *const stack)
+{
+if (stack->top == 0) {
+if (stack->size == 0) {
+assert(stack->top_chunk == &stack->base_chunk);
+return NULL;
+}
+prev_chunk_stack(stack);
+}
+stack->top--;
+assert(stack->top >= 0 && stack->top < CHUNK_SIZE);
+stack->size--;
+assert(stack->size >= 0);
+qd_alloc_item_t *item = stack->top_chunk->items[stack->top];
+assert(item != NULL);
+return item;
+}
+
+static inline void free_stack_chunks(qd_alloc_linked_stack_t *stack)
+{
+assert(stack->size == 0);
+//the assumption here is that next is always correctly set
+qd_alloc_chunk_t *chunk = stack->base_chunk.next;
+while (chunk != NULL) {
+qd_alloc_chunk_t *next = chunk->next;
+free(chunk);
+chunk = next;
+}
+}
+
+static inline void next_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == CHUNK_SIZE);
+qd_alloc_chunk_t *top = stack->top_chunk->next;
+if (top == NULL) {
+top = NEW(qd_alloc_chunk_t);
+stack->top_chunk->next = top;
+top->prev = stack->top_chunk;
+top->next = NULL;
+}
+assert(top->prev == stack->top_chunk);
+assert(stack->top_chunk->next == top);
+stack->top_chunk = top;
+stack->top = 0;
+}
+
+static inline void push_stack(qd_alloc_linked_stack_t *stack, qd_alloc_item_t 
*item)
+{
+const uint32_t chunk_size = CHUNK_SIZE;
+if (stack->top == chunk_size) {
+next_chunk_stack(stack);
+}
+stack->size++;
+stack->top_chunk->items[stack->top] = item;
+stack->top++;
+}
+
+static inline int unordered_move_stack(qd_alloc_linked_stack_t *from, 
qd_alloc_linked_stack_t *to, uint32_t length)
+{
+length = from->size < length ? from->size : length;
 
 Review comment:
   It never happen: i have implemented this method in a general fashion, but 
there ATM there is no real call site that allows it to happen. I have been 
lucky that the way I've implemented it has allowed too handle the case where a 
new chunk can't be allocated too
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> ---

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886399#comment-16886399
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

ChugR commented on pull request #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r303900546
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,145 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev; //do not use DEQ_LINKS here: 
field position could affect access cost
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
+};
+
+static inline void init_stack(qd_alloc_linked_stack_t *stack)
+{
+stack->top_chunk = &stack->base_chunk;
+stack->top_chunk->next = NULL;
+stack->top = 0;
+stack->size = 0;
+}
+
+static inline void prev_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+const uint32_t chunk_size = CHUNK_SIZE;
+assert(stack->top == 0);
+assert(stack->size != 0);
+assert(stack->top_chunk != &stack->base_chunk);
+qd_alloc_chunk_t *prev = stack->top_chunk->prev;
+//TODO(franz):  stack->top_chunk could be passed externally and walked its 
nexts
+//  to recycle the last chunk.
+//  Just need to pay attention to null out 
released_chunk->prev->next
+//  to make it unreachable from the stack
+stack->top_chunk = prev;
+stack->top = chunk_size;
+}
+
+static inline qd_alloc_item_t *pop_stack(qd_alloc_linked_stack_t *const stack)
+{
+if (stack->top == 0) {
+if (stack->size == 0) {
+assert(stack->top_chunk == &stack->base_chunk);
+return NULL;
+}
+prev_chunk_stack(stack);
+}
+stack->top--;
+assert(stack->top >= 0 && stack->top < CHUNK_SIZE);
+stack->size--;
+assert(stack->size >= 0);
+qd_alloc_item_t *item = stack->top_chunk->items[stack->top];
+assert(item != NULL);
+return item;
+}
+
+static inline void free_stack_chunks(qd_alloc_linked_stack_t *stack)
+{
+assert(stack->size == 0);
+//the assumption here is that next is always correctly set
+qd_alloc_chunk_t *chunk = stack->base_chunk.next;
+while (chunk != NULL) {
+qd_alloc_chunk_t *next = chunk->next;
+free(chunk);
+chunk = next;
+}
+}
+
+static inline void next_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == CHUNK_SIZE);
+qd_alloc_chunk_t *top = stack->top_chunk->next;
+if (top == NULL) {
+top = NEW(qd_alloc_chunk_t);
+stack->top_chunk->next = top;
+top->prev = stack->top_chunk;
+top->next = NULL;
+}
+assert(top->prev == stack->top_chunk);
+assert(stack->top_chunk->next == top);
+stack->top_chunk = top;
+stack->top = 0;
+}
+
+static inline void push_stack(qd_alloc_linked_stack_t *stack, qd_alloc_item_t 
*item)
+{
+const uint32_t chunk_size = CHUNK_SIZE;
+if (stack->top == chunk_size) {
+next_chunk_stack(stack);
+}
+stack->size++;
+stack->top_chunk->items[stack->top] = item;
+stack->top++;
+}
+
+static inline int unordered_move_stack(qd_alloc_linked_stack_t *from, 
qd_alloc_linked_stack_t *to, uint32_t length)
+{
+length = from->size < length ? from->size : length;
 
 Review comment:
   How often is the _length_ value lowered because from->size is too small? 
This event will mess up QD_MEMORY_STATS stats->held_by_threads since that value 
is always incremented or decremented by the config->transfer_batch_size and not 
by the smaller value assigned to _length_.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> 

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882739#comment-16882739
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > Merging 
[#525](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/qpid-dispatch/commit/f22766996be12f5a0d6b388d04be6d7ccad1eb80?src=pr&el=desc)
 will **decrease** coverage by `0.04%`.
   > The diff coverage is `98.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master #525  +/-   ##
   ==
   - Coverage   86.91%   86.86%   -0.05% 
   ==
 Files  87   87  
 Lines   1952119578  +57 
   ==
   + Hits1696717007  +40 
   - Misses   2554 2571  +17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `98.27% <98.85%> (+0.02%)` | :arrow_up: |
   | 
[src/parse.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3BhcnNlLmM=)
 | `85% <0%> (-1.4%)` | :arrow_down: |
   | 
[...core/modules/address\_lookup\_client/lookup\_client.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL21vZHVsZXMvYWRkcmVzc19sb29rdXBfY2xpZW50L2xvb2t1cF9jbGllbnQuYw==)
 | `91.63% <0%> (-1.05%)` | :arrow_down: |
   | 
[src/router\_core/forwarder.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2ZvcndhcmRlci5j)
 | `93.1% <0%> (-0.69%)` | :arrow_down: |
   | 
[src/router\_core/agent\_link.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2FnZW50X2xpbmsuYw==)
 | `67.52% <0%> (-0.52%)` | :arrow_down: |
   | 
[src/container.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2NvbnRhaW5lci5j)
 | `79.41% <0%> (-0.23%)` | :arrow_down: |
   | 
[src/iterator.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2l0ZXJhdG9yLmM=)
 | `89.03% <0%> (-0.2%)` | :arrow_down: |
   | 
[src/router\_core/router\_core.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3JvdXRlcl9jb3JlLmM=)
 | `86.53% <0%> (-0.2%)` | :arrow_down: |
   | 
[src/router\_node.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9ub2RlLmM=)
 | `93.69% <0%> (-0.13%)` | :arrow_down: |
   | 
[src/router\_core/transfer.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3RyYW5zZmVyLmM=)
 | `95.37% <0%> (-0.03%)` | :arrow_down: |
   | ... and [3 
more](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[f227669...124b292](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Fra

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877834#comment-16877834
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r299326869
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev; //do not use DEQ_LINKS here: 
field position could affect access cost
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
+};
+
+static inline void init_stack(qd_alloc_linked_stack_t *stack)
+{
+stack->top_chunk = &stack->base_chunk;
+stack->top_chunk->next = NULL;
+stack->top = 0;
+stack->size = 0;
+}
+
+static inline void prev_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == 0);
+assert(stack->size != 0);
+assert(stack->top_chunk != &stack->base_chunk);
+qd_alloc_chunk_t *prev = stack->top_chunk->prev;
+//TODO(franz):  stack->top_chunk could be passed externally and walked its 
nexts
+//  to recycle the last chunk.
+//  Just need to pay attention to null out 
released_chunk->prev->next
+//  to make it unreachable from the stack
+stack->top_chunk = prev;
+stack->top = CHUNK_SIZE;
+}
+
+static inline qd_alloc_item_t *pop_stack(qd_alloc_linked_stack_t *const stack)
+{
+if (stack->top == 0) {
+if (stack->size == 0) {
+assert(stack->top_chunk == &stack->base_chunk);
+return NULL;
+}
+prev_chunk_stack(stack);
+}
+stack->top--;
+assert(stack->top >= 0 && stack->top < CHUNK_SIZE);
+stack->size--;
+assert(stack->size >= 0);
+qd_alloc_item_t *item = stack->top_chunk->items[stack->top];
+assert(item != NULL);
+return item;
+}
+
+static inline void free_stack_chunks(qd_alloc_linked_stack_t *stack)
+{
+assert(stack->size == 0);
+//the assumption here is that next is always correctly set
+qd_alloc_chunk_t *chunk = stack->base_chunk.next;
+while (chunk != NULL) {
+qd_alloc_chunk_t *next = chunk->next;
+free(chunk);
+chunk = next;
+}
+}
+
+static inline void next_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == CHUNK_SIZE);
+qd_alloc_chunk_t *top = stack->top_chunk->next;
+if (top == NULL) {
+top = NEW(qd_alloc_chunk_t);
 
 Review comment:
   @astitcher @kgiusti I was thinking to allocate new chunks aligned to cache 
line size, to help `unordered_move_stack` `memcpy`s, wdyt? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple 

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876734#comment-16876734
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > Merging 
[#525](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/qpid-dispatch/commit/bd4e52f616b5e53056c0e296aa096ae660e7af70?src=pr&el=desc)
 will **increase** coverage by `0.03%`.
   > The diff coverage is `98.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master #525  +/-   ##
   ==
   + Coverage   86.82%   86.86%   +0.03% 
   ==
 Files  87   87  
 Lines   1952019578  +58 
   ==
   + Hits1694917007  +58 
 Misses   2571 2571
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `98.27% <98.85%> (+0.02%)` | :arrow_up: |
   | 
[...core/modules/address\_lookup\_client/lookup\_client.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL21vZHVsZXMvYWRkcmVzc19sb29rdXBfY2xpZW50L2xvb2t1cF9jbGllbnQuYw==)
 | `91.63% <0%> (-1.05%)` | :arrow_down: |
   | 
[src/router\_core/forwarder.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2ZvcndhcmRlci5j)
 | `93.1% <0%> (-0.69%)` | :arrow_down: |
   | 
[src/container.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2NvbnRhaW5lci5j)
 | `79.41% <0%> (-0.19%)` | :arrow_down: |
   | 
[src/message.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL21lc3NhZ2UuYw==)
 | `88.77% <0%> (-0.03%)` | :arrow_down: |
   | 
[src/router\_node.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9ub2RlLmM=)
 | `93.69% <0%> (ø)` | :arrow_up: |
   | 
[src/router\_core/connections.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2Nvbm5lY3Rpb25zLmM=)
 | `94.81% <0%> (+0.11%)` | :arrow_up: |
   | 
[src/router\_core/transfer.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3RyYW5zZmVyLmM=)
 | `95.37% <0%> (+0.77%)` | :arrow_up: |
   | 
[...c/router\_core/modules/test\_hooks/core\_test\_hooks.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL21vZHVsZXMvdGVzdF9ob29rcy9jb3JlX3Rlc3RfaG9va3MuYw==)
 | `93.94% <0%> (+1.27%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[bd4e52f...72b6929](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
>

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876728#comment-16876728
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r299326869
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev; //do not use DEQ_LINKS here: 
field position could affect access cost
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
+};
+
+static inline void init_stack(qd_alloc_linked_stack_t *stack)
+{
+stack->top_chunk = &stack->base_chunk;
+stack->top_chunk->next = NULL;
+stack->top = 0;
+stack->size = 0;
+}
+
+static inline void prev_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == 0);
+assert(stack->size != 0);
+assert(stack->top_chunk != &stack->base_chunk);
+qd_alloc_chunk_t *prev = stack->top_chunk->prev;
+//TODO(franz):  stack->top_chunk could be passed externally and walked its 
nexts
+//  to recycle the last chunk.
+//  Just need to pay attention to null out 
released_chunk->prev->next
+//  to make it unreachable from the stack
+stack->top_chunk = prev;
+stack->top = CHUNK_SIZE;
+}
+
+static inline qd_alloc_item_t *pop_stack(qd_alloc_linked_stack_t *const stack)
+{
+if (stack->top == 0) {
+if (stack->size == 0) {
+assert(stack->top_chunk == &stack->base_chunk);
+return NULL;
+}
+prev_chunk_stack(stack);
+}
+stack->top--;
+assert(stack->top >= 0 && stack->top < CHUNK_SIZE);
+stack->size--;
+assert(stack->size >= 0);
+qd_alloc_item_t *item = stack->top_chunk->items[stack->top];
+assert(item != NULL);
+return item;
+}
+
+static inline void free_stack_chunks(qd_alloc_linked_stack_t *stack)
+{
+assert(stack->size == 0);
+//the assumption here is that next is always correctly set
+qd_alloc_chunk_t *chunk = stack->base_chunk.next;
+while (chunk != NULL) {
+qd_alloc_chunk_t *next = chunk->next;
+free(chunk);
+chunk = next;
+}
+}
+
+static inline void next_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == CHUNK_SIZE);
+qd_alloc_chunk_t *top = stack->top_chunk->next;
+if (top == NULL) {
+top = NEW(qd_alloc_chunk_t);
 
 Review comment:
   @astitcher @kgiusti I was thiking to allocate new chunkcs aligned to cache 
line size, to help `unordered_move_stack` `memcpy`s, wdyt? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple 

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876729#comment-16876729
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r299326869
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev; //do not use DEQ_LINKS here: 
field position could affect access cost
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
+};
+
+static inline void init_stack(qd_alloc_linked_stack_t *stack)
+{
+stack->top_chunk = &stack->base_chunk;
+stack->top_chunk->next = NULL;
+stack->top = 0;
+stack->size = 0;
+}
+
+static inline void prev_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == 0);
+assert(stack->size != 0);
+assert(stack->top_chunk != &stack->base_chunk);
+qd_alloc_chunk_t *prev = stack->top_chunk->prev;
+//TODO(franz):  stack->top_chunk could be passed externally and walked its 
nexts
+//  to recycle the last chunk.
+//  Just need to pay attention to null out 
released_chunk->prev->next
+//  to make it unreachable from the stack
+stack->top_chunk = prev;
+stack->top = CHUNK_SIZE;
+}
+
+static inline qd_alloc_item_t *pop_stack(qd_alloc_linked_stack_t *const stack)
+{
+if (stack->top == 0) {
+if (stack->size == 0) {
+assert(stack->top_chunk == &stack->base_chunk);
+return NULL;
+}
+prev_chunk_stack(stack);
+}
+stack->top--;
+assert(stack->top >= 0 && stack->top < CHUNK_SIZE);
+stack->size--;
+assert(stack->size >= 0);
+qd_alloc_item_t *item = stack->top_chunk->items[stack->top];
+assert(item != NULL);
+return item;
+}
+
+static inline void free_stack_chunks(qd_alloc_linked_stack_t *stack)
+{
+assert(stack->size == 0);
+//the assumption here is that next is always correctly set
+qd_alloc_chunk_t *chunk = stack->base_chunk.next;
+while (chunk != NULL) {
+qd_alloc_chunk_t *next = chunk->next;
+free(chunk);
+chunk = next;
+}
+}
+
+static inline void next_chunk_stack(qd_alloc_linked_stack_t *const stack)
+{
+assert(stack->top == CHUNK_SIZE);
+qd_alloc_chunk_t *top = stack->top_chunk->next;
+if (top == NULL) {
+top = NEW(qd_alloc_chunk_t);
 
 Review comment:
   @astitcher @kgiusti I was thinking to allocate new chunks aligned to cache 
line size, to help `unordered_move_stack` `memcpy`s, wdyt? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple 

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-07-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876726#comment-16876726
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-507545028
 
 
   @kgiusti I've just squashed the commit with the batch move between stacks ie 
`unordered_move_stack`. 
   The benchmarks shows that it delivers a small (but measurable) speedup over 
the original version, so IMO it worths to be used. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875420#comment-16875420
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506937548
 
 
   @kgiusti 
   I've tested with a long running multiple pairs run with 25m messages and 2 
routers (+ 2 clients pairs) without getting any error on shutdown: I need some 
help to reproduce the failure...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875419#comment-16875419
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy and has been written super in 
hurry, maybe the first commit ie a946341f711a1d22c144299d846c6d9c804538e4 
doesn't have that bug...
   I've run long tests (2 hours each) with several clients (4) on both the 
commits with/without debug mode and I haven't hit this same failure :(
   I will try to do the same with 2 routers to se if I'm able to hit it...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875416#comment-16875416
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506937548
 
 
   @kgiusti 
   I've tested with a long running multiple pairs run with 25m messages and 2 
routers without getting any error on shutdown: I need some help to reproduce 
the failure...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875408#comment-16875408
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy and has been written super in 
hurry, maybe the first commit ie a946341f711a1d22c144299d846c6d9c804538e4 
doesn't have that bug...
   I've run long tests (2 hour each) with several clients (4) on both the 
commits with/without debug mode and I haven't hit this same failure :(
   I will try to do the same with 2 routers to se if I'm able to hit it...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875238#comment-16875238
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy and has been written super in 
hurry, maybe the first commit ie a946341f711a1d22c144299d846c6d9c804538e4 
doesn't have that bug...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875237#comment-16875237
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy, maybe the first commit ie 
a946341f711a1d22c144299d846c6d9c804538e4 doesn't have that bug...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875233#comment-16875233
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy, maybe the first commit ie 
a946341f711a1d22c144299d846c6d9c804538e4 it doesn't have that bug...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875232#comment-16875232
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506878556
 
 
   @kgiusti Do you know if it was the version with the first or the last 
commit? The second commit has introduced a memcpy
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874461#comment-16874461
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > Merging 
[#525](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/qpid-dispatch/commit/5885b4fa29f81cf394521ca6729538fe6273263f?src=pr&el=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `98.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master #525  +/-   ##
   ==
   - Coverage   86.86%   86.86%   -0.01% 
   ==
 Files  87   87  
 Lines   1951919578  +59 
   ==
   + Hits1695617007  +51 
   - Misses   2563 2571   +8
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `98.27% <98.85%> (+0.02%)` | :arrow_up: |
   | 
[...core/modules/address\_lookup\_client/lookup\_client.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL21vZHVsZXMvYWRkcmVzc19sb29rdXBfY2xpZW50L2xvb2t1cF9jbGllbnQuYw==)
 | `91.63% <0%> (-1.05%)` | :arrow_down: |
   | 
[src/router\_core/forwarder.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2ZvcndhcmRlci5j)
 | `93.1% <0%> (-0.69%)` | :arrow_down: |
   | 
[src/router\_core/agent\_link.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2FnZW50X2xpbmsuYw==)
 | `67.52% <0%> (-0.52%)` | :arrow_down: |
   | 
[src/router\_core/router\_core.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3JvdXRlcl9jb3JlLmM=)
 | `86.53% <0%> (-0.2%)` | :arrow_down: |
   | 
[src/container.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2NvbnRhaW5lci5j)
 | `79.41% <0%> (-0.19%)` | :arrow_down: |
   | 
[src/message.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL21lc3NhZ2UuYw==)
 | `88.77% <0%> (-0.02%)` | :arrow_down: |
   | 
[src/router\_core/connections.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2Nvbm5lY3Rpb25zLmM=)
 | `94.81% <0%> (+0.11%)` | :arrow_up: |
   | 
[src/router\_node.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9ub2RlLmM=)
 | `93.69% <0%> (+0.12%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[5885b4f...763ab03](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a i

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874460#comment-16874460
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298338495
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   I'm gonna take a look to 
https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
 again, to be sure...let me know wdyt
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874398#comment-16874398
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298306316
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   Yep, but that will make the default transfert size not anymore a multiple of 
chunk size :O
   I need to write that too in the comments..
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874390#comment-16874390
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

ganeshmurthy commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298301407
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   define CHUNK_SIZE 128/sizeof(void*) - 2
   That way, like you said, one qd_alloc_chunk_t will fit in one  Adjacent 
Cache-Line Prefetch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874367#comment-16874367
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298285848
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   That's correct,  the thing is that ideally I should have used next/prev to 
reach the 128 bytes (ie 128 bytes should be the size of the chunk struct), but 
I have chosen to order the fields according to the LIFO access pattern (prev is 
accessed only If the first item was the last accessed) and left the adjacent 
line prefetcher do the dirty work for the other items not accessed yet. My bet 
was that the  regular access pattern would make the cache prefetching do its 
work to fetch what has been left outside by the adjacent line prefetcher.
   We could try to do that: now is the better time to attempt to make it faster.
   Feel free to modify and ask @mgoulish to test it so if the perf will 
increase I will merge it here :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874364#comment-16874364
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298285848
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   That's correct,  the thing is that ideally I should have used next/prev to 
reach the 128 bytes (ie 128 bytes should be the size of the chunk struct), but 
I have chosen to order the fields according to the LIFO access pattern (prev is 
accessed only If the first item was the last accessed) and left the adjacent 
line prefetcher do the dirty work for the other items not accessed yet.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874355#comment-16874355
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

ganeshmurthy commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298281051
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,142 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+//128 has been chosen because many CPUs arch use an
+//adiacent line prefetching optimization that load
+//2*cache line bytes in batch
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
 
 Review comment:
   So assuming sizeof(void*) in a 64 bit machine is 8, 128/8 = 16
   qd_alloc_chunk_t will have 
   1. 8 bytes for prev
   2. 16 items (qd_alloc_item_ts)
   3. 8 bytes for next
   ==
   Total = 128 + 16 = 144 bytes. 
   ===
   But Adjacent Cache-Line Prefetch size ie 128 bytes. Looks like 144 bytes 
will not fit into one Adjacent Cache-Line Prefetch. So, does it make sense to 
move "items" as the first member of qd_alloc_chunk_t in which case we could use 
the DEQ for prev and next

 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874343#comment-16874343
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506415051
 
 
   I've added a new commit to attempt batch unordered copy between stacks: I 
will wait the @mgoulish response on this and If it delivers some improvement 
will squash and ask 4 reviews :+1: 
   On my tests it seems to deliver the opposite of an improvement TBH, but 
let's see what a real load would tell...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874293#comment-16874293
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506415051
 
 
   I've added a new commit to attempt batch unordered copy between stacks: I 
will wait the @mgoulish response on this and If it delivers some improvement 
will squash and ask 4 reviews :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874247#comment-16874247
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked list 
can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-506394294
 
 
   Just for completeness - mgoulish did some benchmarking with this patch.  
Here is a summary of his results:
   
   RESULTS  (mean latency over 1e6 messages ) (in msec)
   
   pre-Franz:
 64.27
 64.22
 62.62
   average of 3 tests : 63.70
   
   post-Franz:
 48.37
 48.89
 50.83
   average of 3 tests : 49.36
   
   
   Post-Franz messages fly 1.29 times faster.
   ( Get from sender to receiver in 0.77 as much time. )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874242#comment-16874242
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r298235050
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   Thanks for trying that out - let's keep with the original CHUNK_SIZE then.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873676#comment-16873676
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > Merging 
[#525](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/qpid-dispatch/commit/5885b4fa29f81cf394521ca6729538fe6273263f?src=pr&el=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master #525  +/-   ##
   ==
   + Coverage   86.86%   86.87%   +<.01% 
   ==
 Files  87   87  
 Lines   1951919554  +35 
   ==
   + Hits1695616988  +32 
   - Misses   2563 2566   +3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `98.55% <100%> (+0.3%)` | :arrow_up: |
   | 
[src/router\_core/transfer.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3RyYW5zZmVyLmM=)
 | `94.6% <0%> (-0.78%)` | :arrow_down: |
   | 
[src/router\_core/agent\_link.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2FnZW50X2xpbmsuYw==)
 | `67.52% <0%> (-0.52%)` | :arrow_down: |
   | 
[src/parse.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3BhcnNlLmM=)
 | `84.76% <0%> (-0.24%)` | :arrow_down: |
   | 
[src/router\_core/router\_core.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3JvdXRlcl9jb3JlLmM=)
 | `86.53% <0%> (-0.2%)` | :arrow_down: |
   | 
[src/iterator.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2l0ZXJhdG9yLmM=)
 | `89.23% <0%> (+0.19%)` | :arrow_up: |
   | 
[src/router\_node.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9ub2RlLmM=)
 | `93.81% <0%> (+0.24%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[5885b4f...a946341](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873555#comment-16873555
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297714884
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   I love the "compressed pointers" idea and given that we are mostly memory 
limited here it would allow to pack more pointers...something that using an 
intrusive pointer solution was really tricky (if not impossible) to implement...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873553#comment-16873553
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297786059
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   OK it seems that the original chunk size was good enough, no significant 
improvements
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873511#comment-16873511
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297765540
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   I've added another commit (that I would squash/drop if necessary) to test 
it: will work on the documentation part too. I've set the CHUNK_SIZE = 128 that 
seems the default `local_free_list_max` just to see if it makes any difference
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873505#comment-16873505
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297751591
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   Sure, and I can add a configuration option on the chunk size as well. I 
think (but will need to test it with Mick) that using chunks as big as the 
transfer size (or OS page size) will improve things a lil more too
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873479#comment-16873479
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297752306
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   The tranfer in batch will require me some more time, but the change on chunk 
size would be pretty easy to be achieved, so I can do it
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873475#comment-16873475
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297751591
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   Sure, and I can add a configuration option on the chunk size as well. I 
think (but will need to test it with Mick) that using chunks as big as the 
transfer size (or OS page size) will improve things a lil more too
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873472#comment-16873472
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297749315
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   Franz - can you add a comment explaining that?  Otherwise someone may "clean 
up" that using DEQ (like myself) and not realize the effect.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873474#comment-16873474
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297749678
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   I'm OK with doing the chunk_size changes is a follow on patch if you can't 
spend time on it right now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873406#comment-16873406
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297714884
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   I love the "compressed pointers" idea and given that we are mostly memory 
limited here it would allow to pack more pointers...something that using an 
intrusive pointer solution was really tricky (if not impossible) to 
implemente...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872524#comment-16872524
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297288448
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   Hmm, difficult to say - in any event it's going to be close to some entries 
in the list and far from others. Perhaps eliminating it completely might be 
possible.
   
   Another thing to think about is not to use pointers at all, but rather use a 
base (64 bit address) and smaller offset (32 bit entry count) - that should 
allow more entries into the same number of cache lines - you would have to 
allocate larger and contiguous blocks though.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872450#comment-16872450
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297255900
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
+push_stack(&pool->free_list, item);
 }
 } else {
 //
 // Allocate a full batch from the heap and put it on the thread list.
 //
+//TODO(franz):
+//  -   would be better to allocate in batches == transfer_batch_size
+//  and put a small (== sizeof(transfer_batch_size)) ref_count to 
help the final free
+//  -   could be beneficial directly to delink a chunk?
 
 Review comment:
   I need to explain it better IMHO. Let me rephrase something that I haven't 
clarified yet on the JIRA.
   The behaviour of this chunked stack is not actually the same of the original 
one, in particular while "rebalancing" resources into/from the global free list.
   I have chosen to just move `qd_alloc_item` between free lists, but not 
`qd_alloc_chunk_t` on purpose, because:
   
   - the purpose of `qd_alloc_chunk_t` is to cope with the allocation rate of 
`qd_alloc_item` on a same thread
   - to reduce the need to move "big" chunks  between threads (causing 
additional cache misses on the first attempt to access them)
   - thread local lists are nearly bounded in size `localFreeListMax` ie can't 
grow unbounded (maybe)
   
   I'm not sure I've chosen right, that's the reason of the second comment ie 
   > could be beneficial directly to delink a chunk?
   
   Maybe! I need a paper and a pencil to think about the implications of it and 
which benefits will bring!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872438#comment-16872438
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297245644
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
 
 Review comment:
   That answers my question - thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872406#comment-16872406
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297228983
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   > to avoid the padding in the middle of the struct you want to put top at 
the end of the struct.
   
   I haven't considered field padding, you're right!
   In this case, considering that `base_chunk` is > 128 bytes I won't risk to 
get a cache miss on each access?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872394#comment-16872394
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297216711
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   On a 64 bit machine you'll get padding after the uint32 (`top`) to pad it 
out to 64 bits. Everything else in the struct is 64 bits (pointers) so to avoid 
the padding in the middle of the struct you want to put `top` at the end of the 
struct.
   [On a 32 bit machine this is not a concern, but don't care as much about 
them anyway].
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872387#comment-16872387
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297209667
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   I'm not aware of any struct padding here...but maybe there are some 
introduced automagically somehwere? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872386#comment-16872386
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297209084
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   I would love to have a similar behaviour on Java :D Nice to hear it, thnks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872384#comment-16872384
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297208433
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   The C standard guarantees that members in a struct will appear in memory in 
the order you specify - The only change the compiler can make is to pad struct 
members to maintain machine dependent alignment constraints.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872380#comment-16872380
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297207667
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   In this case I was just thinking about eliminating useless struct padding - 
which might allow you to fit more into a cache line, but might not help 
locality or prefetching.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872336#comment-16872336
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297173727
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
 
 Review comment:
   I wanted its allocation to be part of `qd_alloc_linked_stack_t` to increase 
localty or I have misunderstood the question?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872334#comment-16872334
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297173727
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
 
 Review comment:
   I wanted its allocation to be part of `qd_alloc_linked_stack_t` to increase 
localty or am I misunderstood the question?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872333#comment-16872333
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297173552
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   I wanted prev/next to be in that specific order: I know that is not 100% 
guaranteed by many compilers, but I see that clang and gcc tends to stick to 
what is written in the code
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872326#comment-16872326
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297171096
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
 Review comment:
   Uh right, I haven't commented it: it is using the Adjacent Cache-Line 
Prefetch size ie 128 bytes 
(https://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers)
 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872325#comment-16872325
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r297171096
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
 Review comment:
   Uh right, I haven't commented it: it is using as the base size see the 
Adjacent Cache-Line Prefetch size ie 128 bytes 
(https://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers)
 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872316#comment-16872316
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296754514
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
 Review comment:
   what's the significance of 128 here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872319#comment-16872319
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296755530
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
+uint64_t  size;
+qd_alloc_chunk_t  base_chunk;
 
 Review comment:
   related to previous DEQ comment:  why not a DEQ list instead of a dummy 
base_chunk?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872320#comment-16872320
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296759355
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
 
 Review comment:
   Hmmm - totally not thought through, but if CHUNK_SIZE was == 
transfer_batch_size, could we avoid this looping while holding the global pool 
lock?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872318#comment-16872318
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296759846
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -164,14 +242,17 @@ void *qd_alloc(qd_alloc_type_desc_t *desc, 
qd_alloc_pool_t **tpool)
 desc->stats->held_by_threads += desc->config->transfer_batch_size;
 #endif
 for (idx = 0; idx < desc->config->transfer_batch_size; idx++) {
-item = DEQ_HEAD(desc->global_pool->free_list);
-DEQ_REMOVE_HEAD(desc->global_pool->free_list);
-DEQ_INSERT_TAIL(pool->free_list, item);
+item = pop_stack(&desc->global_pool->free_list);
+push_stack(&pool->free_list, item);
 }
 } else {
 //
 // Allocate a full batch from the heap and put it on the thread list.
 //
+//TODO(franz):
+//  -   would be better to allocate in batches == transfer_batch_size
+//  and put a small (== sizeof(transfer_batch_size)) ref_count to 
help the final free
+//  -   could be beneficial directly to delink a chunk?
 
 Review comment:
   Oh - you already thought about that!  :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872317#comment-16872317
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

kgiusti commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296755132
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
 
 Review comment:
   Why not use a DEQ here?  (DEQ_LINKS) - see include/qpid/dispatch/ctools.h
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869782#comment-16869782
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296309434
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   We can try to reorder fields and see if it will get better cache localty: I 
see that `qd_alloc_linked_stack_t` is suboptimal, but I've just tried to put 
`base_chunk` near to the most accessed fields to make used of the adiacent line 
prefetching of many Intel CPUs; so I've focused just on that...
   I can see too that `CHUNK_SIZE` could be changed to improve things too...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869655#comment-16869655
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296309434
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   We can try to reorder fields and see if it will get better cache localty: I 
see that `qd_alloc_linked_stack_t` is suboptimal, but I've just tried to put 
`base_chunk` near to the most accessed fields to make used of the adiacent line 
prefetching of many Intel CPUs; so I've focused just on that...
   I can see too that `CAPACITY` could be changed to improve things too...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread Francesco Nigro (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869650#comment-16869650
 ] 

Francesco Nigro commented on DISPATCH-1372:
---

I've performed some load tests with 9 pairs getting ~64 K msg/sec x 9 = 576 K 
msg/sec
vs master = 52 K msg/sec  x 9 =  468 K msg/sec

The improvement is already there, need to find out which adjustments could be 
done to improve it further:
* fields of chunks to be reordered
* configurable CHUNK_SIZE
* queue-like access pattern instead of stack

> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869516#comment-16869516
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296253140
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   Actually as `qp_alloc_chunk_t` contains only pointers you probably want to 
put `top` as the very last member in this structure (but you'll still get some 
padding)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869513#comment-16869513
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296253140
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   Actually as `qp_alloc_chunk_t` contains only pointers you probably want to 
put `size` as the very last member in this structure (but you'll still get some 
padding)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869512#comment-16869512
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

astitcher commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#discussion_r296252051
 
 

 ##
 File path: src/alloc_pool.c
 ##
 @@ -44,20 +46,97 @@ DEQ_DECLARE(qd_alloc_type_t, qd_alloc_type_list_t);
 #define PATTERN_BACK  0xbabecafe
 
 struct qd_alloc_item_t {
-DEQ_LINKS(qd_alloc_item_t);
 uint32_t  sequence;
 #ifdef QD_MEMORY_DEBUG
 qd_alloc_type_desc_t *desc;
 uint32_t  header;
 #endif
 };
 
-DEQ_DECLARE(qd_alloc_item_t, qd_alloc_item_list_t);
+#define CHUNK_SIZE 128/sizeof(void*)
 
+struct qd_alloc_chunk_t {
+qd_alloc_chunk_t *prev;
+qd_alloc_item_t  *items[CHUNK_SIZE];
+qd_alloc_chunk_t *next;
+};
+
+struct qd_alloc_linked_stack_t {
+//the base
+qd_alloc_chunk_t *top_chunk;
+uint32_t  top;  //qd_alloc_item* top_item = 
top_chunk->items[top+1] <-> top > 0
 
 Review comment:
   Probably want to reorder `top` and `size`. On 64 bit machine there would be 
32 bits of padding in the middle of the struct with this ordering.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869462#comment-16869462
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

codecov-io commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-504422052
 
 
   # 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=h1) 
Report
   > Merging 
[#525](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/qpid-dispatch/commit/ecc149babe13eb34f7bd1937c1b72adf73dfd535?src=pr&el=desc)
 will **increase** coverage by `0.04%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/qpid-dispatch/pull/525/graphs/tree.svg?width=650&token=rk2Cgd27pP&height=150&src=pr)](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master#525  +/-   ##
   =
   + Coverage   86.86%   86.9%   +0.04% 
   =
 Files  87  87  
 Lines   19519   19555  +36 
   =
   + Hits16955   16995  +40 
   + Misses   25642560   -4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[src/alloc\_pool.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2FsbG9jX3Bvb2wuYw==)
 | `98.55% <100%> (+0.3%)` | :arrow_up: |
   | 
[src/router\_node.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9ub2RlLmM=)
 | `93.57% <0%> (-0.25%)` | :arrow_down: |
   | 
[src/iterator.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL2l0ZXJhdG9yLmM=)
 | `89.26% <0%> (+0.22%)` | :arrow_up: |
   | 
[src/router\_core/transfer.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL3RyYW5zZmVyLmM=)
 | `95.37% <0%> (+0.25%)` | :arrow_up: |
   | 
[src/router\_core/connections.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2Nvbm5lY3Rpb25zLmM=)
 | `95.04% <0%> (+0.34%)` | :arrow_up: |
   | 
[src/router\_core/agent\_link.c](https://codecov.io/gh/apache/qpid-dispatch/pull/525/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcl9jb3JlL2FnZW50X2xpbmsuYw==)
 | `68.04% <0%> (+0.51%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=footer).
 Last update 
[ecc149b...afbfad0](https://codecov.io/gh/apache/qpid-dispatch/pull/525?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems coun

[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869316#comment-16869316
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on issue #525: DISPATCH-1372 alloc_pool intrusive linked 
list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525#issuecomment-50437
 
 
   Please do not merge it yet: it has been sent for review :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869315#comment-16869315
 ] 

ASF GitHub Bot commented on DISPATCH-1372:
--

franz1981 commented on pull request #525: DISPATCH-1372 alloc_pool intrusive 
linked list can be replaced by a linked stack
URL: https://github.com/apache/qpid-dispatch/pull/525
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1372) alloc_pool intrusive linked list can be replaced by a linked stack

2019-06-21 Thread Francesco Nigro (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869312#comment-16869312
 ] 

Francesco Nigro commented on DISPATCH-1372:
---

Below the flamegraph of a 1 pair + router cache-misses events.

On master with the original allocator:

!image-2019-06-21-11-08-17-015.png!

On masterbut with a linked stack allocator:

!image-2019-06-21-11-09-02-228.png!

In violet there are the calls to qd_alloc.
To explain the improvement, the relative samples for qd_buffer in 
_qd_buffer_list_clone are:

master linked list: 277 samples
master linked stack: 54 samples

The overall cache misses on the router_core_thread instead are:

master linked list: 1817 samples
master linked stack: 1488 samples

The improvement has yet to be verified under load with more clients, but seems 
promising.


> alloc_pool intrusive linked list can be replaced by a linked stack
> --
>
> Key: DISPATCH-1372
> URL: https://issues.apache.org/jira/browse/DISPATCH-1372
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 1.8.0
>Reporter: Francesco Nigro
>Priority: Major
> Attachments: DOOM-3-BFG-Technical-Note.pdf, 
> image-2019-06-21-11-08-17-015.png, image-2019-06-21-11-09-02-228.png, 
> linked_list_misses.svg, stack_list_misses.svg
>
>
> alloc_pool is currently using a intrusive linked list approach to reduce the 
> need of external data structures to hold data, saving expensive pointer 
> chasing, but on modern architectures the data dependency between a current 
> node and next/prev prevent the CPU prefetcher to stream nodes speculatively.
> There are different approaches that could benefit of prefetcing, but need to 
> decouple the data stored from its container eg a linked stack.
> A linked stack is composed by doubly-linked chunks (allocated lazily) that 
> make possible for the CPU to prefetch next/prev pointers given that those are 
> already contained in the current chunk (if any).
> Although it seems counter-intuitive (given that introduce 1 more hop to reach 
> the data), such data-structure is much more cache-friendly on modern 
> architectures: I will attach some cache misses analysis to show it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org