[ 
https://issues.apache.org/jira/browse/DISPATCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418620#comment-17418620
 ] 

Ken Giusti commented on DISPATCH-1487:
--------------------------------------

Analysis of CPU hot spots in router message annotation processing (based on the 
fix to [DISPATCH-2251|https://issues.apache.org/jira/browse/DISPATCH-2251])

Per-message processing of router message annotations (MA) has two general 
phases: receive processing and transmit processing.

On message receive the router invokes qd_message_message_annotations().  This 
routine parses the raw message annotations into fields stored in the 
qd_message_content_t structure.  Note: this is a per-message structure that 
represents the actual message data - it is referenced by the incoming 
qd_message_t and by all outgoing qd_message_t per-delivery structures.  The 
parsed fields include the following MA fields: ingress, phase, to-override, 
trace, and streaming-flag.  Any user MAs are also stored in the message 
content instance.  After these values are parsed into the message content, the 
router invokes router_annotate_message().  This routine performs two 
functions.  First, it copies the trace, to-override, phase, and ingress values 
from the content into qd_buffer_t lists attached to the qd_message_t 
structure.  Second, it computes the link-exclusions bitmask, distance, and 
ingress-index, which are used during the routing phase.  When the routing 
process creates an outgoing qd_message_t, the MA buffer chains are copied 
(cloned) from the inbound qd_message_t.
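
The split between the shared per-message content and the per-delivery 
structures can be sketched as follows.  This is only an illustration in Python, 
not the router's C code: MessageContent and Message are hypothetical stand-ins 
for qd_message_content_t and qd_message_t, the function names and annotation 
keys are simplified placeholders, and the link-exclusions/distance/ingress-index 
computation is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Placeholder keys for the router-specific annotations (not the real keys).
ROUTER_KEYS = ("ingress", "phase", "to_override", "trace", "streaming")


@dataclass
class MessageContent:
    """Hypothetical stand-in for qd_message_content_t (one per message)."""
    raw_annotations: Dict[str, object]
    ingress: Optional[str] = None
    phase: Optional[int] = None
    to_override: Optional[str] = None
    trace: List[str] = field(default_factory=list)
    streaming: bool = False
    user_annotations: Dict[str, object] = field(default_factory=dict)


@dataclass
class Message:
    """Hypothetical stand-in for qd_message_t (one per delivery)."""
    content: MessageContent  # shared with every other delivery, never copied
    ma_ingress: Optional[str] = None
    ma_phase: Optional[int] = None
    ma_to_override: Optional[str] = None
    ma_trace: List[str] = field(default_factory=list)


def parse_annotations(content: MessageContent) -> None:
    """Receive step 1: parse the raw annotations into the shared content."""
    raw = content.raw_annotations
    content.ingress = raw.get("ingress")
    content.phase = raw.get("phase")
    content.to_override = raw.get("to_override")
    content.trace = list(raw.get("trace", []))
    content.streaming = bool(raw.get("streaming", False))
    # Everything that is not a router annotation is kept as a user MA.
    content.user_annotations = {
        k: v for k, v in raw.items() if k not in ROUTER_KEYS
    }


def annotate(msg: Message) -> None:
    """Receive step 2: copy the routing-related values onto the delivery."""
    c = msg.content
    msg.ma_ingress = c.ingress
    msg.ma_phase = c.phase
    msg.ma_to_override = c.to_override
    msg.ma_trace = list(c.trace)


def clone_for_delivery(inbound: Message) -> Message:
    """Create an outgoing delivery: share the content, copy the MA values."""
    return Message(
        content=inbound.content,
        ma_ingress=inbound.ma_ingress,
        ma_phase=inbound.ma_phase,
        ma_to_override=inbound.ma_to_override,
        ma_trace=list(inbound.ma_trace),
    )


content = MessageContent({"ingress": "RouterA", "trace": ["RouterA"], "color": "red"})
parse_annotations(content)
inbound = Message(content)
annotate(inbound)
outbound = clone_for_delivery(inbound)
```

In this sketch the parse happens once per message, while the copy in annotate() 
and the clone in clone_for_delivery() happen once per delivery, which mirrors 
the per-message vs. per-delivery distinction described above.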

When sending the message the router invokes compose_message_annotations() if 
the router-specific MAs are NOT to be stripped out (as is the case on 
inter-router links).  This routine gathers the MA buffer lists, etc. from the 
qd_message_t and writes them to proton (via pn_link_send).
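
The strip decision on transmit can be pictured with the sketch below.  It is a 
simplified Python illustration under stated assumptions, not the router's 
implementation: the function name, the dictionary representation, and the field 
names are hypothetical, and it assumes user MAs are forwarded whether or not 
the router-specific MAs are stripped.

```python
from typing import Dict

# Placeholder names for the router-specific values held per delivery.
ROUTER_MA_FIELDS = ("ingress", "phase", "to_override", "trace")


def compose_annotations(
    router_mas: Dict[str, object],
    user_mas: Dict[str, object],
    strip_router_mas: bool,
) -> Dict[str, object]:
    """Build the annotation map written to an outgoing link (sketch only)."""
    out = dict(user_mas)  # assumption: user MAs are always passed through
    if not strip_router_mas:
        # Inter-router case: include the router-specific values that are set.
        for name in ROUTER_MA_FIELDS:
            value = router_mas.get(name)
            if value is not None:
                out[name] = value
    return out


router_mas = {"ingress": "RouterA", "trace": ["RouterA"]}
user_mas = {"color": "red"}
to_router = compose_annotations(router_mas, user_mas, strip_router_mas=False)
to_client = compose_annotations(router_mas, user_mas, strip_router_mas=True)
```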

perf was used to analyze code behavior in a two-hop router benchmark: one 
producing client attached to the inbound router sends messages with a 100 byte 
payload to one consumer attached to the outbound router.  The number of 
samples taken in the above routines varies with the role of the router - 
upstream (connected to the producing client) vs. downstream (connected to the 
consuming client).

The following shows the percentage of worker thread samples taken in each of 
the principal MA processing routines during a 15 second run of the benchmark:

 

upstream router:

0.5% in qd_message_message_annotations()

3.3% in router_annotate_message()

4.7% in compose_message_annotations()

Sum: 8.5% worker thread samples used for processing MA

downstream router:

5.9% in qd_message_message_annotations()

3.2% in router_annotate_message()

0.2% in compose_message_annotations()

Sum: 9.3% worker thread samples used for processing MA

 

Observations:  First, the downstream router needs slightly more CPU for MA 
processing than the upstream router.  Second, the hot spot depends on the 
router's role: annotation parsing dominates on the downstream router, 
annotation writing on the upstream router.  Assuming the cost of 
router_annotate_message() is roughly constant regardless of role, one can 
extrapolate that a "middle" router in a three-hop configuration would 
experience hot spots in both phases.
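
As a rough back-of-the-envelope version of that extrapolation, the per-routine 
figures listed above can be combined as shown below.  This is an estimate, not 
a measurement: it assumes a middle router pays the downstream router's parsing 
cost plus the upstream router's writing cost, and it averages the two 
router_annotate_message() figures.  The function name is illustrative.

```python
# Percent of worker thread samples per routine, taken from the lists above.
upstream = {"parse": 0.5, "annotate": 3.3, "compose": 4.7}
downstream = {"parse": 5.9, "annotate": 3.2, "compose": 0.2}


def estimate_middle_router(up: dict, down: dict) -> float:
    """Estimate the MA share on a router with inter-router links on both sides."""
    parse = down["parse"]      # receives router annotations, like downstream
    compose = up["compose"]    # writes router annotations, like upstream
    annotate = (up["annotate"] + down["annotate"]) / 2  # roughly constant
    return parse + annotate + compose


middle = estimate_middle_router(upstream, downstream)
print(f"estimated middle-router MA share: {middle:.2f}%")
```

Under these assumptions the estimate comes out at roughly 14% of worker thread 
samples, compared with about 9% on either edge router.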

 

> Improve the parsing of message annotations
> ------------------------------------------
>
>                 Key: DISPATCH-1487
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1487
>             Project: Qpid Dispatch
>          Issue Type: Improvement
>          Components: Router Node
>    Affects Versions: 1.9.0
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Major
>             Fix For: 1.18.0
>
>
> ToDo: Refactor inbound MA parsing on inbound inter-router links to improve 
> throughput and reduce latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

