[
https://issues.apache.org/jira/browse/DISPATCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418620#comment-17418620
]
Ken Giusti commented on DISPATCH-1487:
--------------------------------------
Analysis of CPU hot spots in router message annotation processing (based on the
fix to [DISPATCH-2251|https://issues.apache.org/jira/browse/DISPATCH-2251])

There are two general phases of processing per-message router message
annotations (MA): receive processing and transmit processing.

On message receive, the router invokes qd_message_message_annotations(). This
routine parses the raw message annotations into fields stored in the
qd_message_content_t structure. Note: this is a per-message structure that
represents the actual message data. It is referenced by the incoming
qd_message_t and by all outgoing qd_message_t per-delivery structures. The
parsed fields include the following MA fields: ingress, phase, to-override,
trace, streaming-flag. Any user MAs are also stored in the message content
instance.

After these values are parsed into the message content, the router invokes
router_annotate_message(). This routine performs two functions. First, it
copies the trace, to-override, phase, and ingress values out of the content
into qd_buffer_t lists attached to the qd_message_t structure. Second, it
computes the link-exclusions bitmask, distance, and ingress-index, which are
used during the routing phase. When the routing process creates an outgoing
qd_message_t, the MA buffer chains are copied (cloned) from the inbound
qd_message_t.
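
As a rough illustration of the receive-side data flow described above, the
following C sketch models one shared per-message content and the per-delivery
copies made from it. All sketch_* names and fields are hypothetical stand-ins,
and plain C strings take the place of the qd_buffer_t lists; this is not the
router's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins. The real qd_message_content_t and
 * qd_message_t have many more fields and hold qd_buffer_t lists, not strings. */
typedef struct {
    const char *ingress;      /* parsed once per message on receive */
    const char *to_override;
    const char *trace;
    int         phase;
} sketch_content_t;

typedef struct {
    sketch_content_t *content;  /* shared by the inbound and all outbound messages */
    char *ma_ingress;           /* per-delivery copies of the parsed values */
    char *ma_to_override;
    char *ma_trace;
    int   ma_phase;
} sketch_message_t;

/* Duplicate a string; NULL (annotation absent) stays NULL. */
static char *sketch_dup(const char *s)
{
    if (!s) return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) memcpy(copy, s, len);
    return copy;
}

/* Receive side: copy the parsed values out of the shared content. */
static void sketch_annotate(sketch_message_t *msg)
{
    assert(msg && msg->content);
    msg->ma_ingress     = sketch_dup(msg->content->ingress);
    msg->ma_to_override = sketch_dup(msg->content->to_override);
    msg->ma_trace       = sketch_dup(msg->content->trace);
    msg->ma_phase       = msg->content->phase;
}

/* Routing: an outbound message shares the content but clones the MA copies. */
static sketch_message_t *sketch_clone(const sketch_message_t *in)
{
    sketch_message_t *out = calloc(1, sizeof(*out));
    if (!out) return NULL;
    out->content        = in->content;
    out->ma_ingress     = sketch_dup(in->ma_ingress);
    out->ma_to_override = sketch_dup(in->ma_to_override);
    out->ma_trace       = sketch_dup(in->ma_trace);
    out->ma_phase       = in->ma_phase;
    return out;
}
```

In this model the annotations are parsed once into the shared content, but
every outgoing delivery pays for another copy of them.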

When sending the message, the router invokes compose_message_annotations() if
the router-specific MAs are NOT to be stripped out (as is the case on
inter-router links). This routine gathers the MA buffer lists, etc. from the
qd_message_t and writes them to Proton (via pn_link_send()).
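
The transmit-side decision can be sketched in the same spirit. The tx_* names
and the text encoding below are assumptions made for illustration only; the
router encodes the annotations as AMQP data and hands them to Proton, which
this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-delivery annotation values held by an outgoing message. */
typedef struct {
    const char *ingress;
    const char *trace;
    int         phase;
} tx_annotations_t;

/* Compose the router annotations into out unless they are to be stripped.
 * Returns the number of bytes written (0 when stripped), or -1 if out is
 * too small to hold the composed text. */
static int tx_compose(const tx_annotations_t *ma, bool strip, char *out, size_t cap)
{
    assert(ma && out);
    if (strip) {
        /* Router-specific annotations are not sent on this link. */
        if (cap > 0) out[0] = '\0';
        return 0;
    }
    int n = snprintf(out, cap, "ingress=%s;trace=%s;phase=%d",
                     ma->ingress, ma->trace, ma->phase);
    return (n < 0 || (size_t) n >= cap) ? -1 : n;
}
```

The point of the sketch is only that the compose cost is paid when the
annotations are kept and skipped when they are stripped.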

Profiling a two-hop router benchmark with perf (one producing client on the
inbound router sending messages with a 100 byte payload to one consumer on the
outbound router) shows that the number of samples taken in the above routines
varies with the role of the router: upstream (connected to the producing
client) vs. downstream (connected to the consuming client). The following
lists the percentage of worker thread samples taken in each of the principal
MA processing routines during a 15 second run of the benchmark:
upstream router:
0.5% in qd_message_message_annotations()
3.3% in router_annotate_message()
4.7% in compose_message_annotations()
Sum: 8.5% of worker thread samples used for processing MA
downstream router:
5.9% in qd_message_message_annotations()
3.2% in router_annotate_message()
0.2% in compose_message_annotations()
Sum: 9.3% of worker thread samples used for processing MA

Observations: First, the downstream router generally needs slightly more CPU
resources for dealing with MA than the upstream router. Second, the hot spots
vary with the kind of work: annotation writing (compose_message_annotations())
dominates on the upstream router, while annotation parsing
(qd_message_message_annotations()) dominates on the downstream router. Assuming
the cost of router_annotate_message() is roughly constant regardless of role,
one can extrapolate that a "middle" router in a three-hop configuration would
experience hot spots in both phases.
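
The extrapolation can be made concrete with simple arithmetic. The sketch
below assumes a middle router pays the downstream router's parsing cost, the
upstream router's composing cost, and the roughly constant
router_annotate_message() cost. The function name is hypothetical, and the
result is a rough estimate, not a measurement.

```c
#include <assert.h>

/* Estimate the MA share of worker thread samples on a "middle" router.
 * All values are in tenths of a percent, so 59 means 5.9%. */
static int middle_router_estimate_tenths(int parse, int annotate, int compose)
{
    assert(parse >= 0 && annotate >= 0 && compose >= 0);
    /* Assumption: the three costs simply add up on a router that both
     * receives and sends on inter-router links. */
    return parse + annotate + compose;
}
```

With the figures measured above, middle_router_estimate_tenths(59, 32, 47)
gives 138, that is roughly 13.8% of worker thread samples.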
> Improve the parsing of message annotations
> ------------------------------------------
>
> Key: DISPATCH-1487
> URL: https://issues.apache.org/jira/browse/DISPATCH-1487
> Project: Qpid Dispatch
> Issue Type: Improvement
> Components: Router Node
> Affects Versions: 1.9.0
> Reporter: Ken Giusti
> Assignee: Ken Giusti
> Priority: Major
> Fix For: 1.18.0
>
>
> ToDo: Refactor inbound MA parsing on inbound inter-router links to improve
> throughput and reduce latency.