[ 
https://issues.apache.org/jira/browse/IMPALA-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542663#comment-17542663
 ] 

Michael Smith edited comment on IMPALA-9221 at 5/26/22 8:54 PM:
----------------------------------------------------------------

The HashRing is a consistent hashing implementation that uses a {{std::map}} to 
map stable 32-bit hash values to hosts, with a fixed number of replicas per host 
(hard-coded to 25 in 
[https://github.com/apache/impala/blob/4.1.0-rc2/be/src/scheduling/executor-group.cc#L27]).
 That produces a fairly small map (hundreds of entries) that is read far more 
often than it is written (but still only thousands or tens of thousands of reads 
per minute in heavily used clusters).

The map must be ordered so that a lookup can select the next hash in the ring, 
as most hash values won't have an explicit mapping. A lookup uses 
{{lower_bound}} for its search, and lookup keys come from a pseudo-random 
number generator (PRNG), so accesses to this table are almost exclusively 
random.
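As a rough sketch of the structure being benchmarked (not the actual Impala code; the class shape, member names, and hash function here are simplified stand-ins), each host contributes {{kNumReplicas}} points on the ring, and {{GetNode}} takes the first point at or after the probe hash, wrapping around when the probe falls past the last entry:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Illustrative sketch of a HashRing: each host owns kNumReplicas points
// on a ring keyed by 32-bit hash, stored in an ordered std::map.
constexpr int kNumReplicas = 25;  // matches the hard-coded value in executor-group.cc

class HashRing {
 public:
  void AddHost(const std::string& host) {
    for (int i = 0; i < kNumReplicas; ++i) {
      // std::hash stands in for the stable 32-bit hash the real code uses.
      uint32_t point = static_cast<uint32_t>(
          std::hash<std::string>{}(host + "#" + std::to_string(i)));
      ring_[point] = host;
    }
  }

  // Find the first ring point >= h; wrap to the smallest point when h is
  // past the last entry. This is the lower_bound lookup described above.
  const std::string& GetNode(uint32_t h) const {
    auto it = ring_.lower_bound(h);
    if (it == ring_.end()) it = ring_.begin();
    return it->second;
  }

 private:
  std::map<uint32_t, std::string> ring_;
};
```

Because the probe hashes are uniformly random, nearly every {{GetNode}} call walks a different path through the tree, which is what makes cache behavior the interesting variable here.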

I augmented 
[https://github.com/apache/impala/blob/4.1.0-rc2/be/src/experiments/hash-ring-util.cc]
 ([https://gerrit.cloudera.org/c/18570/]) to include running a configurable 
number of {{GetNode}} calls after the HashRing is created, and captured 
creation and reads separately. Initial measurements were performed with that 
tool to produce the graph above, as well as the tables and graphs below (all 
times in nanoseconds):
|| # Reads||map 10||map 20||map 50||map 100||flat_map 10||flat_map 20||flat_map 50||flat_map 100||
|100000|3181057|3385086|4049409|4941224|5089331|3283456|6768097|4392207|
|500000|16744846|17314178|20767737|27458343|17610202|23570711|21735860|23419023|
|1000000|28863828|40580527|42529707|53662274|34785447|32856285|43590664|48776533|

!image-2022-05-26-13-54-15-448.png|width=607,height=376!
|| # Reads||map 10||flat_map 10||map 50||flat_map 50||map 200||flat_map 200||
|1000000|27122273|31605522|39876436|43084369|55279923|56237839|
|10000000|272097482|288243122|410200286|384171025|564927591|483062593|
|100000000|2669742185|2808572405|3932075784|3758481700|5547453568|4762407659|

!image-2022-05-26-13-54-31-752.png|width=606,height=375!

Adding some code to interfere with L1 caching between iterations
{code:java}
// Blast the CPU cache
const size_t size = 20*1024*1024; // Allocate 20M ints. Set much larger than L2
vector<int> vec(size);
for (size_t i = 0; i < vec.size(); i++) { vec[i] = i; }
{code}
and running 100 iterations brought results closer together (~1% difference).

These results suggest that any improvement from flat_map is negligible. However, 
adopting it noticeably increases the algorithmic cost of adding/removing hosts, 
and batching those updates would be a messy optimization for a relatively rare 
operation.
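To illustrate the trade-off named above: a flat_map keeps its entries in one contiguous sorted array, so lookup is a cache-friendly binary search, but every insert shifts the tail of the array. This sketch uses a plain sorted {{std::vector}} rather than {{boost::container::flat_map}} (and the {{Entry}}/{{Insert}}/{{Lookup}} names are made up for illustration), but the cost structure is the same:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch of flat_map-style storage for the ring: one sorted vector.
using Entry = std::pair<uint32_t, std::string>;

// O(n): the binary search is cheap, but vector::insert shifts every entry
// after the insertion point. A host add/remove does this kNumReplicas times.
void Insert(std::vector<Entry>& ring, uint32_t hash, std::string host) {
  auto it = std::lower_bound(ring.begin(), ring.end(), hash,
      [](const Entry& e, uint32_t h) { return e.first < h; });
  ring.insert(it, {hash, std::move(host)});
}

// O(log n) over contiguous memory; wraps to the smallest point, matching
// the std::map lower_bound lookup.
const std::string& Lookup(const std::vector<Entry>& ring, uint32_t hash) {
  auto it = std::lower_bound(ring.begin(), ring.end(), hash,
      [](const Entry& e, uint32_t h) { return e.first < h; });
  if (it == ring.end()) it = ring.begin();
  return it->second;
}
```

With only hundreds of entries, the O(n) shift per insert is small in absolute terms, but it turns every executor-group membership change into n moves per replica point, which is the cost the comment weighs against the negligible read-side gain.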


> Optimize HashRing's map implementation
> --------------------------------------
>
>                 Key: IMPALA-9221
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9221
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Assignee: Michael Smith
>            Priority: Major
>         Attachments: image-2022-05-26-10-23-57-678.png, 
> image-2022-05-26-13-42-22-963.png, image-2022-05-26-13-42-41-857.png, 
> image-2022-05-26-13-54-15-448.png, image-2022-05-26-13-54-31-752.png
>
>
> The hash ring used for consistent scheduling currently uses a std::map for 
> the hash-to-IpAddr lookup. HashRing is heavy on reads, with writes only 
> happening when executors come and go. There are some cases where we copy the 
> HashRing.
> The standard map uses a large number of small allocations. This hurts cache 
> performance, adds overhead, and also increases the cost of copying the 
> structure. Something like boost's flat_map or Abseil's btree_map is likely to 
> be more efficient.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
