[ https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032771#comment-17032771 ]

Joe McDonnell commented on IMPALA-8005:
---------------------------------------

It looks like the code for this revolves around EXCHANGE_HASH_SEED in 
krpc-data-stream-sender.h/.cc:

[https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.h#L253]

Related code is in the KrpcDataStreamSender constructor (see the
initialization of channels_), as well as in HashAndAddRows(), HashRow(), and
AddRowToChannel().
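
To make the idea in the issue description concrete, here is a minimal, standalone sketch of deriving a per-query hash seed from a query id and using it to pick a destination channel. This is not Impala's code: QueryId, kBaseSeed, Mix64(), QuerySeed(), HashKey(), and ChannelFor() are all hypothetical names, the base seed value is arbitrary, and the FNV-1a-style hash is only a stand-in for whatever hash the sender actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a query id made of two 64-bit halves.
struct QueryId {
  uint64_t hi;
  uint64_t lo;
};

// Arbitrary fixed base seed for this sketch. It plays the role of a
// compile-time constant such as EXCHANGE_HASH_SEED, not its real value.
constexpr uint64_t kBaseSeed = 0x9e3779b97f4a7c15ULL;

// splitmix64-style finalizer: a bijective mixer on 64-bit values.
inline uint64_t Mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Derives a per-query seed by folding both halves of the query id into
// the base seed.
inline uint64_t QuerySeed(const QueryId& id) {
  return Mix64(kBaseSeed ^ id.hi ^ Mix64(id.lo));
}

// Seeded FNV-1a-style hash over the bytes of a shuffle key, followed by
// a final mixing step.
inline uint64_t HashKey(const std::string& key, uint64_t seed) {
  uint64_t h = seed;
  for (char c : key) {
    h ^= static_cast<unsigned char>(c);
    h *= 0x100000001b3ULL;  // 64-bit FNV prime
  }
  return Mix64(h);
}

// Maps a shuffle key to one of num_channels destination channels.
inline uint32_t ChannelFor(const std::string& key, uint64_t seed,
                           uint32_t num_channels) {
  assert(num_channels > 0);
  return static_cast<uint32_t>(HashKey(key, seed) % num_channels);
}
```

With a fixed seed, a hot key maps to the same channel for every query. With a seed from QuerySeed(), the channel for that key generally varies from query to query, although any two queries can still land on the same channel by chance.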

> Randomize partitioning exchanges destinations
> ---------------------------------------------
>
>                 Key: IMPALA-8005
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8005
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Ho
>            Assignee: Anurag Mantripragada
>            Priority: Major
>              Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the
> sender. For a table with a skewed distribution of shuffling keys, multiple
> queries using the same shuffling keys for exchanges will end up hashing to
> the same destination fragments running on a particular host, potentially
> overloading that host.
> We should consider using the query id or other query-specific information to
> seed the hashing function to randomize the destinations for different
> queries. Thanks to [~tlipcon] for pointing this problem out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
