[ https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032771#comment-17032771 ]
Joe McDonnell commented on IMPALA-8005:
---------------------------------------

It looks like the code for this revolves around EXCHANGE_HASH_SEED in krpc-data-stream-sender.h/.cc:
[https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.h#L253]

Other code is in the KrpcDataStreamSender constructor (see the initialization of channels_) as well as HashAndAddRows(), HashRow(), and AddRowToChannel().

> Randomize partitioning exchanges destinations
> ---------------------------------------------
>
>                 Key: IMPALA-8005
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8005
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Ho
>            Assignee: Anurag Mantripragada
>            Priority: Major
>              Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the
> sender. For a table with skew in the distribution of the shuffling keys,
> multiple queries using the same shuffling keys for exchanges will end up
> hashing to the same destination fragments running on a particular host,
> potentially overloading that host.
> We should consider using the query id or other query-specific information to
> seed the hashing function to randomize the destinations for different
> queries. Thanks to [~tlipcon] for pointing this problem out.
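
For illustration, here is a minimal, self-contained sketch of the idea in the issue description: derive the exchange hash seed from the query id so that a hot key does not always land on the same destination across queries. This is not Impala's code. QueryId, Mix64(), PerQuerySeed(), PickChannel() and the value of kFixedExchangeSeed are hypothetical names and values made up for the example, and the mixer is the MurmurHash3 64-bit finalizer rather than whatever hash the real sender uses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a query id made of two 64-bit halves.
struct QueryId {
  uint64_t hi;
  uint64_t lo;
};

// 64-bit finalizer from MurmurHash3 (fmix64), used here as a generic mixer.
// It is a bijection on 64-bit values, so distinct inputs give distinct outputs.
inline uint64_t Mix64(uint64_t x) {
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x;
}

// Assumption: a fixed seed playing the role that EXCHANGE_HASH_SEED plays in
// the sender. The value is made up for this sketch.
constexpr uint64_t kFixedExchangeSeed = 0x66bd68df22c3ef37ULL;

// Fold the query id into the fixed seed. Every sender of one query computes
// the same value; different queries get different values.
inline uint64_t PerQuerySeed(const QueryId& id) {
  return Mix64(kFixedExchangeSeed ^ Mix64(id.hi) ^ id.lo);
}

// Choose a destination channel for a row whose partitioning key has already
// been reduced to a 64-bit value. num_channels must be greater than zero.
inline size_t PickChannel(uint64_t key, uint64_t seed, size_t num_channels) {
  return static_cast<size_t>(Mix64(key ^ seed) % num_channels);
}
```

With a single fixed seed, PickChannel(hot_key, kFixedExchangeSeed, n) returns the same channel for every query. With PerQuerySeed(query_id) the channel for hot_key varies from query to query. The seed still has to be identical across all senders of one exchange within a query, otherwise rows with equal keys would not meet at the same receiver.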