[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2020-01-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Component/s: (was: Runtime / Task)
 Table SQL / Runtime

> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.10.0
>
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))
> In some cases, if a lot of conflicts occurred, this will lead to job hang, 
> because hash join will degenerate to nested loop join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2020-01-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Description: 
In HybridHashTable, first select the corresponding partition according to 
hashCode, and then select the bucket in the partition according to hashCode, 
using the same hashCode can easily cause hash collision.

Consider doing some mix to hashCode when choosing bucket.

Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
way to reduce systematic lossage, as well as to incorporate impact of the 
highest bits that would otherwise never be used in index calculations because 
of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash 
>>> 16))

In some cases, if a lot of conflicts occurred, this will lead to job hang, 
because hash join will degenerate to nested loop join.

  was:
In HybridHashTable, first select the corresponding partition according to 
hashCode, and then select the bucket in the partition according to hashCode, 
using the same hashCode can easily cause hash collision.

Consider doing some mix to hashCode when choosing bucket.

Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
way to reduce systematic lossage, as well as to incorporate impact of the 
highest bits that would otherwise never be used in index calculations because 
of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash 
>>> 16))

In some cases, if a lot of conflicts occurred, this will lead to job hang, 
because hash join will regression to nested loop join.


> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.10.0
>
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))
> In some cases, if a lot of conflicts occurred, this will lead to job hang, 
> because hash join will degenerate to nested loop join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2020-01-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Description: 
In HybridHashTable, first select the corresponding partition according to 
hashCode, and then select the bucket in the partition according to hashCode, 
using the same hashCode can easily cause hash collision.

Consider doing some mix to hashCode when choosing bucket.

Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
way to reduce systematic lossage, as well as to incorporate impact of the 
highest bits that would otherwise never be used in index calculations because 
of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash 
>>> 16))

In some cases, if a lot of conflicts occurred, this will lead to job hang, 
because hash join will regression to nested loop join.

  was:
In HybridHashTable, first select the corresponding partition according to 
hashCode, and then select the bucket in the partition according to hashCode, 
using the same hashCode can easily cause hash collision.

Consider doing some mix to hashCode when choosing bucket.

Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
way to reduce systematic lossage, as well as to incorporate impact of the 
highest bits that would otherwise never be used in index calculations because 
of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash 
>>> 16))


> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.10.0
>
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))
> In some cases, if a lot of conflicts occurred, this will lead to job hang, 
> because hash join will regression to nested loop join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2020-01-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Issue Type: Bug  (was: Improvement)

> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.10.0
>
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2020-01-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Fix Version/s: 1.10.0

> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.10.0
>
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

2019-03-19 Thread Jingsong Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-11964:
-
Summary: Avoid hash collision of partition and bucket in HybridHashTable in 
Blink SQL  (was: Avoid hash collision of partition and bucket in 
HybridHashTable)

> Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL
> 
>
> Key: FLINK-11964
> URL: https://issues.apache.org/jira/browse/FLINK-11964
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Operators
>Reporter: Jingsong Lee
>Priority: Major
>
> In HybridHashTable, first select the corresponding partition according to 
> hashCode, and then select the bucket in the partition according to hashCode, 
> using the same hashCode can easily cause hash collision.
> Consider doing some mix to hashCode when choosing bucket.
> Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible 
> way to reduce systematic lossage, as well as to incorporate impact of the 
> highest bits that would otherwise never be used in index calculations because 
> of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ 
> (hash >>> 16))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)