[jira] [Commented] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmum hash

2018-11-08 Thread slim bouguerra (JIRA)


[https://issues.apache.org/jira/browse/HIVE-20892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680595#comment-16680595]

slim bouguerra commented on HIVE-20892:
---

Forgot to add: we also need to look at 32-bit hashes, since that is what Hive 
uses for joins and grouping.
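Because Hive hashes join and grouping keys to 32 bits, a benchmark should also check how a 32-bit hash behaves once it is reduced to a table slot, not just how fast it is. The minimal Java sketch below assumes a power-of-two table where the slot is `hash & (size - 1)`, so only the low bits of the hash decide placement. The mixer is the Murmur3 32-bit finalizer (`fmix32`) applied to an `int` key, chosen only for illustration. `HashSlotSketch` and its methods are hypothetical and are not Hive's join or grouping code.

```java
final class HashSlotSketch {
    // Murmur3 32-bit finalizer (fmix32). Each step (xor-shift, multiply by an
    // odd constant) is invertible, so distinct inputs give distinct outputs,
    // and the right shifts let high-order input bits reach the low output bits.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical helper: pick a slot in a power-of-two sized hash table.
    // Only the low log2(tableSize) bits of the 32-bit hash decide the slot.
    static int slotFor(int key, int tableSize) {
        requirePowerOfTwo(tableSize);
        return fmix32(key) & (tableSize - 1);
    }

    // Same mapping with no mixing: keys that differ only in high bits collide.
    static int slotIdentity(int key, int tableSize) {
        requirePowerOfTwo(tableSize);
        return key & (tableSize - 1);
    }

    private static void requirePowerOfTwo(int n) {
        if (n <= 0 || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("table size must be a positive power of two");
        }
    }
}
```

For example, with a 16-slot table `slotIdentity` puts the keys 0, 16, 32 and 48 all in slot 0, while `slotFor` assigns slots from the mixed low bits instead. That is why the quality of the low bits of a 32-bit hash matters for join and grouping tables.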

 

> Benchmark XXhash for 64 bit hashing function instead of Murmum hash
> ---
>
> Key: HIVE-20892
> URL: https://issues.apache.org/jira/browse/HIVE-20892
> Project: Hive
> Issue Type: Sub-task
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
>
> https://cyan4973.github.io/xxHash/
> FYI, this is used by a lot of other MPP systems...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmum hash

2018-11-08 Thread slim bouguerra (JIRA)


[https://issues.apache.org/jira/browse/HIVE-20892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680592#comment-16680592]

slim bouguerra commented on HIVE-20892:
---

[~prasanth_j] Thanks, that is what I was planning to do. It seems you have 
already done most of the work; maybe it is worth re-running it with newer JVMs 
and on something other than a laptop?

I am also curious about how the hash distribution behaves on actual data, such as TPC-H.
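One way to look at distribution on real data is to bucket a column of actual keys with each candidate hash and compare how evenly the buckets fill. The minimal Java sketch below assumes string keys, a pluggable 32-bit hash passed as a `ToIntFunction<String>`, and power-of-two bucketing by low bits; it reports the Pearson chi-square statistic against a uniform spread and the ratio of the fullest bucket to the average. `DistributionCheck` and its methods are hypothetical names, not part of Hive or any benchmark referenced here.

```java
import java.util.List;
import java.util.function.ToIntFunction;

final class DistributionCheck {
    // Count how many keys fall into each bucket, using the low bits of a
    // 32-bit hash the same way a power-of-two hash table would.
    static long[] bucketCounts(List<String> keys, ToIntFunction<String> hash, int numBuckets) {
        if (numBuckets <= 0 || Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a positive power of two");
        }
        long[] counts = new long[numBuckets];
        for (String key : keys) {
            counts[hash.applyAsInt(key) & (numBuckets - 1)]++;
        }
        return counts;
    }

    // Pearson chi-square statistic against a uniform spread over the buckets.
    // For an ideal random hash its expected value is numBuckets - 1;
    // much larger values point to skew on this particular data set.
    static double chiSquare(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) return 0.0;
        double expected = (double) total / counts.length;
        double stat = 0.0;
        for (long c : counts) {
            double diff = c - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    // Fullest bucket divided by the average bucket; 1.0 means perfectly even.
    static double maxLoadFactor(long[] counts) {
        long total = 0;
        long max = 0;
        for (long c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        if (total == 0) return 0.0;
        return max / ((double) total / counts.length);
    }
}
```

Usage would be to load a real key column (for instance TPC-H `l_orderkey` values rendered as strings), call `bucketCounts` once per candidate hash (`String::hashCode` as a baseline, then Murmur3 or xxHash wrappers), and compare the two statistics. The sketch deliberately reports numbers rather than a pass/fail verdict, since what counts as acceptable skew depends on the operator.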



[jira] [Commented] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmum hash

2018-11-08 Thread Prasanth Jayachandran (JIRA)


[https://issues.apache.org/jira/browse/HIVE-20892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680441#comment-16680441]

Prasanth Jayachandran commented on HIVE-20892:
--

[https://github.com/prasanthj/hasher]

Murmur2 is slightly faster than Murmur3, but for the reason explained in 
[https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hive/common/util/BloomFilter.java#L37-L40], 
Murmur3 was chosen for the bloom filter and HLL in Hive.
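To illustrate how one 64-bit hash can drive a bloom filter, here is a minimal Java sketch using double hashing in the style of Kirsch and Mitzenmacher: the 64-bit value is split into two 32-bit halves `h1` and `h2`, and the i-th probe position is `h1 + i * h2` reduced modulo the filter size. Whether this matches Hive's `BloomFilter` line for line is an assumption; `BloomSketch` is a hypothetical class, and the Murmur3 64-bit finalizer (`fmix64`) applied to a `long` key stands in for a full Murmur3 hash over the key's bytes.

```java
import java.util.BitSet;

final class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Murmur3 64-bit finalizer (fmix64); a stand-in for a full Murmur3
    // 64-bit hash over the key's bytes.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Double hashing: derive the i-th bit position from the two 32-bit halves
    // of one 64-bit hash. Integer overflow in h1 + i * h2 simply wraps, and
    // floorMod keeps the result in [0, numBits).
    private int position(long hash64, int i) {
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(long key) {
        long h = fmix64(key);
        for (int i = 1; i <= numHashes; i++) {
            bits.set(position(h, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(long key) {
        long h = fmix64(key);
        for (int i = 1; i <= numHashes; i++) {
            if (!bits.get(position(h, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The design point this shows is that the filter only ever computes one hash per key, so the speed of that single 64-bit hash (Murmur2, Murmur3 or xxHash) dominates probe cost, while its bit quality determines how independent the derived probe positions are.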
