I cannot reproduce this.
Which Spark version are you using? Here is what I get on 2.2.0:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+----------------+---------------+
|hash(40514XXXXX)|hash(41751XXXX)|
+----------------+---------------+
|     -1898845883|      916273350|
+----------------+---------------+


scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|    -1593820563|
+-----------+---------------+


scala>
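
On the max-value question in the quoted message: the following is only a minimal sketch in plain Scala (no Spark calls), assuming hash() yields a 32-bit signed Int as the quoted message says. The name `reported` is illustrative; its two values are copied from the quoted output further down. It checks that neither repeated value sits at a bound of the Int range. A 32-bit hash can in principle collide for distinct inputs, but the run above shows distinct results for the same literals on 2.2.0.

```scala
// Plain Scala; can be pasted into spark-shell or the Scala REPL.
// The two values are copied from the quoted output in this thread.
val reported = Seq(976573657, 777096871)

// A 32-bit signed Int has a fixed range.
println(s"Int range: ${Int.MinValue} .. ${Int.MaxValue}")

reported.foreach { h =>
  // True only if the value equals one of the Int bounds.
  val atBound = h == Int.MaxValue || h == Int.MinValue
  println(s"$h at bound? $atBound")
}
```

Both values are ordinary mid-range integers, so an identical result for different keys is not an overflow symptom.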

From: Gokula Krishnan D <email2...@gmail.com>
Date: Tuesday, September 25, 2018 at 8:57 PM
To: user <user@spark.apache.org>
Subject: [Spark SQL] why spark sql hash() are returns the same hash value 
though the keys/expr are not same

Hello All,

I am calculating the hash value of a few columns to determine whether a record 
is an Insert/Delete/Update, but I found a scenario that is a little weird: 
some records return the same hash value even though the keys are totally 
different.

For instance,


scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+---------------+---------------+
|hash(40514XXXX)|hash(41751XXXX)|
+---------------+---------------+
|      976573657|      976573657|
+---------------+---------------+


scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|      777096871|
+-----------+---------------+

I do understand that hash() returns an integer. Have these values reached the 
max value?

Thanks & Regards,
Gokula Krishnan (Gokul)
