Cannot reproduce your situation. Can you share your Spark version?

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+----------------+---------------+
|hash(40514XXXXX)|hash(41751XXXX)|
+----------------+---------------+
|     -1898845883|      916273350|
+----------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|    -1593820563|
+-----------+---------------+

scala>

From: Gokula Krishnan D <email2...@gmail.com>
Date: Tuesday, September 25, 2018 at 8:57 PM
To: user <user@spark.apache.org>
Subject: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

Hello All,

I am calculating the hash value of a few columns to determine whether a record is an Insert/Delete/Update, but I found a scenario that is a little weird: some of the records return the same hash value even though the keys are totally different.

For instance,

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+---------------+---------------+
|hash(40514XXXX)|hash(41751XXXX)|
+---------------+---------------+
|      976573657|      976573657|
+---------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|      777096871|
+-----------+---------------+

I do understand that hash() returns an integer. Have these values reached the max value?

Thanks & Regards,
Gokula Krishnan (Gokul)