My dataset contains NULL in some of its columns.
Do you know why the select results below are not consistent with each other?

scala> dfs.select("cand_status").count()
val res37: Long = 881793

scala> dfs.select("cand_status").where($"cand_status" =!= "NULL").count()
val res38: Long = 383717

scala> dfs.select("cand_status").where($"cand_status" === "NULL").count()
val res39: Long = 86402

scala> dfs.select("cand_status").where($"cand_status" === "NULL").where($"cand_status" =!= "NULL").count()
val res40: Long = 0


As you can see, 383717 + 86402 = 470119, which does not equal 881793.
I expected the two filtered counts to add up to the total.
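
The gap is 881793 - 470119 = 411674 rows that match neither filter. One possible explanation, which the output above does not confirm, is that those rows hold real SQL nulls rather than the string "NULL": a comparison against a real null evaluates to null, and where() keeps only rows whose condition is true. Below is a minimal sketch of that behavior on a tiny stand-in DataFrame. The name `demo`, its `id` column, and the sample values are made up for illustration; in spark-shell the session setup lines can be skipped because `spark` and its implicits are already in scope.

```scala
import org.apache.spark.sql.SparkSession

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("null-check").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfs: one row with the string "NULL",
// one ordinary value, and one real SQL null.
val demo = Seq((1, "NULL"), (2, "ACTIVE"), (3, null: String)).toDF("id", "cand_status")

// Both comparisons evaluate to null for the real-null row, so it is dropped from both.
val notNullString = demo.where($"cand_status" =!= "NULL").count()
val isNullString  = demo.where($"cand_status" === "NULL").count()

// Real nulls have to be counted with isNull.
val realNulls = demo.where($"cand_status".isNull).count()

// The null-safe equality operator <=> never yields null, so its negation keeps the real-null row.
val nullSafeNot = demo.where(!($"cand_status" <=> "NULL")).count()

println(s"=!= $notNullString, === $isNullString, isNull $realNulls, total ${demo.count()}")
```

If the assumption holds for the real data, dfs.where($"cand_status".isNull).count() should return the missing 411674 rows, and the three counts together should equal the total.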

Thanks.
