Sorry, I have found the reason: a null value cannot be compared
directly. I have written a note about this:
https://bigcount.xyz/how-spark-handles-null-and-abnormal-values.html
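
In short, here is a minimal sketch of what I think is happening, meant to be pasted into spark-shell. The DataFrame `df` and its five rows are made up for illustration (they are not my real data), and I am assuming the column holds real SQL nulls in addition to any literal "NULL" strings. Comparing a null with `=!=` gives null rather than true, and `where` keeps only rows where the condition is true, so those rows silently disappear from the count.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session; in spark-shell this just returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("null-demo").getOrCreate()
import spark.implicits._

// Toy data (assumption): two real nulls, one literal string "NULL", two ordinary values
val df = Seq[String]("A", "NULL", null, "B", null).toDF("cand_status")

// null =!= "NULL" evaluates to null, so the real nulls are dropped
// together with the literal string "NULL"
df.where($"cand_status" =!= "NULL").count()     // 2

// isNotNull tests for missing values explicitly
df.where($"cand_status".isNotNull).count()      // 3

// <=> is null-safe equality and never yields null, so negating it
// removes only the literal string "NULL" and keeps the real nulls
df.where(!($"cand_status" <=> "NULL")).count()  // 4
```

So to count or filter missing values, `isNull` / `isNotNull` (or the null-safe `<=>`) seem to be the right tools, not `===` or `=!=` against a string.
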
Thanks.
wilson wrote:
Do you know why the select results below are not consistent?
My dataset has NULL values in the columns.
scala> dfs.select("cand_status").count()
val res37: Long = 881793
scala> dfs.select("cand_status").where($"cand_status" =!= "NULL").count()
val res38: Long = 383717