Thanks Mich.
But in my experience, many original data sources include abnormal
values.
I already used rlike and filter to implement the data cleaning, as
described in this post:
https://bigcount.xyz/calculate-urban-words-vote-in-spark.html
What surprises me is that Spark converts the strings to numeric values
automatically and ignores the non-numeric values. Given this, my
data cleaning seems meaningless.
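
To make the comparison concrete, here is a small plain-Python model of
what I think is happening. It is not Spark code. It assumes that, with
ANSI mode off, the string column is cast to double, that strings which
fail to parse become null, and that avg skips nulls. The sample data,
the helper names and the digits-only pattern are all made up for
illustration.

```python
import re


def to_double(s):
    """Model of a lenient string-to-double cast: unparseable input becomes None."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return None


def avg_skip_nulls(values):
    """Average that ignores None, the way SQL AVG ignores NULL."""
    nums = [v for v in values if v is not None]
    return sum(nums) / len(nums) if nums else None


# Made-up sample column with two abnormal values ("abc" and the empty string).
raw = ["10", "20", "abc", "", "30"]

# Path 1: no cleaning, rely on the lenient cast and the null-skipping average.
implicit = avg_skip_nulls([to_double(s) for s in raw])

# Path 2: explicit cleaning first (hypothetical digits-only rule, in the
# spirit of an rlike filter), then a strict conversion.
digits_only = re.compile(r"^\d+$")
cleaned = [s for s in raw if digits_only.match(s)]
explicit = avg_skip_nulls([float(s) for s in cleaned])

print(implicit, explicit)
```

In this model both paths give the same average, which matches what I
saw. They agree only while the cleaning rule rejects exactly the
strings the cast cannot parse: a value like "1.5" would pass the cast
but fail the digits-only pattern.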
Thanks.
Mich Talebzadeh wrote:
Agg and avg are numeric functions that deal with numeric values. Why
is the column number defined as String type?
Do you perform data cleaning beforehand by any chance? It is good practice.