Thanks, Mich.
In my experience, though, many original data sources include abnormal values. I have already used rlike and filter to clean the data, as described in this write-up:
https://bigcount.xyz/calculate-urban-words-vote-in-spark.html

What surprises me is that Spark converts the strings to numeric values automatically and ignores the non-numeric ones. Given this, my data cleaning seems meaningless.
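
To make the comparison concrete, here is a minimal plain-Python sketch of the two paths. It is only a model, not Spark code: it assumes a lenient string-to-double cast that yields null for unparsable input, and an average that skips nulls. The sample values and the helper names (to_double_or_none, avg_ignoring_nulls) are hypothetical, and Python's float() does not parse exactly the same set of strings as Spark's cast.

```python
import re

# Hypothetical sample: a numeric column that arrived as strings.
raw_values = ["10", "20", "abc", "", "30", None]


def to_double_or_none(text):
    """Model of a lenient cast: unparsable input becomes None instead of failing."""
    if text is None:
        return None
    try:
        return float(text.strip())
    except ValueError:
        return None


def avg_ignoring_nulls(values):
    """Average over the non-null values only, as a SQL-style AVG would."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Path 1: rely on the implicit cast; bad values become nulls and drop out.
implicit_avg = avg_ignoring_nulls([to_double_or_none(v) for v in raw_values])

# Path 2: clean explicitly first, in the spirit of an rlike filter.
cleaned = [v for v in raw_values if v is not None and re.fullmatch(r"\d+", v)]
explicit_avg = avg_ignoring_nulls([float(v) for v in cleaned])

# Number of rows the implicit path dropped without reporting them.
dropped = sum(1 for v in raw_values if to_double_or_none(v) is None)

print(implicit_avg, explicit_avg, dropped)
```

On this sample both paths give the same average. The difference is that the explicit filter states which rows count as valid, while the implicit path drops the others silently.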

Thanks.

Mich Talebzadeh wrote:
Agg and avg are numeric functions that deal with numeric values. Why is column number defined as String type?

Do you perform data cleaning beforehand by any chance? It is good practice.
