I did a small test as follows.

scala> df.printSchema()
root
 |-- fruit: string (nullable = true)
 |-- number: string (nullable = true)


scala> df.show()
+------+------+
| fruit|number|
+------+------+
| apple|     2|
|orange|     5|
|cherry|     7|
|  plum|   xyz|
+------+------+


scala> df.agg(avg("number")).show()
+-----------------+
|      avg(number)|
+-----------------+
|4.666666666666667|
+-----------------+


As you can see, the "number" column is string type, and there is an abnormal value in it.

But in both cases Spark still handles the result pretty well. So I guess:

1) Spark can automatically cast the string values to a numeric type when aggregating.
2) Spark automatically ignores the abnormal values when calculating the aggregate.
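If those two guesses hold, the observed result can be reproduced in plain Scala (this is only a sketch of the presumed semantics, not Spark itself; `toDoubleOption` stands in for a cast that turns non-numeric strings into null):

```scala
// Sketch of the presumed behavior: cast each string to a double
// (non-numeric values are dropped, like a cast producing null),
// then average only the values that survived the cast.
val numbers = Seq("2", "5", "7", "xyz")

val parsed = numbers.flatMap(_.toDoubleOption) // "xyz" is dropped

val avg = parsed.sum / parsed.size

println(avg) // prints 4.666666666666667, matching the df.agg output
```

That 4.666666666666667 is (2 + 5 + 7) / 3, i.e. the average over the three parseable rows only, which is consistent with "plum" being excluded rather than treated as zero.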

Am I right? Thank you.

wilson




wilson wrote:
my dataset has abnormal values in the column whose normal values are numeric. I can select them as:
