Thanks Mich.
But in my experience, many original data sources include abnormal values.
I have already used rlike and filter to implement data cleaning, as in
this writing of mine:
https://bigcount.xyz/calculate-urban-words-vote-in-spark.html
What surprises me is that Spark does the string-to-number conversion automatically.
Agg and avg are numeric functions dealing with numeric values. Why is
the column "number" defined as String type?
Do you perform data cleaning beforehand by any chance? It is good practice.
Alternatively, you can use the rlike() function to filter rows that have
numeric values in a column.
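In Spark Scala this would be something like df.filter(col("number").rlike("^[0-9]+$")). Below is a plain-Python sketch of the same filtering idea (no Spark required; the sample rows and the digits-only regex are illustrative assumptions):

```python
import re

# Sample rows mirroring the test DataFrame below: (fruit, number-as-string).
rows = [("apple", "2"), ("orange", "5"), ("cherry", "7"), ("plum", "xyz")]

# Equivalent of col("number").rlike("^[0-9]+$"):
# keep only rows whose "number" value is entirely digits.
numeric_rows = [r for r in rows if re.fullmatch(r"[0-9]+", r[1])]

print(numeric_rows)  # the ("plum", "xyz") row is dropped
```

Note that rlike() uses Java regex and matches anywhere in the string, so the ^ and $ anchors matter if you want whole-value matches.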
Your test result just gave the verdict, so #2 is the answer: Spark
ignores those non-numeric rows completely when aggregating the average.
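This follows from Spark implicitly casting the string column to a numeric type for avg(): values that fail the cast become null, and avg() skips nulls. A plain-Python sketch of those semantics (an illustration of the behavior, not Spark's actual implementation):

```python
def try_cast(s):
    """Mimic Spark's string-to-double cast: None on failure (like SQL null)."""
    try:
        return float(s)
    except ValueError:
        return None

values = ["2", "5", "7", "xyz"]          # the "number" column from the test
casted = [try_cast(v) for v in values]   # [2.0, 5.0, 7.0, None]
non_null = [v for v in casted if v is not None]

# avg() ignores nulls: sum of non-null values / count of non-null values.
average = sum(non_null) / len(non_null)
print(average)  # "xyz" contributes nothing to the average
```

So avg("number") over the four rows behaves as avg(2, 5, 7), not as an average over four rows.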
On 5/1/22 8:20 PM, wilson wrote:
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (nullable = true)
scala> df.show()
+------+------+
| fruit|number|
+------+------+
| apple|     2|
|orange|     5|
|cherry|     7|
|  plum|   xyz|
+------+------+
scala>
... (aggregation command and output header truncated) ...
|65.18445431897453|
+-----------------+
So how does Spark handle abnormal values in a numeric column? Does it
just ignore them?
Thank you.
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org