Parse Execution Plan from PySpark

2022-05-02 Thread Pablo Alcain
Hello all! I'm working with PySpark, trying to reproduce through streaming processes some of the results we get on batch, just as a PoC for now. For this, I'm thinking of interpreting the execution plan and eventually writing it back to Python (I'm doing something similar with pandas as well,
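A starting point for interpreting the plan is the textual physical plan that PySpark's `df.explain()` prints. As a minimal sketch (pure Python, no Spark session needed; the plan text below is a hypothetical example shaped like `explain()` output, not taken from the thread), one could extract the operator tree like this:

```python
import re

# Hypothetical physical-plan text, shaped like PySpark's df.explain() output.
PLAN_TEXT = """\
*(2) HashAggregate(keys=[word], functions=[count(1)])
+- Exchange hashpartitioning(word, 200)
   +- *(1) HashAggregate(keys=[word], functions=[partial_count(1)])
      +- *(1) Filter isnotnull(word)
         +- FileScan parquet [word]"""

def parse_operators(plan_text):
    """Extract operator names, in order, from a textual Spark physical plan."""
    ops = []
    for line in plan_text.splitlines():
        # Strip the tree-drawing prefix ("+- ", ":  ", leading spaces)
        # and the whole-stage-codegen marker ("*(n) ").
        body = re.sub(r"^[\s:+\-]*", "", line)
        body = re.sub(r"^\*\(\d+\)\s*", "", body)
        match = re.match(r"[A-Za-z]+", body)
        if match:
            ops.append(match.group(0))
    return ops

print(parse_operators(PLAN_TEXT))
# → ['HashAggregate', 'Exchange', 'HashAggregate', 'Filter', 'FileScan']
```

Mapping those operators back to Python calls would be the hard part; parsing the tree is only the first step.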

Re: Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-05-02 Thread Artemis User
What scanner did you use? It looks like all the CVEs you listed for jackson-databind-xxx.jar are for older versions (2.9.10.x). A quick search on NVD revealed that there is only one CVE (CVE-2020-36518) that affects your Spark versions. This CVE (not on your scanned CVE list) is on jackson-databind
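The underlying check here is version comparison: a scanner finding only applies if the jar actually shipped is older than the version the CVE was fixed in. A sketch in plain Python (the fix versions below reflect my understanding that CVE-2020-36518 was addressed in the jackson-databind 2.12.6.1 / 2.13.2.1 lines; verify against NVD before relying on them):

```python
def parse_version(v):
    """Turn a dotted version string like '2.9.10.8' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_affected(shipped, fixed_in):
    """A CVE applies if the shipped version sorts before the fixed version."""
    return parse_version(shipped) < parse_version(fixed_in)

# Hypothetical check against the 2.13.x fix line for CVE-2020-36518.
print(is_affected("2.13.2", "2.13.2.1"))  # True: older than the fix
print(is_affected("2.13.3", "2.13.2.1"))  # False: already past the fix
```

Tuple comparison handles versions of different lengths correctly here, since a shorter prefix sorts before a longer version with the same leading components.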

Re: how spark handle the abnormal values

2022-05-02 Thread wilson
Thanks Mich. But in my experience many original data sources have abnormal values included. I already used rlike and filter to implement the data cleaning, as in this write-up of mine: https://bigcount.xyz/calculate-urban-words-vote-in-spark.html What surprises me is that spark does the string to

Re: how spark handle the abnormal values

2022-05-02 Thread Mich Talebzadeh
agg and avg are numeric functions dealing with numeric values. Why is the column defined as String type? Do you perform data cleaning beforehand, by any chance? It is good practice. Alternatively, you can use the rlike() function to filter rows that have numeric values in a column.
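The rlike() approach boils down to a regular expression that accepts only well-formed numeric strings. A sketch of such a pattern in plain Python (the sample values are hypothetical; the same pattern string could be passed to PySpark's `Column.rlike`, where the `^`/`$` anchors matter because Spark's rlike matches anywhere in the string):

```python
import re

# Accepts optionally signed integers and decimals, nothing else.
NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")

values = ["42", "3.14", "-7", "N/A", "", "12abc"]
clean = [v for v in values if NUMERIC.match(v)]
print(clean)  # → ['42', '3.14', '-7']
```

Rows that survive the filter can then be safely cast to a numeric type before calling avg() or other aggregates.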