Re: modifying spark's optimizer for research

2021-04-22 Thread Walter Cai
Hi Cheng Su and All, Thanks for your reply; the change I'm attempting to make would be a significant philosophical change to how optimizers currently handle cardinality estimation. With that in mind, I think it might be wiser to first build a prototype/proof of concept, as opposed to the traditional

Why is failing to get Hive token non-fatal?

2021-04-22 Thread Manu Zhang
Hi all, Recently, we had an incident where applications failed to connect to the Hive metastore due to an authentication error. The applications were run in YARN cluster mode and the error was thrown in the driver. However, we found that the client had already thrown errors when it failed to get the Hive token. It

[DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-22 Thread Yikun Jiang
Hi, all *Background:* Currently, there is a withColumns [1] method to help users/devs add/replace multiple columns at once. But this method is private

Re: modifying spark's optimizer for research

2021-04-22 Thread Cheng Su
Hello Walter, Just FYI - https://spark.apache.org/contributing.html is the general guide for how to contribute to Spark. > implement a prototype modification to spark's optimizer to exhibit/experiment > some of my PhD work Maybe you could share some links or pointers for the work you have