Re: Why are hash functions seeded with 42?

2022-09-30 Thread Felix Cheung
+1 to doc; a seed argument would be great if possible.
From: Sean Owen  Sent: Monday, September 26, 2022 5:26:26 PM  To: Nicholas Gustafson  Cc: dev  Subject: Re: Why are hash functions seeded with 42?
Oh yeah, I get why we love to pick 42 for random things. I'm
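To make the seed question concrete: Spark SQL's `hash` function is, to our understanding, Murmur3 (x86, 32-bit) with a fixed seed of 42. The following is a minimal sketch of textbook Murmur3_x86_32 in plain Python, where `murmur3_32` and its `seed` parameter (defaulting to 42) are hypothetical names chosen only to illustrate what an exposed seed argument would change. It is not Spark's implementation and is not guaranteed to reproduce Spark's outputs for every input type; in particular, how Spark handles non-4-byte-aligned byte arrays is not modeled here.

```python
def murmur3_32(data: bytes, seed: int = 42) -> int:
    """Textbook MurmurHash3 x86_32 (illustrative sketch, not Spark's code).

    Returns a signed 32-bit result, the way a Java int would be printed.
    The default seed of 42 mirrors the fixed seed discussed in the thread.
    """
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def mix_k(k: int) -> int:
        # Scramble one 32-bit block before folding it into the state.
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        # Each 4-byte block is read as a little-endian 32-bit integer.
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        h ^= mix_k(k)
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    # Fold in the remaining 0-3 tail bytes, if any.
    tail = data[4 * n_blocks :]
    if tail:
        k = 0
        for i, b in enumerate(tail):
            k |= b << (8 * i)
        h ^= mix_k(k)

    # Finalization (fmix32): mixes in the length and avalanches the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


if __name__ == "__main__":
    one = (1).to_bytes(4, "little", signed=True)
    # Same input, different seeds: the hash (and thus any bucketing built on it) changes.
    print(murmur3_32(one), murmur3_32(one, seed=0))
```

The only thing the seed changes is the initial state `h`, so a caller who wants hashes that differ from the default (for example, to decorrelate two hash-based partitionings of the same key) would just pass another value.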

Re: Deploying stage-level scheduling for Spark SQL

2022-09-30 Thread Chenghao Lyu
Thanks for the clarification, Tom! A bit more background on what we want to do: we have proposed a fine-grained (stage-level) resource optimization approach in VLDB22 (https://www.vldb.org/pvldb/vol15/p3098-lyu.pdf) and would like to try it on Spark. Our approach can recommend the resource

Re: Deploying stage-level scheduling for Spark SQL

2022-09-30 Thread Tom Graves
See the original SPIP (https://issues.apache.org/jira/browse/SPARK-27495) as to why we only support RDDs. The main problem is exactly what you are referring to: the RDD level is not exposed to the user when using the SQL or DataFrame API. This is on purpose, and the user shouldn't have to know

Re: Deploying stage-level scheduling for Spark SQL

2022-09-30 Thread Chenghao Lyu
Thanks for the reply! To clarify: for issue 2, a query can still be broken into multiple jobs without AQE; I turned AQE off in my posted example. For issue 1, an end user would just need to toggle a knob to use stage-level scheduling for Spark SQL; I am considering adding a