[sql] how to connect query stage to Spark job/stages?

2023-11-29 Thread Chenghao Lyu
Hi, I am seeking advice on measuring the performance of each QueryStage (QS) when AQE is enabled in Spark SQL. Specifically, I need help to automatically map a QS to its corresponding jobs (or stages) to get the QS runtime metrics. I recorded the QS structure via a customized injected Query

[SparkSQL, SparkUI, RESTAPI] How to extract the WholeStageCodeGen ids from SparkUI

2023-04-07 Thread Chenghao Lyu
Hi, the stage detail page in the Spark UI shows the WholeStageCodegen ids involved in its DAG visualization when running a Spark SQL query (e.g., under the link node:18088/history/application_1663600377480_62091/stages/stage/?id=1=0). However, I have trouble extracting the WholeStageCodegen ids

Re: Deploying stage-level scheduling for Spark SQL

2022-09-30 Thread Chenghao Lyu
way to apply stage level scheduling to SQL/dataframe, or like mentioned in original issue if AQE gets smart enough it would just do it for the user, but lots of factors that come into play that make that difficult as well. > Tom > On Friday, September 30, 2022,

Re: Deploying stage-level scheduling for Spark SQL

2022-09-30 Thread Chenghao Lyu
is, IMO, should be based on analysis and costing the plan. For this RDD only stage level scheduling should be sufficient. > On Thu, Sep 29, 2022 at 8:56 AM Chenghao Lyu wrote: > Hi, I plan to deploy the stage-level scheduling for Spar

Deploying stage-level scheduling for Spark SQL

2022-09-29 Thread Chenghao Lyu
Hi, I plan to deploy stage-level scheduling for Spark SQL to apply fine-grained optimizations over the DAG of stages. However, I am blocked by the following issues: 1. Stage-level scheduling currently supports the RDD APIs only. Is there a way to reuse stage-level scheduling for