Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-05-02 Thread Trường Trần Phan An
Hi all, I have written a program that overrides two events, onStageCompleted and onTaskEnd. However, these two events do not provide information on when a Task/Stage is completed. What I want to know is which Task corresponds to which stage of a DAG (the Spark history server only tells me how
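A minimal sketch of the two overrides mentioned above, assuming a plain SparkListener is registered on the SparkContext. Both events actually carry timing and correlation fields: `StageInfo.completionTime` and `TaskInfo.finishTime` give completion times, and `SparkListenerTaskEnd.stageId` links each task back to its stage (the class and field names below are from the Spark Core listener API; the listener name `StageTaskListener` is made up for illustration):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Hypothetical listener: logs when each stage and task finishes,
// and which stage every task belongs to.
class StageTaskListener extends SparkListener {

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    // completionTime is an Option[Long] (epoch millis), set once the stage ends.
    println(s"Stage ${info.stageId} (${info.name}) completed at ${info.completionTime.getOrElse(-1L)}")
  }

  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
    // event.stageId is the correlation key from task back to stage.
    println(s"Task ${event.taskInfo.taskId} of stage ${event.stageId} " +
      s"finished at ${event.taskInfo.finishTime}")
  }
}
```

Register it with `spark.sparkContext.addSparkListener(new StageTaskListener)` before running the job, so the listener observes every stage and task of the application.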

Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-14 Thread Jacek Laskowski
Hi, Start with intercepting stage completions using SparkListenerStageCompleted [1]. That's Spark Core (jobs, stages and tasks). Go up the execution chain to Spark SQL with SparkListenerSQLExecutionStart [2] and SparkListenerSQLExecutionEnd [3], and correlate the information. You may want to look at how
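The two SQL-level events mentioned above arrive through `onOtherEvent` rather than a dedicated callback, so the usual pattern is to pattern-match on them. A hedged sketch (the class names and fields are from `org.apache.spark.sql.execution.ui`; the listener name `SqlExecListener` is invented for this example):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.{SparkListenerSQLExecutionEnd, SparkListenerSQLExecutionStart}

// Hypothetical listener: captures the start/end of each SQL execution.
// executionId is the key used to correlate with jobs and stages.
class SqlExecListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case start: SparkListenerSQLExecutionStart =>
      // physicalPlanDescription shows the operators that the stages will run.
      println(s"SQL execution ${start.executionId} started: ${start.description}")
    case end: SparkListenerSQLExecutionEnd =>
      println(s"SQL execution ${end.executionId} ended at ${end.time}")
    case _ => // ignore everything else
  }
}
```

With both this listener and the stage/task listener registered, `executionId` on the SQL side and `stageId` on the Core side are the two keys you correlate, which is roughly what the Spark UI's SQL tab does internally.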

Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-13 Thread Trường Trần Phan An
Hi, Can you give me more details or point me to a tutorial on "You'd have to intercept execution events and correlate them. Not an easy task, yet doable"? Thanks. On Wed, Apr 12, 2023 at 21:04, Jacek Laskowski wrote: > Hi, > > tl;dr it's not possible to "reverse-engineer" tasks to

Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-12 Thread Maytas Monsereenusorn
Hi, I was wondering: if it's not possible to map tasks to functions, is it still possible to easily figure out which job and stage completed which part of the query from the UI? For example, in the SQL tab of the Spark UI, I am able to see the query and the Job IDs for that query. However,
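The job-to-query link shown in the SQL tab can also be recovered programmatically: jobs triggered by Spark SQL carry the SQL execution id as a job property under the key `"spark.sql.execution.id"`. A sketch, assuming a registered listener (the listener name `JobToQueryListener` is made up for illustration):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical listener: ties each job (and its stages) to the SQL query
// that triggered it, via the spark.sql.execution.id job property.
class JobToQueryListener extends SparkListener {
  override def onJobStart(event: SparkListenerJobStart): Unit = {
    val execId = Option(event.properties.getProperty("spark.sql.execution.id"))
    val stageIds = event.stageIds.mkString(", ")
    println(s"Job ${event.jobId} (SQL execution: ${execId.getOrElse("none")}) " +
      s"runs stages: $stageIds")
  }
}
```

Jobs not launched through Spark SQL (plain RDD actions) have no such property, which is why the lookup is wrapped in an `Option`.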

Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-12 Thread Jacek Laskowski
Hi, tl;dr it's not possible to "reverse-engineer" tasks to functions. In essence, Spark SQL is an abstraction layer over the RDD API, which is made up of partitions and tasks. Tasks are Scala functions (possibly with some Python for PySpark). A simple-looking high-level operator like DataFrame.join can
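The gap between a high-level operator and its stages can at least be inspected, even if it cannot be reversed. A sketch of peeking beneath a DataFrame, assuming a SparkSession named `spark` (both `queryExecution.executedPlan` and `queryExecution.toRdd` are public Spark SQL accessors):

```scala
// Hypothetical example: a simple-looking join...
val joined = spark.range(10).join(spark.range(10), "id")

// ...compiles into a physical plan of several operators:
println(joined.queryExecution.executedPlan)

// ...and into an RDD lineage, where each shuffle boundary
// becomes a separate stage of tasks at runtime:
println(joined.queryExecution.toRdd.toDebugString)
```

This shows why the mapping is one-way: a single DataFrame.join fans out into multiple physical operators and stages, and the generated task closures no longer carry the name of the operator they came from.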