Hi, I am using PySpark to write Spark queries. My research project requires me to accurately measure the latency of every operator/stage in a query. I can make some guesses, but I am unable to map the stages (shown in the DAG in the Spark UI) to the exact lines in my PySpark code.
Can someone help? I can share some examples if required.

Thanks,
Dhruv

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me <http://dhruvkumar.me/>