Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22353

> Although event log is in JSON format, it's mostly for internal usage, to be loaded by the history server and used to build the Spark UI.

AFAIK, more and more projects replay the event log to analyze jobs offline, especially in platform/infra teams at big companies. Dr. Elephant doesn't read the event log; instead it queries the SHS for information, which causes many problems such as compatibility and data accuracy issues. At eBay we are building a system similar to Dr. Elephant but much more powerful. One of the use cases in this system is building data lineage and monitoring the input/output paths and data size for each application. Unlike Apache Atlas, which needs to attach a Spark listener to the Spark runtime, we chose to replay the event log to build all the context we need. Before 2.3, we could get this information from the `metadata` field in the SQLExecutionStart event, but that field has since been removed. So I hope this PR can add it back. It would also open up more possibilities for the event log beyond its use in the SHS.
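To make the replay use case concrete, here is a minimal sketch, not the eBay system itself. It assumes the event log is a file with one JSON object per line, that SQL execution start events are logged under the full listener class name in the `Event` field, and that each plan node under `sparkPlanInfo` carries `nodeName`, `children` and (when present) `metadata`. The helper names `iter_events`, `walk_plan` and `collect_plan_metadata` are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Tuple

# Assumption: SQL listener events are written under their full class name.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per parseable JSON line, skipping blanks and bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated last line in an in-progress log
        if isinstance(event, dict):
            yield event


def walk_plan(node: dict) -> Iterator[dict]:
    """Depth-first traversal of a plan node and its children."""
    yield node
    for child in node.get("children") or []:
        yield from walk_plan(child)


def collect_plan_metadata(lines: Iterable[str]) -> List[Tuple[object, object, dict]]:
    """Return (executionId, nodeName, metadata) for plan nodes with metadata."""
    found = []
    for event in iter_events(lines):
        if event.get("Event") != SQL_START:
            continue
        plan = event.get("sparkPlanInfo") or {}
        for node in walk_plan(plan):
            metadata = node.get("metadata") or {}
            if metadata:  # empty when the field is missing from the log
                found.append((event.get("executionId"), node.get("nodeName"), metadata))
    return found
```

Passing an open file handle, for example `collect_plan_metadata(open(path, encoding="utf-8"))`, would replay a single uncompressed log. If `metadata` is absent from the logged plan, the result is simply empty, which is the gap described above.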