ueshin commented on code in PR #46063:
URL: https://github.com/apache/spark/pull/46063#discussion_r1566391570
##########
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##########
@@ -123,45 +98,16 @@ package object sql {
     if (CurrentOrigin.get.stackTrace.isDefined) {
       f
     } else {
-      val origin = Origin(stackTrace = Some(captureStackTrace()))
-      CurrentOrigin.withOrigin(origin)(f)
-    }
-  }
-
-  /**
-   * This overloaded helper function captures the call site information specifically for PySpark,
-   * using provided PySpark logging information instead of capturing the current Java stack trace.
-   *
-   * This method is designed to enhance the debuggability of PySpark by including PySpark-specific
-   * logging information (e.g., method names and call sites within PySpark scripts) in debug logs,
-   * without the overhead of capturing and processing Java stack traces that are less relevant
-   * to PySpark developers.
-   *
-   * The `pysparkErrorContext` parameter allows for passing PySpark call site information, which
-   * is then included in the Origin context. This facilitates more precise and useful logging for
-   * troubleshooting PySpark applications.
-   *
-   * This method should be used in places where PySpark API calls are made, and PySpark logging
-   * information is available and beneficial for debugging purposes.
-   *
-   * @param pysparkErrorContext Optional PySpark logging information including the call site,
-   *                            represented as a (String, String).
-   *                            This may contain keys like "fragment" and "callSite" to provide
-   *                            detailed context about the PySpark call site.
-   * @param f The function that can utilize the modified Origin context with
-   *          PySpark logging information.
-   * @return The result of executing `f` within the context of the provided PySpark logging
-   *         information.
-   */
-  private[sql] def withOrigin[T](
-      pysparkErrorContext: Option[(String, String)] = None)(f: => T): T = {
-    if (CurrentOrigin.get.stackTrace.isDefined) {
-      f
-    } else {
-      val origin = Origin(
-        stackTrace = Some(captureStackTrace()),
-        pysparkErrorContext = pysparkErrorContext
-      )
+      val st = Thread.currentThread().getStackTrace
+      var i = 0
+      // Find the beginning of Spark code traces
+      while (i < st.length && !sparkCode(st(i))) i += 1
+      // Stop at the end of the first Spark code traces
+      while (i < st.length && sparkCode(st(i))) i += 1
+      val origin = Origin(stackTrace = Some(st.slice(
+        from = i - 1,
+        until = i + SQLConf.get.stackTracesInDataFrameContext)),
+        pysparkErrorContext = CurrentOrigin.get.pysparkErrorContext)

Review Comment:
   I guess we can use the `object PySparkCurrentOrigin` introduced in the previous PR ([here](https://github.com/apache/spark/compare/master...c8d98ea1f1b5ba7994eaa1d0fe01c7022ec120a2#diff-2b0bd51d4eb6798d69876a78b0fe2097bda3cc973cd95df51b6bc6abfc893438R94-R107)), but consume the `pysparkErrorContext` here, instead of when creating `DataFrameQueryContext`? We may want to change what's stored there from `mutable.Map[String, String]` to `Option[(String, String)]`, though.
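
   A minimal sketch of what that could look like, assuming the thread-local shape of `PySparkCurrentOrigin` in the linked commit; the `set`/`consume` names and signatures here are illustrative, not the actual API:

   ```scala
   object PySparkCurrentOrigin {
     // Hold the PySpark call site as (fragment, callSite) rather than a mutable.Map,
     // per the suggestion above.
     private val ctx = new ThreadLocal[Option[(String, String)]]() {
       override def initialValue(): Option[(String, String)] = None
     }

     // Set from the PySpark side before the corresponding JVM call (assumed entry point).
     def set(fragment: String, callSite: String): Unit =
       ctx.set(Some((fragment, callSite)))

     // Return the stored context and clear it, so it is consumed exactly once.
     def consume(): Option[(String, String)] = {
       val result = ctx.get()
       ctx.remove()
       result
     }
   }
   ```

   `withOrigin` above would then build the `Origin` with `pysparkErrorContext = PySparkCurrentOrigin.consume()` instead of reading `CurrentOrigin.get.pysparkErrorContext`, so `DataFrameQueryContext` creation would no longer need to handle it.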