ueshin commented on code in PR #46063:
URL: https://github.com/apache/spark/pull/46063#discussion_r1566391570
##########
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##########
@@ -123,45 +98,16 @@ package object sql {
     if (CurrentOrigin.get.stackTrace.isDefined) {
       f
     } else {
-      val origin = Origin(stackTrace = Some(captureStackTrace()))
-      CurrentOrigin.withOrigin(origin)(f)
-    }
-  }
-
-  /**
-   * This overloaded helper function captures the call site information specifically for PySpark,
-   * using provided PySpark logging information instead of capturing the current Java stack trace.
-   *
-   * This method is designed to enhance the debuggability of PySpark by including PySpark-specific
-   * logging information (e.g., method names and call sites within PySpark scripts) in debug logs,
-   * without the overhead of capturing and processing Java stack traces that are less relevant
-   * to PySpark developers.
-   *
-   * The `pysparkErrorContext` parameter allows for passing PySpark call site information, which
-   * is then included in the Origin context. This facilitates more precise and useful logging for
-   * troubleshooting PySpark applications.
-   *
-   * This method should be used in places where PySpark API calls are made, and PySpark logging
-   * information is available and beneficial for debugging purposes.
-   *
-   * @param pysparkErrorContext Optional PySpark logging information including the call site,
-   *                            represented as a (String, String).
-   *                            This may contain keys like "fragment" and "callSite" to provide
-   *                            detailed context about the PySpark call site.
-   * @param f The function that can utilize the modified Origin context with
-   *          PySpark logging information.
-   * @return The result of executing `f` within the context of the provided PySpark logging
-   *         information.
-   */
-  private[sql] def withOrigin[T](
-      pysparkErrorContext: Option[(String, String)] = None)(f: => T): T = {
-    if (CurrentOrigin.get.stackTrace.isDefined) {
-      f
-    } else {
-      val origin = Origin(
-        stackTrace = Some(captureStackTrace()),
-        pysparkErrorContext = pysparkErrorContext
-      )
+      val st = Thread.currentThread().getStackTrace
+      var i = 0
+      // Find the beginning of Spark code traces
+      while (i < st.length && !sparkCode(st(i))) i += 1
+      // Stop at the end of the first Spark code traces
+      while (i < st.length && sparkCode(st(i))) i += 1
+      val origin = Origin(stackTrace = Some(st.slice(
+        from = i - 1,
+        until = i + SQLConf.get.stackTracesInDataFrameContext)),
+        pysparkErrorContext = CurrentOrigin.get.pysparkErrorContext)

Review Comment:
   I guess we can use the `object PySparkCurrentOrigin` introduced in the previous PR ([here](https://github.com/apache/spark/compare/master...c8d98ea1f1b5ba7994eaa1d0fe01c7022ec120a2#diff-2b0bd51d4eb6798d69876a78b0fe2097bda3cc973cd95df51b6bc6abfc893438R94-R107)), but consume the `pysparkErrorContext` here, instead of when creating `DataFrameQueryContext`? We may want to change what's stored there from `mutable.Map[String, String]` to `Option[(String, String)]`, though.
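
   A minimal sketch of what that could look like, assuming the thread-local shape of `PySparkCurrentOrigin` in the linked commit; the `set`/`consume` names and signatures here are illustrative, not the actual API:

   ```scala
   object PySparkCurrentOrigin {
     // Hold the PySpark call site as (fragment, callSite) rather than a mutable.Map,
     // per the suggestion above.
     private val ctx = new ThreadLocal[Option[(String, String)]]() {
       override def initialValue(): Option[(String, String)] = None
     }

     // Set from the PySpark side before the corresponding JVM call (assumed entry point).
     def set(fragment: String, callSite: String): Unit =
       ctx.set(Some((fragment, callSite)))

     // Return the stored context and clear it, so it is consumed exactly once.
     def consume(): Option[(String, String)] = {
       val result = ctx.get()
       ctx.remove()
       result
     }
   }
   ```

   `withOrigin` above would then build the `Origin` with `pysparkErrorContext = PySparkCurrentOrigin.consume()` instead of reading `CurrentOrigin.get.pysparkErrorContext`, so `DataFrameQueryContext` creation would no longer need to handle it.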