MaxGekk commented on code in PR #43334:
URL: https://github.com/apache/spark/pull/43334#discussion_r1372154497


##########
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##########
@@ -73,4 +76,43 @@ package object sql {
    * with rebasing.
    */
   private[sql] val SPARK_LEGACY_INT96_METADATA_KEY = "org.apache.spark.legacyINT96"
+
+  /**
+   * This helper function captures the Spark API and its call site in the user code
+   * from the current stack trace.
+   *
+   * As adding `withOrigin` explicitly to all Spark API definitions would be a huge
+   * change, `withOrigin` is used only at certain places through which all API
+   * implementations surely pass, and the current stack trace is filtered to the
+   * point where the first Spark API code is invoked from the user code.
+   *
+   * As there might be multiple nested `withOrigin` calls (e.g. any Spark API
+   * implementation can invoke other APIs), only the first `withOrigin` is
+   * captured, because that is the closest to the user code.
+   *
+   * @param framesToDrop the number of stack frames we can surely drop before
+   *                     searching for the user code
+   * @param f the function that can use the origin
+   * @return the result of `f`
+   */
+  private[sql] def withOrigin[T](framesToDrop: Int = 0)(f: => T): T = {
+    if (CurrentOrigin.get.stackTrace.isDefined) {
+      f
+    } else {
+      val st = Thread.currentThread().getStackTrace
+      var i = framesToDrop + 3
+      while (sparkCode(st(i))) i += 1

Review Comment:
   We set `framesToDrop = 1` in a few places:
   - `Column.fn`
   - `withExpr`
   - `repartitionByExpression`
   - `repartitionByRange`
   - `withAggregateFunction`
   - `createLambda`
   
   So there are two options: either
   - the function `sparkCode` doesn't work properly, and we forcibly skip 1 frame,
   - or this is a premature optimization.
   
   I will check that eventually, after all tests have passed.
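   To illustrate the behaviour under discussion, here is a minimal, self-contained sketch of the same idea, not Spark's actual implementation: only the outermost `withOrigin` captures a call site, and internal frames are skipped until the first user-code frame is found. The `Library` object, the `isLibraryCode` predicate (standing in for `sparkCode`), and the `+ 2` starting offset are illustrative assumptions.
   
   ```scala
   import scala.util.DynamicVariable
   
   object Library {
     // Stands in for Spark's CurrentOrigin: holds the captured user frame, if any.
     private val origin = new DynamicVariable[Option[StackTraceElement]](None)
   
     // Hypothetical stand-in for Spark's `sparkCode` predicate: any frame from
     // this object (compiled class name "Library$") counts as internal code.
     def isLibraryCode(frame: StackTraceElement): Boolean =
       frame.getClassName.startsWith("Library")
   
     def withOrigin[T](framesToDrop: Int = 0)(f: => T): T = {
       if (origin.value.isDefined) {
         // Nested call: the outer withOrigin already captured the call site.
         f
       } else {
         val st = Thread.currentThread().getStackTrace
         // st(0) is Thread.getStackTrace itself; the constant offset plus
         // framesToDrop skips frames known to be internal, then the loop
         // searches upward for the first non-library frame.
         var i = framesToDrop + 2
         while (i < st.length - 1 && isLibraryCode(st(i))) i += 1
         origin.withValue(Some(st(i)))(f)
       }
     }
   
     def currentOrigin: Option[StackTraceElement] = origin.value
   
     // Two "API" methods; apiOuter nests a second withOrigin via apiInner,
     // so the frame captured is apiOuter's caller, not apiInner's.
     def apiInner(): Option[StackTraceElement] = withOrigin() { currentOrigin }
     def apiOuter(): Option[StackTraceElement] = withOrigin() { apiInner() }
   }
   ```
   
   Calling `Library.apiOuter()` from user code returns a frame pointing at that call site; if `sparkCode` misses one internal frame (e.g. a synthetic closure frame), setting `framesToDrop = 1` would paper over it, which is one way the observed settings could be explained.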



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

