[ 
https://issues.apache.org/jira/browse/SPARK-38615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-38615:
-----------------------------------
    Description: 
Currently, there is not enough error context for runtime ANSI failures.

In the following example, the error message only says that there is a "divide 
by zero" error, without pointing out which part of the SQL statement caused it.
{code:sql}
> SELECT
  ss1.ca_county,
  ss1.d_year,
  ws2.web_sales / ws1.web_sales web_q1_q2_increase,
  ss2.store_sales / ss1.store_sales store_q1_q2_increase,
  ws3.web_sales / ws2.web_sales web_q2_q3_increase,
  ss3.store_sales / ss2.store_sales store_q2_q3_increase
FROM
  ss ss1, ss ss2, ss ss3, ws ws1, ws ws2, ws ws3
WHERE
  ss1.d_qoy = 1
    AND ss1.d_year = 2000
    AND ss1.ca_county = ss2.ca_county
    AND ss2.d_qoy = 2
    AND ss2.d_year = 2000
    AND ss2.ca_county = ss3.ca_county
    AND ss3.d_qoy = 3
    AND ss3.d_year = 2000
    AND ss1.ca_county = ws1.ca_county
    AND ws1.d_qoy = 1
    AND ws1.d_year = 2000
    AND ws1.ca_county = ws2.ca_county
    AND ws2.d_qoy = 2
    AND ws2.d_year = 2000
    AND ws1.ca_county = ws3.ca_county
    AND ws3.d_qoy = 3
    AND ws3.d_year = 2000
    AND CASE WHEN ws1.web_sales > 0
    THEN ws2.web_sales / ws1.web_sales
        ELSE NULL END
    > CASE WHEN ss1.store_sales > 0
    THEN ss2.store_sales / ss1.store_sales
      ELSE NULL END
    AND CASE WHEN ws2.web_sales > 0
    THEN ws3.web_sales / ws2.web_sales
        ELSE NULL END
    > CASE WHEN ss2.store_sales > 0
    THEN ss3.store_sales / ss2.store_sales
      ELSE NULL END
ORDER BY ss1.ca_county
 {code}
{code:java}
org.apache.spark.SparkArithmeticException: divide by zero
  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:140)
  at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:437)
  at org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:425)
  at org.apache.spark.sql.catalyst.expressions.Divide.eval(arithmetic.scala:534)
...{code}
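To see the problem in miniature (a hypothetical toy evaluator, not Spark code), note that an exception raised deep inside shared evaluation logic carries no pointer back to the originating SQL fragment:

```python
# Toy illustration (not Spark's implementation): several division
# expressions go through the same helper, so a bare ZeroDivisionError
# cannot say which expression in the "query" actually failed.
def eval_divide(numerator, denominator):
    # Raises ZeroDivisionError with no reference to the SQL text.
    return numerator / denominator

row = {"web_q1": 0, "web_q2": 5, "store_q1": 3, "store_q2": 6}

def eval_query(row):
    # Two divisions, analogous to web_q1_q2_increase and
    # store_q1_q2_increase in the query above.
    web_increase = eval_divide(row["web_q2"], row["web_q1"])
    store_increase = eval_divide(row["store_q2"], row["store_q1"])
    return web_increase, store_increase

try:
    eval_query(row)
except ZeroDivisionError as e:
    # The message names the kind of error but not the expression.
    print(e)
```

Just as in the Spark stack trace above, the caught error identifies only the failure kind; recovering the offending expression requires context that the evaluator never recorded.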
 

I suggest that we provide details in the error message, including:
 * the problematic expression from the original SQL query, e.g. 
"ss3.store_sales / ss2.store_sales store_q2_q3_increase"
 * the line number and starting character position of the problematic 
expression, to disambiguate queries like "select a + b from t1 union select a 
+ b from t2"

With these details, the error message becomes precise:
{code:java}
org.apache.spark.SparkArithmeticException: divide by zero. To return NULL 
instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false 
(except for ANSI interval type) to bypass this error.
== SQL(line 2, position 43) ==
ws2.web_sales / ws1.web_sales web_q1_q2, ss2.store_sales / ss1.store_sales 
store_q1_q2
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
{code}
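A minimal sketch of how such a context block could be rendered (hypothetical helper, not Spark's actual implementation, which would track each expression's origin during parsing): given the raw query text and the character span of the failing expression, derive the line number, the position within that line, and the caret underline.

```python
def format_sql_context(query: str, start: int, stop: int) -> str:
    """Render an error-context block for the expression at
    query[start:stop], in the style '== SQL(line N, position P) =='.

    Hypothetical sketch: line is 1-based, position is the 0-based
    offset of the expression within its line.
    """
    # Count newlines before the expression to get its line number.
    line = query.count("\n", 0, start) + 1
    # Start of the line containing the expression.
    line_start = query.rfind("\n", 0, start) + 1
    position = start - line_start
    # End of that line (or end of query for the last line).
    line_end = query.find("\n", start)
    if line_end == -1:
        line_end = len(query)
    text = query[line_start:line_end]
    # Underline only the part of the expression on this line.
    caret = " " * position + "^" * (min(stop, line_end) - start)
    return f"== SQL(line {line}, position {position}) ==\n{text}\n{caret}"

query = ("SELECT ss1.ca_county,\n"
         "  ws2.web_sales / ws1.web_sales web_q1_q2\n"
         "FROM ss ss1, ws ws1, ws ws2")
expr = "ws2.web_sales / ws1.web_sales"
start = query.index(expr)
print(format_sql_context(query, start, start + len(expr)))
```

This only sketches the rendering step; the harder part of the proposal is plumbing each expression's origin (offsets into the original query text) from the parser through to runtime evaluation, as described in the attached design document.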
[^SQL Error Attribution Framework.pdf]

> SQL Error Attribution Framework
> -------------------------------
>
>                 Key: SPARK-38615
>                 URL: https://issues.apache.org/jira/browse/SPARK-38615
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Priority: Major
>         Attachments: SQL Error Attribution Framework.pdf
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
