Gengliang Wang updated SPARK-38615:
-----------------------------------
Description:

Currently, there is not enough error context for runtime ANSI failures. In the following example, the error message only reports a "divide by zero" error, without pointing out where in the SQL statement the failure occurred.

{code:java}
> SELECT
    ss1.ca_county,
    ss1.d_year,
    ws2.web_sales / ws1.web_sales web_q1_q2_increase,
    ss2.store_sales / ss1.store_sales store_q1_q2_increase,
    ws3.web_sales / ws2.web_sales web_q2_q3_increase,
    ss3.store_sales / ss2.store_sales store_q2_q3_increase
  FROM
    ss ss1, ss ss2, ss ss3, ws ws1, ws ws2, ws ws3
  WHERE
    ss1.d_qoy = 1
    AND ss1.d_year = 2000
    AND ss1.ca_county = ss2.ca_county
    AND ss2.d_qoy = 2
    AND ss2.d_year = 2000
    AND ss2.ca_county = ss3.ca_county
    AND ss3.d_qoy = 3
    AND ss3.d_year = 2000
    AND ss1.ca_county = ws1.ca_county
    AND ws1.d_qoy = 1
    AND ws1.d_year = 2000
    AND ws1.ca_county = ws2.ca_county
    AND ws2.d_qoy = 2
    AND ws2.d_year = 2000
    AND ws1.ca_county = ws3.ca_county
    AND ws3.d_qoy = 3
    AND ws3.d_year = 2000
    AND CASE WHEN ws1.web_sales > 0
        THEN ws2.web_sales / ws1.web_sales
        ELSE NULL END
        > CASE WHEN ss1.store_sales > 0
          THEN ss2.store_sales / ss1.store_sales
          ELSE NULL END
    AND CASE WHEN ws2.web_sales > 0
        THEN ws3.web_sales / ws2.web_sales
        ELSE NULL END
        > CASE WHEN ss2.store_sales > 0
          THEN ss3.store_sales / ss2.store_sales
          ELSE NULL END
  ORDER BY ss1.ca_county
{code}

{code:java}
org.apache.spark.SparkArithmeticException: divide by zero
  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:140)
  at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:437)
  at org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:425)
  at org.apache.spark.sql.catalyst.expressions.Divide.eval(arithmetic.scala:534)
  ...
{code}

I suggest that we provide details in the error message, including:
* the problematic expression from the original SQL query, e.g. "ss3.store_sales / ss2.store_sales store_q2_q3_increase"
* the line number and starting char position of the problematic expression, to disambiguate queries such as "select a + b from t1 union select a + b from t2"

so that the error message becomes precise:

{code:java}
org.apache.spark.SparkArithmeticException: divide by zero. To return NULL instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error.
== SQL(line 2, position 43) ==
ws2.web_sales / ws1.web_sales web_q1_q2, ss2.store_sales / ss1.store_sales store_q1_q2
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
{code}

[^SQL Error Attribution Framework.pdf]

> SQL Error Attribution Framework
> -------------------------------
>
>                 Key: SPARK-38615
>                 URL: https://issues.apache.org/jira/browse/SPARK-38615
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Priority: Major
>         Attachments: SQL Error Attribution Framework.pdf

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
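The rendering step of the proposal can be sketched as follows: given the query text and the origin of the failing expression (line number and start/stop character positions within that line, as the parser would record them), print the offending line with a caret underline. This is a minimal illustration of the idea only; the function name and signature are hypothetical and are not Spark's actual API.

```python
def sql_error_context(query: str, line: int, start: int, stop: int) -> str:
    """Render an error-context snippet for the expression located at
    1-based `line`, 0-based character positions [start, stop) in `query`.
    Hypothetical sketch of the proposed attribution output, not Spark code."""
    offending = query.splitlines()[line - 1]
    header = f"== SQL(line {line}, position {start}) =="
    carets = " " * start + "^" * (stop - start)
    return "\n".join([header, offending, carets])


query = "SELECT a, b,\n  c / d ratio\nFROM t"
# Point at the expression "c / d ratio" on line 2, columns 2..13.
print(sql_error_context(query, line=2, start=2, stop=13))
# == SQL(line 2, position 2) ==
#   c / d ratio
#   ^^^^^^^^^^^
```

The key design point is that expression origins must survive analysis and optimization so that a runtime failure deep inside `Divide.eval` can still be mapped back to the user's original text.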