[ https://issues.apache.org/jira/browse/SPARK-38384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk updated SPARK-38384:
-----------------------------
    Epic Link: SPARK-38781

> Improve error messages of ParseException from ANTLR
> ---------------------------------------------------
>
> Key: SPARK-38384
> URL: https://issues.apache.org/jira/browse/SPARK-38384
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Xinyi Yu
> Priority: Major
>
> This task is intended to improve the error messages of ParseException
> directly coming from ANTLR.
> h2. Bad Error Messages
> Many error messages defined in ANTLR are not user-friendly. For example,
> {code:java}
> spark.sql("sel 1")
>
> ParseException:
> mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY',
> 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR',
> 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP',
> 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD',
> 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET',
> 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE',
> 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1,
> pos 0)
>
> == SQL ==
> sel 1
> ^^^
> {code}
> Following the [Spark Error Message
> Guidelines|https://spark.apache.org/error-message-guidelines.html], the words
> in this message are vague and hard to follow. It states ‘What’, but is
> unclear on the ‘Why’ and ‘How’.
> Or,
> {code:java}
> spark.sql("") // empty query
> ParseException:
> mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE',
> 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT',
> 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT',
> 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE',
> 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK',
> 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK',
> 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
> == SQL ==
> ^^^
> {code}
> Instead of simply telling users that the query is empty, it prints a long
> message that even includes the jargon '<EOF>'.
> h2. Where do these error messages come from?
> There has been much work on improving ParseException in general (see
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala]
> for example). But many of the error messages above are defined in ANTLR and
> pass through Spark unmodified.
> When ANTLR encounters such an error, it notifies the exception listener
> with a message like ‘mismatched input {} expecting {}’. The Spark exception
> listener _appends_ the line and position to the message, as well as the
> problematic SQL and several ‘^^^’ marking the error position. It then throws
> a ParseException with the extended message. Spark doesn’t modify the
> message text that ANTLR provides.
> This task focuses on those error messages from ANTLR.
> h2. Goals
> # Improve the error messages of ParseException that come from ANTLR, and
> modify all affected test cases accordingly.
> # Make sure the new error message framework is applied in this change.
> h2. Proposed Error Messages Change
> Each sub-task should include concrete before & after cases. See the
> description of each sub-task for more details.
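The listener behaviour described under "Where do these error messages come from?" can be sketched roughly as below. This is only an illustration in plain Python, not Spark's actual code: the function name `append_position` is hypothetical, and padding the marker line with `-` up to the error column is an assumption that the ticket's `pos 0` examples do not show. The point is that the ANTLR text is passed through untouched and only location details are added.

```python
def append_position(antlr_msg: str, sql: str, line: int, pos: int) -> str:
    """Build a ParseException-style message from an unmodified ANTLR message.

    `line` is 1-based and `pos` is 0-based, matching the "(line 1, pos 0)"
    form shown in the ticket. The name and the '-' padding are assumptions.
    """
    # The ANTLR text is kept verbatim; only the location is appended.
    parts = [f"{antlr_msg}(line {line}, pos {pos})", "", "== SQL =="]
    for number, text in enumerate(sql.split("\n"), start=1):
        parts.append(text)
        if number == line:
            # Marker line placed directly under the offending SQL line.
            parts.append("-" * pos + "^^^")
    return "\n".join(parts)


if __name__ == "__main__":
    print(append_position("mismatched input 'sel' expecting {...}", "sel 1", 1, 0))
```

Because the first line is whatever ANTLR produced, any wording improvement has to happen before or instead of this step, which is what the sub-tasks of this ticket are meant to address.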