[ https://issues.apache.org/jira/browse/SPARK-24260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24260.
----------------------------------
    Resolution: Incomplete

> Support for multi-statement SQL in SparkSession.sql API
> -------------------------------------------------------
>
>                 Key: SPARK-24260
>                 URL: https://issues.apache.org/jira/browse/SPARK-24260
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ravindra Nath Kakarla
>            Priority: Minor
>              Labels: bulk-closed
>
> The sparkSession.sql API executes only a single SQL statement per call; a multi-statement SQL string cannot be run in one call. For example,
> {code:java}
> SparkSession sparkSession = SparkSession.builder().appName("MultiStatementSQL")
>     .master("local").getOrCreate();
> sparkSession.sql("DROP TABLE IF EXISTS count_employees; CACHE TABLE employees; " +
>     "CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees; " +
>     "SELECT * FROM count_employees");
> {code}
> The code above fails with the error:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';' expecting <EOF>
> {code}
> The workaround is to call the .sql API once per statement, in order:
> {code:java}
> sparkSession.sql("DROP TABLE IF EXISTS count_employees");
> sparkSession.sql("CACHE TABLE employees");
> sparkSession.sql("CREATE TEMPORARY VIEW count_employees AS SELECT count(*) AS cnt FROM employees");
> sparkSession.sql("SELECT * FROM count_employees");
> {code}
> If these SQL statements come from a string or a file, users have to implement their own parser to execute them.
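> Even a minimally robust hand-rolled splitter has to track quote state so that a ";" inside a string literal is not treated as a statement boundary. As a sketch only (the SqlSplitter helper below is hypothetical and not part of any Spark API):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical sketch: split a SQL script on top-level semicolons,
> // ignoring semicolons inside single-quoted string literals.
> class SqlSplitter {
>     static List<String> split(String script) {
>         List<String> statements = new ArrayList<>();
>         StringBuilder current = new StringBuilder();
>         boolean inString = false;
>         for (int i = 0; i < script.length(); i++) {
>             char c = script.charAt(i);
>             if (c == '\'') {
>                 inString = !inString;  // naive: does not handle escaped quotes
>                 current.append(c);
>             } else if (c == ';' && !inString) {
>                 String stmt = current.toString().trim();
>                 if (!stmt.isEmpty()) statements.add(stmt);
>                 current.setLength(0);
>             } else {
>                 current.append(c);
>             }
>         }
>         String last = current.toString().trim();
>         if (!last.isEmpty()) statements.add(last);
>         return statements;
>     }
> }
> {code}
> Each returned statement could then be passed to sparkSession.sql in order. Even this sketch still misses SQL comments, escaped quotes, and double-quoted identifiers, which is part of why building multi-statement support into Spark's own parser is attractive.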
> For example:
> {code:scala}
> val sqlFromFile = """DROP TABLE IF EXISTS count_employees;
>   |CACHE TABLE employees;
>   |CREATE TEMPORARY VIEW count_employees AS SELECT count(*) as cnt FROM employees;
>   |SELECT * FROM count_employees""".stripMargin
>
> sqlFromFile.split(";")
>   .foreach(stmt => sparkSession.sql(stmt))
> {code}
> This naive parser fails on many edge cases (for example, a ";" inside a string literal). Even if users reuse the same grammar Spark uses and implement their own parsing, it can drift out of sync with the way Spark parses statements.
> Can support for multiple SQL statements be built into the SparkSession.sql API itself?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)