[ https://issues.apache.org/jira/browse/SPARK-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-16217:
-------------------------------------
    Target Version/s: 2.3.0  (was: 2.2.0)

> Support SELECT INTO statement
> -----------------------------
>
>                 Key: SPARK-16217
>                 URL: https://issues.apache.org/jira/browse/SPARK-16217
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: GuangFancui(ISCAS)
>
> The *SELECT INTO* statement selects data from one table and inserts it into a
> new table as follows.
> {code:sql}
> SELECT column_name(s)
> INTO newtable
> FROM table1;
> {code}
> This statement is commonly used in SQL but is not currently supported in
> SparkSQL.
> We investigated Catalyst and found that this statement can be implemented
> by extending the grammar and reusing the logical plan of *CREATE TABLE AS
> SELECT* as follows.
> # Improve grammar: Add _intoClause_ to _SELECT ... FROM_ in the
> _querySpecification_ rule in the SqlBase.g4 file.
> !https://raw.githubusercontent.com/wuxianxingkong/storage/master/selectinto_g4_v2.png!
> For example
> {code:sql}
> SELECT *
> INTO NEW_TABLE
> FROM OLD_TABLE
> {code}
> Then the grammar tree will be:
> !https://raw.githubusercontent.com/wuxianxingkong/storage/master/selectinto_tree_v2.png!
> Whether _intoClause_ should also be added to _TRANSFORM_ in
> _querySpecification_ is open to discussion.
> # Identify _SELECT INTO_ in _Parser_: Modify the _visitSingleInsertQuery_
> function. Extract the _IntoClauseContext_ with the _existIntoClause_ function.
> The _IntoClauseContext_ is then passed as an argument to the _withSelectInto_
> function. (_intoClause_ and _queryOrganization_ are not at the same level, so
> we need to extract the _IntoClauseContext_ when visiting _singleInsertQuery_.)
> # Conversion in _Parser_: Convert the current logical plan to _CTAS_ (strictly
> speaking, make it a child of CTAS) using the _withSelectInto_ function.
> *Hive support* should be enabled since _CreateHiveTableAsSelectCommand_ relies
> on it.
> The _withSelectInto_ function copies code from _visitCreateTable_ to do the
> conversion, so it requires further discussion and optimization.
> The implementation is based on the following _assumptions_:
> # _intoClause_ must appear together with _fromClause_.{code:sql}(intoClause?
> fromClause)?{code}This structure ensures that the modification won’t
> affect the existing _multiInsertQuery_.
> # A _SELECT INTO_ statement will be translated to the following tree structure:
> !https://raw.githubusercontent.com/wuxianxingkong/storage/master/hierarchy.png!
> As shown, if there is an _intoClause_, the actual subclass of _queryTerm_ is
> _queryTermDefault_, and the actual subclass of _queryPrimary_ is
> _queryPrimaryDefault_. We use the _existIntoClause_ function to match these
> subclasses. Only when all conditions are satisfied does this function return
> an _IntoClauseContext_; otherwise it returns null.
> We’ve implemented and tested the above approach. Please refer to PR:
> https://github.com/apache/spark/pull/14191
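
To make the intended equivalence concrete, here is a minimal sketch of how a simple SELECT INTO statement corresponds to CREATE TABLE AS SELECT. It is only a text-level illustration: the proposal above operates on the ANTLR parse tree and the Catalyst logical plan, not on SQL strings, and the helper name `select_into_to_ctas` is hypothetical. It assumes a single-level query with the INTO clause placed directly before FROM, matching the first assumption listed in the issue.

```python
import re

# Hypothetical helper for illustration only. The proposal in the issue works on
# the parse tree (intoClause in querySpecification), not on raw SQL text.
_SELECT_INTO = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+INTO\s+(?P<table>\w+)\s+(?P<rest>FROM\b.*)$",
    re.IGNORECASE | re.DOTALL,
)


def select_into_to_ctas(sql: str) -> str:
    """Rewrite a simple `SELECT ... INTO t FROM ...` as `CREATE TABLE t AS SELECT ...`.

    Statements that do not have an INTO clause followed by FROM are returned
    unchanged, mirroring the idea that the match either succeeds fully or not at all.
    """
    stripped = sql.strip().rstrip(";")
    match = _SELECT_INTO.match(stripped)
    if match is None:
        return sql
    return f"CREATE TABLE {match['table']} AS SELECT {match['cols']} {match['rest']}"


if __name__ == "__main__":
    print(select_into_to_ctas("SELECT *\nINTO NEW_TABLE\nFROM OLD_TABLE"))
```

A statement without an INTO clause, such as `SELECT a, b FROM t`, passes through untouched, which loosely corresponds to _existIntoClause_ returning null when the expected structure is not found.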