[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641516#comment-16641516 ]
Jackey Lee edited comment on SPARK-24630 at 10/8/18 8:36 AM:
-------------------------------------------------------------

[~kabhwan] The DDL in SQLStreaming is no different from that of Batch SQL. Adding the 'stream' keyword has two purposes:
# *Mark the entire SQL query as a stream query and generate the SQLStreaming plan tree.*
# *Mark the table type as UnResolvedStreamRelation*, so the analyzer can parse the table as a StreamingRelation or another Relation. This matters especially in stream-join-batch queries, such as Kafka join MySQL.

A small example to show the importance of 'stream': read a stream from a Kafka stream table, and join MySQL to count user messages.
# *With 'stream':*
## select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## The query is treated as a Streaming Query: kafka_sql_test is parsed as a StreamingRelation, and _mysql_test is parsed as a JDBCRelation, not a Streaming Relation._
# *Without 'stream':*
## select kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## The query is treated as a Batch Query: kafka_sql_test is parsed as a KafkaRelation, and mysql_test is parsed as a JDBCRelation.

*As for Flink, it uses StreamExecutionEnvironment and StreamTableEnvironment to achieve the above goals.* Thus, the 'stream' keyword is not needed in Flink.

> SPIP: Support SQLStreaming in Spark
> -----------------------------------
>
> Key: SPARK-24630
> URL: https://issues.apache.org/jira/browse/SPARK-24630
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.2.0, 2.2.1
> Reporter: Jackey Lee
> Priority: Minor
> Labels: SQLStreaming
> Attachments: SQLStreaming SPIP.pdf
>
> At present, Kafka SQL, Flink SQL (which is actually based on Calcite), SQLStream, and Storm SQL all provide a stream-type SQL interface, with which users with little knowledge of streaming can easily develop a stream processing model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points:
> 1. The parser should be able to parse streaming-type SQL.
> 2. The analyzer should be able to map metadata information to the corresponding Relation.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
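The keyword-driven resolution described in the comment can be sketched as a toy dispatcher. This is not Spark's actual analyzer code; the function name and string return values are illustrative stand-ins for the relation classes the comment mentions:

```python
# Toy sketch of the proposed resolution rule (illustrative only, not
# Spark's analyzer): the 'stream' keyword marks the query as streaming,
# which changes how a Kafka-backed table resolves, while a JDBC-backed
# table resolves to a batch JDBCRelation either way.

def resolve_relation(table: str, source: str, is_stream_query: bool) -> str:
    """Map a table to a relation type.

    source: the table's underlying data source ('kafka' or 'jdbc').
    is_stream_query: True when the query carries the 'stream' keyword.
    """
    if source == "kafka":
        # With 'stream', the Kafka table becomes the streaming side;
        # without it, the same table is read as a batch KafkaRelation.
        return "StreamingRelation" if is_stream_query else "KafkaRelation"
    if source == "jdbc":
        # The MySQL table is always the batch side, never streaming.
        return "JDBCRelation"
    return "UnresolvedRelation"

# With 'stream': kafka_sql_test streams, mysql_test stays batch.
assert resolve_relation("kafka_sql_test", "kafka", True) == "StreamingRelation"
assert resolve_relation("mysql_test", "jdbc", True) == "JDBCRelation"

# Without 'stream': both sides are batch relations.
assert resolve_relation("kafka_sql_test", "kafka", False) == "KafkaRelation"
```

The sketch shows why the keyword is needed in the SQL-only interface: Flink avoids it because the choice of StreamExecutionEnvironment vs. a batch environment carries the same bit of information out of band.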