[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641516#comment-16641516 ]
Jackey Lee edited comment on SPARK-24630 at 10/8/18 8:36 AM:
-------------------------------------------------------------

[~kabhwan] The DDL in SQLStreaming is no different from that of Batch SQL. Adding the 'stream' keyword has two purposes:
# *Mark the entire SQL query as a stream query and generate the SQLStreaming plan tree.*
# *Mark the table type as UnResolvedStreamRelation*, so the analyzer can parse the table as a StreamingRelation or another Relation. This matters especially in stream-join-batch queries, such as Kafka join MySQL.

A small example to show the importance of 'stream': read a stream from a Kafka stream table, and join MySQL to count user messages.
# *With 'stream':*
## select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## The query is treated as a Streaming Query: kafka_sql_test is parsed as a StreamingRelation, and _mysql_test is parsed as a JDBCRelation, not a Streaming Relation._
# *Without 'stream':*
## select kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## The query is treated as a Batch Query: kafka_sql_test is parsed as a KafkaRelation, and mysql_test is parsed as a JDBCRelation.

*As for Flink, it uses StreamExecutionEnvironment and StreamTableEnvironment to achieve the above goals.* Thus, the 'stream' keyword is not needed in Flink.

> SPIP: Support SQLStreaming in Spark
> -----------------------------------
>
> Key: SPARK-24630
> URL: https://issues.apache.org/jira/browse/SPARK-24630
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.2.0, 2.2.1
> Reporter: Jackey Lee
> Priority: Minor
> Labels: SQLStreaming
> Attachments: SQLStreaming SPIP.pdf
>
> At present, Kafka SQL, Flink SQL (which is actually based on Calcite), SQLStream, and Storm SQL all provide a stream-type SQL interface, with which users with little knowledge of streaming can easily develop a stream processing model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points:
> 1. The parser should be able to parse streaming-type SQL.
> 2. The analyzer should be able to map metadata information to the corresponding Relation.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
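The keyword-driven resolution described in the comment can be sketched as a toy dispatcher. This is not Spark's actual analyzer code; the function name and string return values are illustrative stand-ins for the relation classes the comment mentions:

```python
# Toy sketch of the proposed resolution rule (illustrative only, not
# Spark's analyzer): the 'stream' keyword marks the query as streaming,
# which changes how a Kafka-backed table resolves, while a JDBC-backed
# table resolves to a batch JDBCRelation either way.

def resolve_relation(table: str, source: str, is_stream_query: bool) -> str:
    """Map a table to a relation type.

    source: the table's underlying data source ('kafka' or 'jdbc').
    is_stream_query: True when the query carries the 'stream' keyword.
    """
    if source == "kafka":
        # With 'stream', the Kafka table becomes the streaming side;
        # without it, the same table is read as a batch KafkaRelation.
        return "StreamingRelation" if is_stream_query else "KafkaRelation"
    if source == "jdbc":
        # The MySQL table is always the batch side, never streaming.
        return "JDBCRelation"
    return "UnresolvedRelation"

# With 'stream': kafka_sql_test streams, mysql_test stays batch.
assert resolve_relation("kafka_sql_test", "kafka", True) == "StreamingRelation"
assert resolve_relation("mysql_test", "jdbc", True) == "JDBCRelation"

# Without 'stream': both sides are batch relations.
assert resolve_relation("kafka_sql_test", "kafka", False) == "KafkaRelation"
```

The sketch shows why the keyword is needed in the SQL-only interface: Flink avoids it because the choice of StreamExecutionEnvironment vs. a batch environment carries the same bit of information out of band.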