[ 
https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994172#comment-15994172
 ] 

sunjincheng edited comment on FLINK-6428 at 5/3/17 2:22 AM:
------------------------------------------------------------

Hi [~rtudoran], thanks for paying attention to this JIRA.

In standard databases, the `DISTINCT` keyword can be used in two situations:
*  in the `SELECT Clause`, e.g.: `SELECT DISTINCT name FROM table`
*  in aggregate functions (the `AGG Clause`), e.g.: `COUNT([ALL|DISTINCT] expression)`, `SUM([ALL|DISTINCT] expression)`, etc.
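To make the two scopes concrete, here is a minimal sketch in plain Java (not Flink code) that applies both forms to the sample `(name, age)` data from the issue description. The class `DistinctScopes`, its nested `Row` type, and both method names are hypothetical and only mimic the SQL semantics: `SELECT DISTINCT` removes duplicate result rows, while `COUNT(DISTINCT ...)` removes duplicates inside the aggregate and still returns a single value.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: plain Java collections, not Flink code.
// DistinctScopes, Row and the method names are hypothetical.
final class DistinctScopes {

    // One input row of the hypothetical table MyTable(name, age).
    static final class Row {
        final String name;
        final int age;

        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // SELECT DISTINCT age FROM MyTable: duplicates are removed from the result rows.
    // LinkedHashSet keeps first-seen order so the output is deterministic.
    static Set<Integer> selectDistinctAge(List<Row> table) {
        Set<Integer> result = new LinkedHashSet<>();
        for (Row r : table) {
            result.add(r.age);
        }
        return result;
    }

    // SELECT COUNT(DISTINCT age) FROM MyTable: duplicates are removed inside the
    // aggregate, and the query still returns one row holding one number.
    static long countDistinctAge(List<Row> table) {
        return selectDistinctAge(table).size();
    }

    public static void main(String[] args) {
        List<Row> table = List.of(new Row("kevin", 28), new Row("sunny", 6), new Row("jack", 6));
        System.out.println(selectDistinctAge(table)); // [28, 6]
        System.out.println(countDistinctAge(table));  // 2
    }
}
```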

First, [FLINK-6249 | https://issues.apache.org/jira/browse/FLINK-6249] covers the `AGG Clause`, while this JIRA covers the `SELECT Clause`.

Next, regarding the ever-growing number of elements: the limitation is mainly the back-end storage (Flink state). In theory, external storage can be made arbitrarily large (the user can control and plan its capacity), so from this point of view `DISTINCT` over an infinite stream can be supported. In addition, with external storage such as RocksDB, the user can set a TTL according to the actual volume of business data to ensure that the storage keeps working properly.
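The state-growth argument can be illustrated with a rough sketch, assuming a hypothetical `StreamingDistinct` operator in which a plain `HashMap` stands in for Flink state (this is not Flink's API or implementation). Each value is emitted the first time it arrives and is then remembered. Without a TTL the map grows with the number of distinct values; with a TTL, an entry that has not been seen for `ttlMillis` is dropped, which bounds the state.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a streaming SELECT DISTINCT operator. The HashMap stands
// in for Flink state; this is not Flink's API or implementation.
final class StreamingDistinct<T> {

    // Value -> time (ms) at which it was last seen. This is the growing "state".
    private final Map<T, Long> seen = new HashMap<>();
    // Time-to-live in milliseconds; 0 or less means entries never expire.
    private final long ttlMillis;

    StreamingDistinct(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Processes one element at time `now`. Returns the element if it must be
    // emitted, i.e. if it is not currently remembered in state.
    Optional<T> process(T value, long now) {
        expire(now);
        Long previous = seen.put(value, now); // insert, or refresh the timestamp
        return previous == null ? Optional.of(value) : Optional.empty();
    }

    // Drops entries that have not been seen for at least ttlMillis.
    private void expire(long now) {
        if (ttlMillis <= 0) {
            return;
        }
        Iterator<Map.Entry<T, Long>> it = seen.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() >= ttlMillis) {
                it.remove();
            }
        }
    }

    // Number of distinct values currently held in state.
    int stateSize() {
        return seen.size();
    }
}
```

For example, with `ttlMillis = 10`, the value `6` is emitted at time 0, suppressed at time 5 (which refreshes its timestamp), and emitted again at time 20 because its entry has expired. That is the trade-off of a TTL: state stays bounded, but a value that reappears after expiration is emitted again as a duplicate. The full scan in `expire` is only for readability; a production implementation would likely expire entries lazily or with timers instead of scanning on every element.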

So, IMO, we can support the `DISTINCT` feature in the `SELECT Clause`, and remind users to pay attention to controlling the size of the external storage. What do you think?

Thanks,
SunJincheng



> Add support DISTINCT in dataStream SQL
> --------------------------------------
>
>                 Key: FLINK-6428
>                 URL: https://issues.apache.org/jira/browse/FLINK-6428
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> Add support for DISTINCT in DataStream SQL, as follows:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> [~fhueske] do we need this feature?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
