[jira] [Commented] (FLINK-17173) Supports query hint to config "IdleStateRetentionTime" per query in SQL

2020-04-22 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089412#comment-17089412
 ] 

Danny Chen commented on FLINK-17173:


Hi [~libenchao], [~qzhzm173227], per-operator scope is definitely within the 
capability of query hints.

> Supports query hint to config "IdleStateRetentionTime" per query in SQL
> ---
>
> Key: FLINK-17173
> URL: https://issues.apache.org/jira/browse/FLINK-17173
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.11.0
>Reporter: Danny Chen
>Priority: Major
>
> The motivation for this (copied from the user mailing list, [~qzhzm173227]):
> In some of the use cases our users have, there are a couple of complex join 
> queries where the key domain keeps evolving - we definitely want some sort of 
> state retention for those queries; but there are others where the key domain 
> doesn't evolve over time, and there isn't really a guarantee on the 
> maximum gap between 2 records of the same key appearing in the stream, so we 
> don't want to accidentally invalidate the state for those keys in these 
> streams.
> Because queries with different requirements can both exist in the 
> pipeline, I think we have to configure `IDLE_STATE_RETENTION_TIME` per operator.
> Just wondering, has a similar requirement not come up much for SQL users 
> before? (being able to set table / query configuration inside SQL queries)
> We are also a little bit concerned because, now that 
> 'toRetractStream(Table, Class, QueryConfig)' is deprecated, relying on the 
> fact that TableConfig is read during toDataStream feels like relying on an 
> implementation detail that just happens to work, and there is no guarantee 
> that it will keep working in future versions...
> Demo syntax:
> {code:sql}
> CREATE TABLE `/output` AS
> SELECT /*+ IDLE_STATE_RETENTION_TIME(minTime ='5m', maxTime ='11m') */ *
> FROM `/input1` a
> INNER JOIN `/input2` b
> ON a.column_name = b.column_name;
> {code}
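
For illustration, here is a minimal sketch of how the options in the demo hint above could be read into durations. The class `IdleStateHintSketch`, its regular expressions, and the `s`/`m`/`h` unit suffixes are assumptions made for this example; none of it is Flink's parser, which handles hints through its SQL grammar rather than regular expressions.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Flink. */
final class IdleStateHintSketch {
    // Matches the hint name and its parenthesized option list inside a /*+ ... */ comment.
    private static final Pattern HINT = Pattern.compile(
            "/\\*\\+\\s*IDLE_STATE_RETENTION_TIME\\s*\\(([^)]*)\\)\\s*\\*/");
    // Matches one key = 'value' pair, e.g. minTime ='5m'.
    private static final Pattern OPTION = Pattern.compile("(\\w+)\\s*=\\s*'([^']*)'");

    /** Returns the hint options as durations, or an empty map if the hint is absent. */
    static Map<String, Duration> parse(String sql) {
        Map<String, Duration> options = new LinkedHashMap<>();
        Matcher hint = HINT.matcher(sql);
        if (!hint.find()) {
            return options;
        }
        Matcher option = OPTION.matcher(hint.group(1));
        while (option.find()) {
            options.put(option.group(1), toDuration(option.group(2)));
        }
        return options;
    }

    /** Converts values such as '30s', '5m', or '2h' (the unit suffixes are assumed). */
    static Duration toDuration(String text) {
        if (text.length() < 2) {
            throw new IllegalArgumentException("Expected <number><unit>, got: " + text);
        }
        long amount = Long.parseLong(text.substring(0, text.length() - 1));
        switch (text.charAt(text.length() - 1)) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default: throw new IllegalArgumentException("Unsupported unit in: " + text);
        }
    }
}
```

Calling `IdleStateHintSketch.parse(...)` on the demo statement would yield a map with `minTime` and `maxTime` entries; a statement without the hint yields an empty map, which a caller could treat as "fall back to the table-level configuration".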



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17173) Supports query hint to config "IdleStateRetentionTime" per query in SQL

2020-04-18 Thread Jiahui Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086655#comment-17086655
 ] 

Jiahui Jiang commented on FLINK-17173:
--

[~danny0405] I'd love to contribute the feature if that's okay! :D 
 But there are some implementation details that I think are worth discussing before 
I submit the PR.
 1. In order for the parser to extract the hints and make them available in 
TableImpl, we basically need to bring back QueryConfig and make it part of 
Parser#parse's response. I know QueryConfig was [JUST 
removed|https://github.com/apache/flink/pull/11481/files] from the external 
API, but is it acceptable if I add it back as an internal API? Or is there a 
better approach?
 2. To have this 'QueryConfig' available in TableEnvironmentImpl, I can
 (1) either change the signature of Parser.parse() to return a ParserResponse, 
which contains the current List and an additional config
 (2) or add queryConfig as an additional field of PlannerQueryOperation. 
 I think option (2) is cleaner and will be a less involved change. I just want to 
make sure this won't cause any unexpected impact!
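
To make the two options concrete, here is a rough sketch of the shapes involved. All types below are simplified stand-ins that only borrow the names used above; they are not Flink's actual classes, the fields of `QueryConfig` are invented for the example, and the `Operation` element type of the list is an assumption.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

// Stand-ins named after the types discussed above; none of this is Flink's actual code.
interface Operation {}

/** Hypothetical holder for the retention times extracted from a hint. */
final class QueryConfig {
    final Duration minRetention;
    final Duration maxRetention;

    QueryConfig(Duration minRetention, Duration maxRetention) {
        this.minRetention = minRetention;
        this.maxRetention = maxRetention;
    }
}

/** Option (1): parse() would return this wrapper instead of a bare list. */
final class ParserResponse {
    private final List<Operation> operations;
    private final QueryConfig queryConfig; // null when the statement has no hint

    ParserResponse(List<Operation> operations, QueryConfig queryConfig) {
        this.operations = operations;
        this.queryConfig = queryConfig;
    }

    List<Operation> operations() {
        return operations;
    }

    Optional<QueryConfig> queryConfig() {
        return Optional.ofNullable(queryConfig);
    }
}

/** Option (2): the query operation carries the config; parse() keeps its signature. */
final class PlannerQueryOperation implements Operation {
    private final String plan; // placeholder for the planned relational tree
    private final QueryConfig queryConfig; // null when the query has no hint

    PlannerQueryOperation(String plan, QueryConfig queryConfig) {
        this.plan = plan;
        this.queryConfig = queryConfig;
    }

    String plan() {
        return plan;
    }

    Optional<QueryConfig> queryConfig() {
        return Optional.ofNullable(queryConfig);
    }
}
```

In this sketch, option (1) changes what every caller of the parser receives, while option (2) only touches the code that creates and consumes the query operation, which is the sense in which it looks like the less involved change.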

Thank you :)



[jira] [Commented] (FLINK-17173) Supports query hint to config "IdleStateRetentionTime" per query in SQL

2020-04-16 Thread Jiahui Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084921#comment-17084921
 ] 

Jiahui Jiang commented on FLINK-17173:
--

Right now the idleStateRetentionTime config is already set at the query level in 
the Table API. It seems hard to support finer granularity and set it per operator 
there (since this is only a Table API concept).
If users only want to set the retention time for some of the operators, 
splitting the query into several subqueries doesn't sound bad for usability 
:D 



[jira] [Commented] (FLINK-17173) Supports query hint to config "IdleStateRetentionTime" per query in SQL

2020-04-15 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084509#comment-17084509
 ] 

Benchao Li commented on FLINK-17173:


Hi [~danny0405], it's a very useful feature, +1 for this.

I have some concerns about the granularity at which this hint should work. If 
the hint applies to the whole query, what happens when the query contains 
multiple operators, as in "a Join b Join c"?
