[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL
[ https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442 ] Fan weiwen edited comment on FLINK-6966 at 9/30/18 5:04 PM: this is my test sql {code:java} //代码占位符 i use redis sink but can test in kafka or other sink Asql select reqIp as factorContenta, count(*) as eCount, 60 * 60 as expire from kafkasource where uri in not null group by hop( rowtime, interval '2' second, interval '60' minute ), reqIp having count(*) >= 1 Bsql select uid as factorContentb, count(*) as eCount, 60 * 60 as expire from kafkasource where uri is not null group by hop( rowtime, interval '2' second, interval '60' minute ), uid having count(*) >= 1 {code} this is my test data {code:java} //代码占位符 { "code" : "200", "reqIp" : "656.19.173.34", "t" : 1537950912546, "uid" : 6630519, "uri" : "/test" } {code} first the job only Asql , start A sql redis has a key 656.19.173.34 then stop Asql and savepoint del redis key 656.19.173.34 now start Bsql from savepoint you can find it key 656.19.173.34 is create redis has Asql create key and Bsql create key this is wrong, Bsql fetch from Asql state was (Author: fanweiwen): this is my test sql {code:java} //代码占位符 i use redis sink but can test in kafka or other sink Asql select reqIp as factorContenta, count(*) as eCount, 60 * 60 as expire from kafkasource where uri in not null group by hop( rowtime, interval '2' second, interval '60' minute ), reqIp having count(*) >= 1 Bsql select uid as factorContentb, count(*) as eCount, 60 * 60 as expire from kafkasource where uri is not null group by hop( rowtime, interval '2' second, interval '60' minute ), uid having count(*) >= 1 {code} this is my test data {code:java} //代码占位符 { "code" : "200", "reqIp" : "656.19.173.34", "t" : 1537950912546, "uid" : 6630519, "uri" : "/test" } {code} first the job only Asql , start A sql redis has a key 656.19.173.34 then stop Asql and savepoint del redis key 656.19.173.34 now start Bsql from savepoint you can find it key 656.19.173.34 is create redis has Asql create key and Bsql create key this is wrong > Add maxParallelism and UIDs to all operators generated by the Table API / SQL > - > > Key: FLINK-6966 > URL: https://issues.apache.org/jira/browse/FLINK-6966 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Affects Versions: 1.4.0 >Reporter: Fabian Hueske >Priority: Major > > At the moment, the Table API does not assign UIDs and the max parallelism to > operators (except for operators with parallelism 1). > We should do that to avoid problems when rescaling or restarting jobs from > savepoints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL
[ https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442 ] Fan weiwen edited comment on FLINK-6966 at 9/30/18 5:02 PM: this is my test sql {code:java} //代码占位符 i use redis sink but can test in kafka or other sink Asql select reqIp as factorContenta, count(*) as eCount, 60 * 60 as expire from kafkasource where uri in not null group by hop( rowtime, interval '2' second, interval '60' minute ), reqIp having count(*) >= 1 Bsql select uid as factorContentb, count(*) as eCount, 60 * 60 as expire from kafkasource where uri is not null group by hop( rowtime, interval '2' second, interval '60' minute ), uid having count(*) >= 1 {code} this is my test data {code:java} //代码占位符 { "code" : "200", "reqIp" : "656.19.173.34", "t" : 1537950912546, "uid" : 6630519, "uri" : "/test" } {code} first the job only Asql , start A sql redis has a key 656.19.173.34 then stop Asql and savepoint del redis key 656.19.173.34 now start Bsql from savepoint you can find it key 656.19.173.34 is create redis has Asql create key and Bsql create key this is wrong was (Author: fanweiwen): this is my test sql {code:java} //代码占位符 i use redis sink but can test in kafka or other sink Asql select reqIp as factorContenta, count(*) as eCount, 60 * 60 as expire from xp_sso_biz_source where uri in not null group by hop( rowtime, interval '2' second, interval '60' minute ), reqIp having count(*) >= 1 Bsql select uid as factorContentb, count(*) as eCount, 60 * 60 as expire from xp_sso_biz_source where uri is not null group by hop( rowtime, interval '2' second, interval '60' minute ), uid having count(*) >= 1 {code} this is my test data {code:java} //代码占位符 { "code" : "200", "reqIp" : "656.19.173.34", "t" : 1537950912546, "uid" : 6630519, "uri" : "/test" } {code} first the job only Asql , start A sql redis has a key 656.19.173.34 then stop Asql and savepoint del redis key 656.19.173.34 now start Bsql from savepoint you can find it key 656.19.173.34 is create redis has Asql create key and Bsql create key this is wrong > Add maxParallelism and UIDs to all operators generated by the Table API / SQL > - > > Key: FLINK-6966 > URL: https://issues.apache.org/jira/browse/FLINK-6966 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Affects Versions: 1.4.0 >Reporter: Fabian Hueske >Priority: Major > > At the moment, the Table API does not assign UIDs and the max parallelism to > operators (except for operators with parallelism 1). > We should do that to avoid problems when rescaling or restarting jobs from > savepoints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL
[ https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059493#comment-16059493 ] sunjincheng edited comment on FLINK-6966 at 6/22/17 2:53 PM: - Hi [~fhueske] That's true. I got it. In our side we already meet that problem. we can not increase the parallelism of an operator if we do not set {{maxParallelism}}. * About {{maxParallelism}}, it is not easy to set every operator's {{maxParallelism}} by SQL/TableAPI. I think we can set global {{maxParallelism}} by {{StreamExecutionEnvironment}}. Of course we can define a default value for user or give a hint message if the user is not set. * About UID, for the first version I think we can just deal with the operators which are generated by {{DataStreamRel#translateToPlan}}. Please let me know what you think? Hi [~jark] I think {{QueryConfig}} is difficult to set set every operator's {{maxParallelism}}, If we want a global value, the {{StreamExecutionEnvironment}} is enough. What do you think? Feel free to correct me If there are any incorrect describe. [~fhueske] [~jark] was (Author: sunjincheng121): Hi [~fhueske] That's true. I got it. In our side we already meet that problem. we can not increase the parallelism of an operator if we do not set {{maxParallelism}}. * About {{maxParallelism}}, it is not easy to set every operator's {{maxParallelism}} by SQL/TableAPI. I think we can set global {{maxParallelism}} by {{StreamExecutionEnvironment}}. Of course we can define a default value for user or give a hint message if the user is not set. * About UID, for the first version I think we can just deal with the operators which are generated by {{DataStreamRel#translateToPlan}}. Please let me know what you think? Hi [~jark] I think {{QueryConfig }} is difficult to set set every operator's {{maxParallelism}}, If we want a global value, the {{StreamExecutionEnvironment}} is enough. What do you think? Feel free to correct me If there are any incorrect describe. [~fhueske] [~jark] > Add maxParallelism and UIDs to all operators generated by the Table API / SQL > - > > Key: FLINK-6966 > URL: https://issues.apache.org/jira/browse/FLINK-6966 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.4.0 >Reporter: Fabian Hueske > > At the moment, the Table API does not assign UIDs and the max parallelism to > operators (except for operators with parallelism 1). > We should do that to avoid problems when rescaling or restarting jobs from > savepoints. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API
[ https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057656#comment-16057656 ] sunjincheng edited comment on FLINK-6966 at 6/21/17 3:19 PM: - Hi, [~fhueske] do you meant that we need call {{uid/setUidHash}} when we translate to DataStream? And we provided unique UID for per transformation and job.If so, we need to consider all the Operators which are in the {{table.scala}}. right? was (Author: sunjincheng121): Hi, [~fhueske] do you meant that we need call {{uid/setUidHash}} when we translate to DataStream? And we provided unique UID for per transformation and job. right? > Add maxParallelism and UIDs to all operators generated by the Table API > --- > > Key: FLINK-6966 > URL: https://issues.apache.org/jira/browse/FLINK-6966 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Affects Versions: 1.4.0 >Reporter: Fabian Hueske > > At the moment, the Table API does not assign UIDs and the max parallelism to > operators (except for operators with parallelism 1). > We should do that to avoid problems when rescaling or restarting jobs from > savepoints. -- This message was sent by Atlassian JIRA (v6.4.14#64029)