Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
Thanks for the comments, Zhu! Yes, this is a known limitation of fine-grained resource management. We also filed this issue as FLINK-20865 when we proposed FLIP-156. As a first step, I agree that we can mark batch jobs with PIPELINED edges as an invalid case for this feature. However, just throwing an exception in that case might confuse users who do not understand the concept of a pipelined region. Maybe we can force all the edges in this scenario to BLOCKING in the compiling stage and document it well. That way, common users will not be interrupted, while expert users can understand the cost of that usage and make their own decision. WDYT? Best, Yangze Guo On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: > > Thanks for proposing this @Yangze Guo and sorry for joining the discussion so > late. > The proposal generally looks good to me. But I find one problem that batch > job with PIPELINED edges might hang if enabling fine-grained resources. see > "Resource Deadlocks could still happen in certain Cases" section in > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > However, this problem may happen only in batch cases with PIPELINED edges, > because > 1. streaming jobs would always require all resource requirements to be > fulfilled at the same time. > 2. batch jobs without PIPELINED edges consist of multiple single vertex > regions and thus each slot can be individually used and returned > So maybe in the first step, let's mark batch jobs with PIPELINED edges as an > invalid case for fine-grained resources and throw exception for it in early > compiling stage? > > Thanks, > Zhu > > Yangze Guo wrote on Tue, Jun 15, 2021 at 4:57 PM: >> >> Thanks for the supplement, Arvid and Yun. I've annotated these two >> points in the FLIP. >> The vote is now started in [1]. 
>> >> [1] >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51381.html >> >> Best, >> Yangze Guo >> >> On Fri, Jun 11, 2021 at 2:50 PM Yun Gao wrote: >> > >> > Hi, >> > >> > Very thanks @Yangze for bringing up this discuss. Overall +1 for >> > exposing the fine-grained resource requirements in the DataStream API. >> > >> > One similar issue as Arvid has pointed out is that users may also creating >> > different SlotSharingGroup objects, with different names but with different >> > resources. We might need to do some check internally. But We could also >> > leave that during the development of the actual PR. >> > >> > Best, >> > Yun >> > >> > >> > >> > --Original Mail -- >> > Sender:Arvid Heise >> > Send Date:Thu Jun 10 15:33:37 2021 >> > Recipients:dev >> > Subject:Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource >> > Requirements >> > Hi Yangze, >> > >> > >> > >> > Thanks for incorporating the ideas and sorry for missing the builder part. >> > >> > My main idea is that SlotSharingGroup is immutable, such that the user >> > >> > doesn't do: >> > >> > >> > >> > ssg = new SlotSharingGroup(); >> > >> > ssg.setCpus(2); >> > >> > operator1.slotSharingGroup(ssg); >> > >> > ssg.setCpus(4); >> > >> > operator2.slotSharingGroup(ssg); >> > >> > >> > >> > and wonders why both operators have the same CPU spec. But the details can >> > >> > be fleshed out in the actual PR. >> > >> > >> > >> > On Thu, Jun 10, 2021 at 5:13 AM Yangze Guo wrote: >> > >> > >> > >> > > Thanks all for the discussion. 
I've updated the FLIP accordingly, the >> > >> > > key changes are: >> > >> > > - Introduce SlotSharingGroup instead of ResourceSpec which contains >> > >> > > the resource spec of slot sharing group >> > >> > > - Introduce two interfaces for specifying the SlotSharingGroup: >> > >> > > #slotSharingGroup(SlotSharingGroup) and >> > >> > > StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup). >> > >> > > >> > >> > > If there is no more feedback, I'd start a vote next week. >> > >> > > >> > >> > > Best, >> > >> > > Yangze Guo >> > >> > > >> > >> > > On Wed, Jun 9, 2021 at 10:46 AM Yangze Guo wrote: >> > >> > > > >> > >> > > > Thanks for the valuable suggestion, Arvid. >> > >> > > > >> > >> > > > 1) Yes, we can add a new SlotSharingGroup which includes the name and >> > >> > > > its resource. After that, we have two interfaces for configuring the >> > >> > > > slot sharing group of an operator: >> > >> > > > - #slotSharingGroup(String name) // the resource of it can be >> > >> > > > configured through StreamExecutionEnvironment#registerSlotSharingGroup >> > >> > > > - #slotSharingGroup(SlotSharingGroup ssg) // Directly configure the >> > >> > > resource >> > >> > > > And one interface to configure the resource of a SSG: >> > >> > > > - StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup) >> > >> > > > We can also define the priority of the above two approaches, e.g. the >> > >> > > > resource registering in the StreamExecutionEnvironment will always be >> > >> > > > resp
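The mutable-setter pitfall Arvid describes in the quoted thread above (calling setCpus after the group has already been assigned to an operator) is avoided by making SlotSharingGroup an immutable value object constructed through a builder. Below is a self-contained sketch of that idea; all class and method names here are illustrative stand-ins, not the final FLIP-169 API.

```java
// Illustrative sketch: an immutable SlotSharingGroup built via a builder.
// Not the actual FLIP-169 API — names and fields are assumptions.
final class SlotSharingGroup {
    private final String name;
    private final double cpuCores;

    private SlotSharingGroup(String name, double cpuCores) {
        this.name = name;
        this.cpuCores = cpuCores;
    }

    static Builder newBuilder(String name) {
        return new Builder(name);
    }

    String getName() { return name; }
    double getCpuCores() { return cpuCores; }

    static final class Builder {
        private final String name;
        private double cpuCores;

        Builder(String name) { this.name = name; }

        Builder setCpuCores(double cpuCores) {
            this.cpuCores = cpuCores;
            return this;
        }

        // build() snapshots the current settings into a new immutable group,
        // so mutating the builder afterwards cannot change groups that were
        // already assigned to operators.
        SlotSharingGroup build() {
            return new SlotSharingGroup(name, cpuCores);
        }
    }
}

public class SlotSharingGroupDemo {
    public static void main(String[] args) {
        SlotSharingGroup.Builder builder = SlotSharingGroup.newBuilder("ssg");
        SlotSharingGroup forOperator1 = builder.setCpuCores(2).build();
        SlotSharingGroup forOperator2 = builder.setCpuCores(4).build();
        // Unlike the mutable version quoted above, operator1's group keeps its 2 cores.
        System.out.println(forOperator1.getCpuCores()); // prints 2.0
        System.out.println(forOperator2.getCpuCores()); // prints 4.0
    }
}
```

With this shape, the surprising scenario from Arvid's email cannot occur: each operator holds its own immutable snapshot of the resource spec.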
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
Thanks for the quick response Yangze. The proposal sounds good to me. Thanks, Zhu Yangze Guo 于2021年6月21日周一 下午3:01写道: > Thanks for the comments, Zhu! > > Yes, it is a known limitation for fine-grained resource management. We > also have filed this issue in FLINK-20865 when we proposed FLIP-156. > > As a first step, I agree that we can mark batch jobs with PIPELINED > edges as an invalid case for this feature. However, just throwing an > exception, in that case, might confuse users who do not understand the > concept of pipeline region. Maybe we can force all the edges in this > scenario to BLOCKING in compiling stage and well document it. So that, > common users will not be interrupted while the expert users can > understand the cost of that usage and make their decision. WDYT? > > Best, > Yangze Guo > > On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: > > > > Thanks for proposing this @Yangze Guo and sorry for joining the > discussion so late. > > The proposal generally looks good to me. But I find one problem that > batch job with PIPELINED edges might hang if enabling fine-grained > resources. see "Resource Deadlocks could still happen in certain Cases" > section in > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > > However, this problem may happen only in batch cases with PIPELINED > edges, because > > 1. streaming jobs would always require all resource requirements to be > fulfilled at the same time. > > 2. batch jobs without PIPELINED edges consist of multiple single vertex > regions and thus each slot can be individually used and returned > > So maybe in the first step, let's mark batch jobs with PIPELINED edges > as an invalid case for fine-grained resources and throw exception for it in > early compiling stage? > > > > Thanks, > > Zhu > > > > Yangze Guo 于2021年6月15日周二 下午4:57写道: > >> > >> Thanks for the supplement, Arvid and Yun. I've annotated these two > >> points in the FLIP. > >> The vote is now started in [1]. 
> >> > >> [1] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51381.html > >> > >> Best, > >> Yangze Guo > >> > >> On Fri, Jun 11, 2021 at 2:50 PM Yun Gao > wrote: > >> > > >> > Hi, > >> > > >> > Very thanks @Yangze for bringing up this discuss. Overall +1 for > >> > exposing the fine-grained resource requirements in the DataStream API. > >> > > >> > One similar issue as Arvid has pointed out is that users may also > creating > >> > different SlotSharingGroup objects, with different names but with > different > >> > resources. We might need to do some check internally. But We could > also > >> > leave that during the development of the actual PR. > >> > > >> > Best, > >> > Yun > >> > > >> > > >> > > >> > --Original Mail -- > >> > Sender:Arvid Heise > >> > Send Date:Thu Jun 10 15:33:37 2021 > >> > Recipients:dev > >> > Subject:Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained > Resource Requirements > >> > Hi Yangze, > >> > > >> > > >> > > >> > Thanks for incorporating the ideas and sorry for missing the builder > part. > >> > > >> > My main idea is that SlotSharingGroup is immutable, such that the user > >> > > >> > doesn't do: > >> > > >> > > >> > > >> > ssg = new SlotSharingGroup(); > >> > > >> > ssg.setCpus(2); > >> > > >> > operator1.slotSharingGroup(ssg); > >> > > >> > ssg.setCpus(4); > >> > > >> > operator2.slotSharingGroup(ssg); > >> > > >> > > >> > > >> > and wonders why both operators have the same CPU spec. But the > details can > >> > > >> > be fleshed out in the actual PR. > >> > > >> > > >> > > >> > On Thu, Jun 10, 2021 at 5:13 AM Yangze Guo wrote: > >> > > >> > > >> > > >> > > Thanks all for the discussion. 
I've updated the FLIP accordingly, > the > >> > > >> > > key changes are: > >> > > >> > > - Introduce SlotSharingGroup instead of ResourceSpec which contains > >> > > >> > > the resource spec of slot sharing group > >> > > >> > > - Introduce two interfaces for specifying the SlotSharingGroup: > >> > > >> > > #slotSharingGroup(SlotSharingGroup) and > >> > > >> > > > StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup). > >> > > >> > > > >> > > >> > > If there is no more feedback, I'd start a vote next week. > >> > > >> > > > >> > > >> > > Best, > >> > > >> > > Yangze Guo > >> > > >> > > > >> > > >> > > On Wed, Jun 9, 2021 at 10:46 AM Yangze Guo wrote: > >> > > >> > > > > >> > > >> > > > Thanks for the valuable suggestion, Arvid. > >> > > >> > > > > >> > > >> > > > 1) Yes, we can add a new SlotSharingGroup which includes the name > and > >> > > >> > > > its resource. After that, we have two interfaces for configuring > the > >> > > >> > > > slot sharing group of an operator: > >> > > >> > > > - #slotSharingGroup(String name) // the resource of it can be > >> > > >> > > > configured through > StreamExecutionEnvironment#registerSlotSharingGroup > >> > > >> > > > - #slotSharingGroup(Slo
[jira] [Created] (FLINK-23054) Correct upsert optimization by upsert keys
Jingsong Lee created FLINK-23054: Summary: Correct upsert optimization by upsert keys Key: FLINK-23054 URL: https://issues.apache.org/jira/browse/FLINK-23054 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Reporter: Jingsong Lee Assignee: Jingsong Lee Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
Thanks, I append it to the known limitations of this FLIP. Best, Yangze Guo On Mon, Jun 21, 2021 at 3:20 PM Zhu Zhu wrote: > > Thanks for the quick response Yangze. > The proposal sounds good to me. > > Thanks, > Zhu > > Yangze Guo 于2021年6月21日周一 下午3:01写道: >> >> Thanks for the comments, Zhu! >> >> Yes, it is a known limitation for fine-grained resource management. We >> also have filed this issue in FLINK-20865 when we proposed FLIP-156. >> >> As a first step, I agree that we can mark batch jobs with PIPELINED >> edges as an invalid case for this feature. However, just throwing an >> exception, in that case, might confuse users who do not understand the >> concept of pipeline region. Maybe we can force all the edges in this >> scenario to BLOCKING in compiling stage and well document it. So that, >> common users will not be interrupted while the expert users can >> understand the cost of that usage and make their decision. WDYT? >> >> Best, >> Yangze Guo >> >> On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: >> > >> > Thanks for proposing this @Yangze Guo and sorry for joining the discussion >> > so late. >> > The proposal generally looks good to me. But I find one problem that batch >> > job with PIPELINED edges might hang if enabling fine-grained resources. >> > see "Resource Deadlocks could still happen in certain Cases" section in >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling >> > However, this problem may happen only in batch cases with PIPELINED edges, >> > because >> > 1. streaming jobs would always require all resource requirements to be >> > fulfilled at the same time. >> > 2. batch jobs without PIPELINED edges consist of multiple single vertex >> > regions and thus each slot can be individually used and returned >> > So maybe in the first step, let's mark batch jobs with PIPELINED edges as >> > an invalid case for fine-grained resources and throw exception for it in >> > early compiling stage? 
>> > >> > Thanks, >> > Zhu >> > >> > Yangze Guo 于2021年6月15日周二 下午4:57写道: >> >> >> >> Thanks for the supplement, Arvid and Yun. I've annotated these two >> >> points in the FLIP. >> >> The vote is now started in [1]. >> >> >> >> [1] >> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51381.html >> >> >> >> Best, >> >> Yangze Guo >> >> >> >> On Fri, Jun 11, 2021 at 2:50 PM Yun Gao >> >> wrote: >> >> > >> >> > Hi, >> >> > >> >> > Very thanks @Yangze for bringing up this discuss. Overall +1 for >> >> > exposing the fine-grained resource requirements in the DataStream API. >> >> > >> >> > One similar issue as Arvid has pointed out is that users may also >> >> > creating >> >> > different SlotSharingGroup objects, with different names but with >> >> > different >> >> > resources. We might need to do some check internally. But We could also >> >> > leave that during the development of the actual PR. >> >> > >> >> > Best, >> >> > Yun >> >> > >> >> > >> >> > >> >> > --Original Mail -- >> >> > Sender:Arvid Heise >> >> > Send Date:Thu Jun 10 15:33:37 2021 >> >> > Recipients:dev >> >> > Subject:Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained >> >> > Resource Requirements >> >> > Hi Yangze, >> >> > >> >> > >> >> > >> >> > Thanks for incorporating the ideas and sorry for missing the builder >> >> > part. >> >> > >> >> > My main idea is that SlotSharingGroup is immutable, such that the user >> >> > >> >> > doesn't do: >> >> > >> >> > >> >> > >> >> > ssg = new SlotSharingGroup(); >> >> > >> >> > ssg.setCpus(2); >> >> > >> >> > operator1.slotSharingGroup(ssg); >> >> > >> >> > ssg.setCpus(4); >> >> > >> >> > operator2.slotSharingGroup(ssg); >> >> > >> >> > >> >> > >> >> > and wonders why both operators have the same CPU spec. But the details >> >> > can >> >> > >> >> > be fleshed out in the actual PR. 
>> >> > >> >> > >> >> > >> >> > On Thu, Jun 10, 2021 at 5:13 AM Yangze Guo wrote: >> >> > >> >> > >> >> > >> >> > > Thanks all for the discussion. I've updated the FLIP accordingly, the >> >> > >> >> > > key changes are: >> >> > >> >> > > - Introduce SlotSharingGroup instead of ResourceSpec which contains >> >> > >> >> > > the resource spec of slot sharing group >> >> > >> >> > > - Introduce two interfaces for specifying the SlotSharingGroup: >> >> > >> >> > > #slotSharingGroup(SlotSharingGroup) and >> >> > >> >> > > StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup). >> >> > >> >> > > >> >> > >> >> > > If there is no more feedback, I'd start a vote next week. >> >> > >> >> > > >> >> > >> >> > > Best, >> >> > >> >> > > Yangze Guo >> >> > >> >> > > >> >> > >> >> > > On Wed, Jun 9, 2021 at 10:46 AM Yangze Guo wrote: >> >> > >> >> > > > >> >> > >> >> > > > Thanks for the valuable suggestion, Arvid. >> >> > >> >> > > > >> >> > >> >> > > > 1) Yes, we can add a new SlotSharingGroup which includes the name >> >> > > > an
Unsubscribe
Re: [VOTE] FLIP-169: DataStream API for Fine-Grained Resource Requirements
According to the latest comment of Zhu Zhu[1], I append the potential resource deadlock in batch jobs as a known limitation to this FLIP. Thus, I'd extend the voting period for another 72h. [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51071.html Best, Yangze Guo On Tue, Jun 15, 2021 at 7:53 PM Xintong Song wrote: > > +1 (binding) > > > Thank you~ > > Xintong Song > > > > On Tue, Jun 15, 2021 at 6:21 PM Arvid Heise wrote: > > > LGTM +1 (binding) from my side. > > > > On Tue, Jun 15, 2021 at 11:00 AM Yangze Guo wrote: > > > > > Hi everyone, > > > > > > I'd like to start the vote of FLIP-169 [1]. This FLIP is discussed in > > > the thread[2]. > > > > > > The vote will be open for at least 72 hours. Unless there is an > > > objection, I will try to close it by Jun. 18, 2021 if we have received > > > sufficient votes. > > > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-169+DataStream+API+for+Fine-Grained+Resource+Requirements > > > [2] > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51071.html > > > > > > Best, > > > Yangze Guo > > > > >
[jira] [Created] (FLINK-23055) Add document for Window TVF offset
JING ZHANG created FLINK-23055: -- Summary: Add document for Window TVF offset Key: FLINK-23055 URL: https://issues.apache.org/jira/browse/FLINK-23055 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.14.0 Reporter: JING ZHANG Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23056) flink docs sql builtin functions out of date
Junhan Yang created FLINK-23056: --- Summary: flink docs sql builtin functions out of date Key: FLINK-23056 URL: https://issues.apache.org/jira/browse/FLINK-23056 Project: Flink Issue Type: Bug Components: Documentation Reporter: Junhan Yang Attachments: image-2021-06-21-16-44-27-117.png !image-2021-06-21-16-44-27-117.png! Functions `LAST_VALUE` and `FIRST_VALUE` should support multiple expressions as parameters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23057) flink-console.sh doesn't do variable expansion for FLINK_ENV_JAVA_OPTS like flink-daemon.sh
LIU Xiao created FLINK-23057: Summary: flink-console.sh doesn't do variable expansion for FLINK_ENV_JAVA_OPTS like flink-daemon.sh Key: FLINK-23057 URL: https://issues.apache.org/jira/browse/FLINK-23057 Project: Flink Issue Type: Bug Components: Client / Job Submission Affects Versions: 1.12.4, 1.13.1 Reporter: LIU Xiao In flink-daemon.sh: {code:java} ... # Evaluate user options for local variable expansion FLINK_ENV_JAVA_OPTS=$(eval echo ${FLINK_ENV_JAVA_OPTS}) echo "Starting $DAEMON daemon on host $HOSTNAME." "$JAVA_RUN" $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 200<&- 2>&1 < /dev/null & ... {code} There is a "$(eval echo ...)" line, so variables like ${FLINK_LOG_PREFIX} in FLINK_ENV_JAVA_OPTS can be expanded, as described in [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/application_profiling/] However, flink-console.sh doesn't have this line, and since kubernetes-jobmanager.sh and kubernetes-taskmanager.sh both depend on flink-console.sh, variable expansion of FLINK_ENV_JAVA_OPTS does not work in native Kubernetes application mode. Adding that line to flink-console.sh solves the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
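The expansion step this ticket describes can be reproduced in isolation. The sketch below mirrors the `$(eval echo ...)` line from flink-daemon.sh quoted above; the log path is a made-up example, not a real Flink default.

```shell
# Minimal reproduction of the expansion step from flink-daemon.sh.
# The log prefix below is a made-up example value.
FLINK_LOG_PREFIX="/var/log/flink/flink-taskmanager-0"

# The user configures an option that references ${FLINK_LOG_PREFIX} *literally*
# (single quotes prevent expansion at assignment time).
FLINK_ENV_JAVA_OPTS='-XX:HeapDumpPath=${FLINK_LOG_PREFIX}.hprof'

# Without the eval step, the literal, unexpanded text would be passed to the JVM:
echo "$FLINK_ENV_JAVA_OPTS"

# The line flink-console.sh is missing: re-evaluate the string so the
# embedded variable reference is expanded.
FLINK_ENV_JAVA_OPTS=$(eval echo ${FLINK_ENV_JAVA_OPTS})
echo "$FLINK_ENV_JAVA_OPTS"
```

The second echo prints the expanded path, which is exactly what kubernetes-jobmanager.sh and kubernetes-taskmanager.sh lose by going through flink-console.sh without this line.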
[jira] [Created] (FLINK-23058) flink-elasticsearch-6 not work
yandufeng created FLINK-23058: - Summary: flink-elasticsearch-6 not work Key: FLINK-23058 URL: https://issues.apache.org/jira/browse/FLINK-23058 Project: Flink Issue Type: Bug Components: Connectors / ElasticSearch Affects Versions: 1.13.1 Reporter: yandufeng I have two questions. 1. When I add the Elasticsearch host and port, I can write a random host, but no error is reported. For example: List<HttpHost> httpHosts = new ArrayList<>(); httpHosts.add(new HttpHost("sdfsdfsf", 9200, "http")); 2. When I write the correct Elasticsearch host and port, there is no response, and no index is created in Elasticsearch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Change in accumutors semantics with jobClient
Thanks for bringing this to the dev ML, Etienne. Could you maybe update the release notes for Flink 1.13 [1] to include this change? That way it might be a bit more prominent. I think the change needs to go into the release-1.13 and master branch. [1] https://github.com/apache/flink/blob/master/docs/content/release-notes/flink-1.13.md Cheers, Till On Fri, Jun 18, 2021 at 2:45 PM Etienne Chauchot wrote: > Hi all, > > I did a fix some time ago regarding accumulators: > the /JobClient.getAccumulators()/ call was blocking indefinitely in the local > environment for a streaming job (1). The change (2) consisted of returning > the current accumulator values for the running job. And when fixing this > in the PR, it appeared that I had to change the accumulator semantics > of /JobClient/ and I just realized that I forgot to bring this back to > the ML: > > Previously /JobClient/ assumed that getAccumulator() was called on a > bounded pipeline and that the user wanted to acquire the *final > accumulator values* after the job is finished. > > But now it returns the *current value of accumulators* immediately to be > compatible with unbounded pipelines. > > If it is run on a bounded pipeline, then to get the final accumulator > values after the job is finished, one needs to call > > /getJobExecutionResult().thenApply(JobExecutionResult::getAllAccumulatorResults)/ > > (1): https://issues.apache.org/jira/browse/FLINK-18685 > > (2): https://github.com/apache/flink/pull/14558# > > > Cheers, > > Etienne > >
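The two accumulator access patterns Etienne describes can be sketched with plain `CompletableFuture`s. The classes below are simplified stand-ins for Flink's `JobClient` and `JobExecutionResult` so the example runs without a cluster; the accumulator names and values are made up.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Self-contained sketch of the old vs. new accumulator semantics described
// above. JobClient and JobExecutionResult are simplified stand-ins, not
// Flink's real classes; "records" and its values are invented for the demo.
public class AccumulatorSemanticsDemo {

    record JobExecutionResult(Map<String, Object> accumulators) {
        Map<String, Object> getAllAccumulatorResults() { return accumulators; }
    }

    static class JobClient {
        // New semantics: returns the *current* values immediately,
        // so it no longer blocks on unbounded pipelines.
        CompletableFuture<Map<String, Object>> getAccumulators() {
            return CompletableFuture.completedFuture(Map.of("records", 10L));
        }

        // Completes only when the (bounded) job finishes.
        CompletableFuture<JobExecutionResult> getJobExecutionResult() {
            return CompletableFuture.completedFuture(
                    new JobExecutionResult(Map.of("records", 42L)));
        }
    }

    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient();

        // Current snapshot — works for both bounded and unbounded jobs.
        System.out.println(client.getAccumulators().get().get("records"));

        // Final values for a bounded job, using the pattern from the email:
        // getJobExecutionResult().thenApply(JobExecutionResult::getAllAccumulatorResults)
        Long finalValue = (Long) client.getJobExecutionResult()
                .thenApply(JobExecutionResult::getAllAccumulatorResults)
                .get()
                .get("records");
        System.out.println(finalValue); // prints 42
    }
}
```

The key point of the semantic change survives in the sketch: only the second pattern waits for job completion, so only it is suitable for retrieving final accumulator values.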
[jira] [Created] (FLINK-23059) Update playgrounds for Flink 1.13
David Anderson created FLINK-23059: -- Summary: Update playgrounds for Flink 1.13 Key: FLINK-23059 URL: https://issues.apache.org/jira/browse/FLINK-23059 Project: Flink Issue Type: Improvement Components: Documentation / Training / Exercises Affects Versions: 1.13.0 Reporter: David Anderson Assignee: David Anderson The various playgrounds in apache/flink-playgrounds all need an update for the 1.13 release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
Thanks for starting this discussion, Dawid. We have collected a couple of fixes for the different releases: #fixes: 1.11.4: 72 1.12.5: 35 1.13.2: 49 which in my opinion warrants new bugfix releases. Note that we intend to do another 1.11 release because of the seriousness of FLINK-22815 [1], which can lead to silent data loss. I think that FLINK-23025 [2] might be nice to include but I wouldn't block the release on it. @pnowojski do you have an ETA for FLINK-23011 [3]? I do agree that this would be nice to fix, but on the other hand, this issue has been in Flink since the introduction of FLIP-27. Moreover, it does not affect Flink 1.11.x. [1] https://issues.apache.org/jira/browse/FLINK-22815 [2] https://issues.apache.org/jira/browse/FLINK-23025 [3] https://issues.apache.org/jira/browse/FLINK-23011 Cheers, Till On Fri, Jun 18, 2021 at 5:15 PM Piotr Nowojski wrote: > Hi, > > Thanks for bringing this up. I think before releasing 1.12.x/1.13.x/1.14.x, > it would be good to decide what to do with FLINK-23011 [1] and if there is > a relatively easy fix, I would wait for it before releasing. > > Best, > Piotrek > > [1] https://issues.apache.org/jira/browse/FLINK-23011 > > On Fri, Jun 18, 2021 at 4:35 PM Konstantin Knauf wrote: > > > Hi Dawid, > > > > Thank you for starting the discussion. I'd like to add > > https://issues.apache.org/jira/browse/FLINK-23025 to the list for Flink > > 1.13.2. > > > > Cheers, > > > > Konstantin > > > > On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz > > > wrote: > > > Hi devs, > > > > > > Quite recently we pushed, in our opinion, quite an important fix[1] for > > > unaligned checkpoints which disables UC for broadcast partitioning. > > > Without the fix there might be some broadcast state corruption. > > > Therefore we think it would be beneficial to release it soonish. What do > > > you think? Do you have other issues in mind you'd like to have included > > > in these versions. 
> > > > > > Would someone be willing to volunteer to help with the releases as a > > > release manager? I guess there is a couple of spots to fill in here ;) > > > > > > Best, > > > > > > Dawid > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > > > > > > > > > > > -- > > > > Konstantin Knauf > > > > https://twitter.com/snntrable > > > > https://github.com/knaufk > > >
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
I would be more in favor of what Zhu Zhu proposed: throw an exception with a meaningful, understandable explanation that also includes how to resolve the problem. I do understand the reasoning behind automatically switching the edge types in order to make things easier to use, but a) this can also be confusing if the user does not expect it to happen, and b) it can add complexity that makes other feature development harder in the future because users might rely on it. An example of such a case I stumbled upon rather recently is that we adjust the maximum parallelism w.r.t. the given savepoint if it has not been explicitly configured. On paper this sounds like a good usability improvement; however, for the AdaptiveScheduler it posed quite annoying complexity. If instead we said that we fail the job submission if the max parallelism does not equal the max parallelism of the savepoint, it would have been a lot easier. Cheers, Till On Mon, Jun 21, 2021 at 9:36 AM Yangze Guo wrote: > Thanks, I append it to the known limitations of this FLIP. > > Best, > Yangze Guo > > On Mon, Jun 21, 2021 at 3:20 PM Zhu Zhu wrote: > > > > Thanks for the quick response Yangze. > > The proposal sounds good to me. > > > > Thanks, > > Zhu > > > > Yangze Guo wrote on Mon, Jun 21, 2021 at 3:01 PM: > >> > >> Thanks for the comments, Zhu! > >> > >> Yes, it is a known limitation for fine-grained resource management. We > >> also have filed this issue in FLINK-20865 when we proposed FLIP-156. > >> > >> As a first step, I agree that we can mark batch jobs with PIPELINED > >> edges as an invalid case for this feature. However, just throwing an > >> exception, in that case, might confuse users who do not understand the > >> concept of pipeline region. Maybe we can force all the edges in this > >> scenario to BLOCKING in compiling stage and well document it. So that, > >> common users will not be interrupted while the expert users can > >> understand the cost of that usage and make their decision. 
WDYT? > >> > >> Best, > >> Yangze Guo > >> > >> On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: > >> > > >> > Thanks for proposing this @Yangze Guo and sorry for joining the > discussion so late. > >> > The proposal generally looks good to me. But I find one problem that > batch job with PIPELINED edges might hang if enabling fine-grained > resources. see "Resource Deadlocks could still happen in certain Cases" > section in > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > >> > However, this problem may happen only in batch cases with PIPELINED > edges, because > >> > 1. streaming jobs would always require all resource requirements to > be fulfilled at the same time. > >> > 2. batch jobs without PIPELINED edges consist of multiple single > vertex regions and thus each slot can be individually used and returned > >> > So maybe in the first step, let's mark batch jobs with PIPELINED > edges as an invalid case for fine-grained resources and throw exception for > it in early compiling stage? > >> > > >> > Thanks, > >> > Zhu > >> > > >> > Yangze Guo 于2021年6月15日周二 下午4:57写道: > >> >> > >> >> Thanks for the supplement, Arvid and Yun. I've annotated these two > >> >> points in the FLIP. > >> >> The vote is now started in [1]. > >> >> > >> >> [1] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51381.html > >> >> > >> >> Best, > >> >> Yangze Guo > >> >> > >> >> On Fri, Jun 11, 2021 at 2:50 PM Yun Gao > wrote: > >> >> > > >> >> > Hi, > >> >> > > >> >> > Very thanks @Yangze for bringing up this discuss. Overall +1 for > >> >> > exposing the fine-grained resource requirements in the DataStream > API. > >> >> > > >> >> > One similar issue as Arvid has pointed out is that users may also > creating > >> >> > different SlotSharingGroup objects, with different names but with > different > >> >> > resources. We might need to do some check internally. 
But We > could also > >> >> > leave that during the development of the actual PR. > >> >> > > >> >> > Best, > >> >> > Yun > >> >> > > >> >> > > >> >> > > >> >> > --Original Mail -- > >> >> > Sender:Arvid Heise > >> >> > Send Date:Thu Jun 10 15:33:37 2021 > >> >> > Recipients:dev > >> >> > Subject:Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained > Resource Requirements > >> >> > Hi Yangze, > >> >> > > >> >> > > >> >> > > >> >> > Thanks for incorporating the ideas and sorry for missing the > builder part. > >> >> > > >> >> > My main idea is that SlotSharingGroup is immutable, such that the > user > >> >> > > >> >> > doesn't do: > >> >> > > >> >> > > >> >> > > >> >> > ssg = new SlotSharingGroup(); > >> >> > > >> >> > ssg.setCpus(2); > >> >> > > >> >> > operator1.slotSharingGroup(ssg); > >> >> > > >> >> > ssg.setCpus(4); > >> >> > > >> >> > operator2.slotSharingGroup(ssg); > >> >> > > >> >> > > >> >> > > >> >> > and wonders why both operators have th
Re: [DISCUSS] FLIP-171: Async Sink
Hi Piotr, to pick up this discussion thread again: - This FLIP is about providing some base implementation for FLIP-143 sinks that makes adding new implementations easier, similar to the SourceReaderBase. - The whole availability topic will most likely be a separate FLIP. The basic issue just popped up here because we currently have no way to signal backpressure in sinks except by blocking `write`. This feels quite natural in sinks with sync communication but quite unnatural in async sinks. Now we have a couple of options. In all cases, we would have some WIP limit on the number of records/requests that can be processed in parallel asynchronously (similar to asyncIO). 1. We use some blocking queue in `write`, then we need to handle interruptions. In the easiest case, we extend `write` to throw the `InterruptedException`, which is a small API change. 2. We use a blocking queue, but handle interrupts and swallow/translate them. No API change. Both solutions block the task thread, so any RPC message / unaligned checkpoint would be processed only after the backpressure is temporarily lifted. That's similar to the discussions that you linked. Cancellation may also be a tad harder on 2. 3. We could also add some `wakeUp` to the `SinkWriter` similar to `SplitFetcher` [1]. Basically, you use a normal queue with a completable future on which you block. Wakeup would be a clean way to complete it alongside the natural completion through finished requests. 4. We add availability to the sink. However, this API change also requires that we allow operators to be available, so it may be a bigger change with undesired side-effects. On the other hand, we could also use the same mechanism for asyncIO. For users of FLIP-171, none of the options are exposed. So we could also start with a simple solution (add `InterruptedException`) and later try to add availability. Option 1+2 would also not require an additional FLIP; we could add it as part of this FLIP. 
Best, Arvid [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SplitFetcher.java#L258-L258 On Thu, Jun 10, 2021 at 10:09 AM Hausmann, Steffen wrote: > Hey Piotrek, > > Thanks for your comments on the FLIP. I'll address your second question > first, as I think it's more central to this FLIP. Just looking at the AWS > ecosystem, there are several sinks with overlapping functionality. I've > chosen AWS sinks here because I'm most familiar with those, but a similar > argument applies more generically for destination that support async ingest. > > There is, for instance, a sink for Amazon Kinesis Data Streams that is > part of Apache Flink [1], a sink for Amazon Kinesis Data Firehose [2], a > sink for Amazon DynamoDB [3], and a sink for Amazon Timestream [4]. All > these sinks have implemented their own mechanisms for batching, persisting, > and retrying events. And I'm not sure if all of them properly participate > in checkpointing. [3] even seems to closely mirror [1] as it contains > references to the Kinesis Producer Library, which is unrelated to Amazon > DynamoDB. > > These sinks predate FLIP-143. But as batching, persisting, and retrying > capabilities do not seem to be part of FLIP-143, I'd argue that we would > end up with similar duplication, even if these sinks were rewritten today > based on FLIP-143. And that's the idea of FLIP-171: abstract away these > commonly required capabilities so that it becomes easy to create support > for a wide range of destination without having to think about batching, > retries, checkpointing, etc. I've included an example in the FLIP [5] that > shows that it only takes a couple of lines of code to implement a sink with > exactly-once semantics. To be fair, the example is lacking robust failure > handling and some more advanced capabilities of [1], but I think it still > supports this point. 
> > Regarding your point on the isAvailable pattern. We need some way for the > sink to propagate backpressure and we would also like to support time-based > buffering hints. There are two options I currently see and would need > additional input on which one is the better or more desirable one. The > first option is to use the non-blocking isAvailable pattern. Internally, > the sink persists buffered events in the snapshot state, which avoids having > to flush buffered records on a checkpoint. This seems to align well with the > non-blocking isAvailable pattern. The second option is to make calls to > `write` blocking and leverage an internal thread to trigger flushes based > on time-based buffering hints. We've discussed these options with Arvid and > agreed to assume that the `isAvailable` pattern will become available > for sinks through an additional FLIP. > > I think it is an important discussion to have. My understanding of the > implications for Flink in general is very naïve, so I'd be happy to get > further guidance. However
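Option 3 above (a plain queue plus a completable future that is completed either by finished requests or by a `wakeUp`) can be sketched with plain JDK types. All class and method names below are hypothetical illustrations, not the actual `SinkWriter` API:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/**
 * Sketch of option 3: write() parks on a CompletableFuture once the
 * work-in-progress (WIP) limit is reached. The future is completed either
 * when an async request finishes or via wakeUp(). Hypothetical names only.
 */
class AsyncWriterSketch<T> {
    private final int wipLimit;
    private final Queue<T> inFlight = new ArrayDeque<>();
    private volatile CompletableFuture<Void> availability =
            CompletableFuture.completedFuture(null);

    AsyncWriterSketch(int wipLimit) {
        this.wipLimit = wipLimit;
    }

    /** Called from the task thread; blocks while the WIP limit is reached. */
    void write(T element) {
        while (true) {
            synchronized (this) {
                if (inFlight.size() < wipLimit) {
                    inFlight.add(element); // hand off to the async client here
                    return;
                }
                availability = new CompletableFuture<>(); // re-arm under lock
            }
            availability.join(); // park until a request finishes or wakeUp()
        }
    }

    /** Completion callback of the async client. */
    synchronized void requestFinished(T element) {
        inFlight.remove(element);
        availability.complete(null);
    }

    /** Cleanly unblocks a parked write(), e.g. during cancellation. */
    void wakeUp() {
        availability.complete(null);
    }

    synchronized int inFlightCount() {
        return inFlight.size();
    }
}
```

Blocking on the future rather than on a blocking queue is what makes the `wakeUp` clean: cancellation can complete the future without relying on thread interrupts.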
[jira] [Created] (FLINK-23060) Move FutureUtils.toJava into separate class
Chesnay Schepler created FLINK-23060: Summary: Move FutureUtils.toJava into separate class Key: FLINK-23060 URL: https://issues.apache.org/jira/browse/FLINK-23060 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.14.0 FutureUtils.toJava converts a scala future to a java CompletableFuture. In FLINK-18783 this method will be moved to a new akka-specific module, and we can simplify this move by moving this into a separate class now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
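For context, the conversion being moved bridges a callback-style (Scala) future into a Java `CompletableFuture`. A self-contained sketch of that pattern, using a stand-in interface instead of the real `scala.concurrent.Future` (names are illustrative, not Flink's actual code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

/** Stand-in for scala.concurrent.Future's callback registration. */
interface CallbackFuture<T> {
    void onComplete(BiConsumer<T, Throwable> callback);
}

final class FutureConversion {
    /** Bridges a callback-style future into a Java CompletableFuture. */
    static <T> CompletableFuture<T> toJava(CallbackFuture<T> scalaFuture) {
        CompletableFuture<T> result = new CompletableFuture<>();
        scalaFuture.onComplete((value, failure) -> {
            if (failure != null) {
                result.completeExceptionally(failure);
            } else {
                result.complete(value);
            }
        });
        return result;
    }
}
```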
[jira] [Created] (FLINK-23061) Split BootstrapTools
Chesnay Schepler created FLINK-23061: Summary: Split BootstrapTools Key: FLINK-23061 URL: https://issues.apache.org/jira/browse/FLINK-23061 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.14.0 The BootstrapTools contain utils for creating java processes and actor systems. The akka parts will be moved into a separate module in FLINK-18783, and this would be easier if the tools were already split ahead of time.
[jira] [Created] (FLINK-23062) FLIP-129: Register sources/sinks in Table API
Ingo Bürk created FLINK-23062: Summary: FLIP-129: Register sources/sinks in Table API Key: FLINK-23062 URL: https://issues.apache.org/jira/browse/FLINK-23062 Project: Flink Issue Type: New Feature Components: Table SQL / API Reporter: Ingo Bürk Assignee: Ingo Bürk (!) FLIP-129 is awaiting another vote. These issues are preliminary. (!) https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API
[jira] [Created] (FLINK-23063) Remove TableEnvironment#connect
Ingo Bürk created FLINK-23063: Summary: Remove TableEnvironment#connect Key: FLINK-23063 URL: https://issues.apache.org/jira/browse/FLINK-23063 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23064) Centralize connector options
Ingo Bürk created FLINK-23064: Summary: Centralize connector options Key: FLINK-23064 URL: https://issues.apache.org/jira/browse/FLINK-23064 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk For built-in connectors, we need to refactor their corresponding *Options classes to * … not contain any internal things * … be marked PublicEvolving * … be located in a common package
[jira] [Created] (FLINK-23066) Implement TableEnvironment#from
Ingo Bürk created FLINK-23066: Summary: Implement TableEnvironment#from Key: FLINK-23066 URL: https://issues.apache.org/jira/browse/FLINK-23066 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23067) Implement Table#executeInsert(TableDescriptor)
Ingo Bürk created FLINK-23067: Summary: Implement Table#executeInsert(TableDescriptor) Key: FLINK-23067 URL: https://issues.apache.org/jira/browse/FLINK-23067 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23068) Support LIKE in TableDescriptor
Ingo Bürk created FLINK-23068: Summary: Support LIKE in TableDescriptor Key: FLINK-23068 URL: https://issues.apache.org/jira/browse/FLINK-23068 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23069) Support schema-less #executeInsert(TableDescriptor)
Ingo Bürk created FLINK-23069: Summary: Support schema-less #executeInsert(TableDescriptor) Key: FLINK-23069 URL: https://issues.apache.org/jira/browse/FLINK-23069 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk The schema should be inferred automatically if no schema was defined in the descriptor.
[jira] [Created] (FLINK-23070) Implement TableEnvironment#createTable
Ingo Bürk created FLINK-23070: Summary: Implement TableEnvironment#createTable Key: FLINK-23070 URL: https://issues.apache.org/jira/browse/FLINK-23070 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23065) Implement TableEnvironment#createTemporaryTable
Ingo Bürk created FLINK-23065: Summary: Implement TableEnvironment#createTemporaryTable Key: FLINK-23065 URL: https://issues.apache.org/jira/browse/FLINK-23065 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
[jira] [Created] (FLINK-23071) Implement StatementSet#addInsert(TableDescriptor, Table)
Ingo Bürk created FLINK-23071: Summary: Implement StatementSet#addInsert(TableDescriptor, Table) Key: FLINK-23071 URL: https://issues.apache.org/jira/browse/FLINK-23071 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
Hi Dawid, Thanks for driving this discussion. I am willing to volunteer as the release manager for these versions. Best, Yun Tang From: Konstantin Knauf Sent: Friday, June 18, 2021 22:35 To: dev Subject: Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2 Hi Dawid, Thank you for starting the discussion. I'd like to add https://issues.apache.org/jira/browse/FLINK-23025 to the list for Flink 1.13.2. Cheers, Konstantin On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz wrote: > Hi devs, > > Quite recently we pushed, in our opinion, quite an important fix[1] for > unaligned checkpoints which disables UC for broadcast partitioning. > Without the fix there might be some broadcast state corruption. > Therefore we think it would be beneficial to release it soonish. What do > you think? Do you have other issues in mind you'd like to have included > in these versions? > > Would someone be willing to volunteer to help with the releases as a > release manager? I guess there are a couple of spots to fill in here ;) > > Best, > > Dawid > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > -- Konstantin Knauf https://twitter.com/snntrable https://github.com/knaufk
[VOTE] FLIP-129 Register sources/sinks in Table API
Hi everyone, thanks for all the feedback so far. Based on the discussion[1] we seem to have consensus, so I would like to start a vote on FLIP-129 for which the FLIP has now also been updated[2]. The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) unless there is an objection or insufficient votes. Thanks Ingo [1] https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API
Re: [DISCUSS] FLIP-171: Async Sink
Hi, Thanks Steffen for the explanations. I think it makes sense to me. Re Arvid/Steffen: - Keep in mind that even if we choose to provide a non-blocking API using the `isAvailable()`/`getAvailableFuture()` method, we would still need to support blocking inside the sinks. For example, at the very least, emitting many records at once (`flatMap`) or firing timers are scenarios where output availability would be ignored at the moment by the runtime. Also, I would imagine writing very large (like 1GB) records would be blocking on something as well. - Secondly, exposing availability to the API level might not be that easy/trivial. The availability pattern as defined in the `AvailabilityProvider` class is quite complicated and not that easy for a user to implement. Both of those, combined with the lack of a clear motivation for adding `AvailabilityProvider` to the sinks/operators/functions, make me vote for just starting with blocking `write` calls. This can always be extended in the future with availability if needed/motivated properly. That would be aligned with either Arvid's option 1 or 2. I don't know what the best practices are with `InterruptedException`, but I'm always afraid of it, so I would personally feel safer with option 2. I'm not sure what problem option 3 is meant to solve; adding `wakeUp()` sounds strange to me. Best, Piotrek On Mon, 21 Jun 2021 at 12:15, Arvid Heise wrote: > Hi Piotr, > > to pick up this discussion thread again: > - This FLIP is about providing some base implementation for FLIP-143 sinks > that make adding new implementations easier, similar to the > SourceReaderBase. > - The whole availability topic will most likely be a separate FLIP. The > basic issue just popped up here because we currently have no way to signal > backpressure in sinks except by blocking `write`. This feels quite natural > in sinks with sync communication but quite unnatural in async sinks. > > Now we have a couple of options. 
In all cases, we would have some WIP > limit on the number of records/requests being able to be processed in > parallel asynchronously (similar to asyncIO). > 1. We use some blocking queue in `write`, then we need to handle > interruptions. In the easiest case, we extend `write` to throw the > `InterruptedException`, which is a small API change. > 2. We use a blocking queue, but handle interrupts and swallow/translate > them. No API change. > Both solutions block the task thread, so any RPC message / unaligned > checkpoint would be processed only after the backpressure is temporarily > lifted. That's similar to the discussions that you linked. Cancellation may > also be a tad harder on 2. > 3. We could also add some `wakeUp` to the `SinkWriter` similar to > `SplitFetcher` [1]. Basically, you use a normal queue with a completeable > future on which you block. Wakeup would be a clean way to complete it next > to the natural completion through finished requests. > 4. We add availability to the sink. However, this API change also requires > that we allow operators to be available so it may be a bigger change with > undesired side-effects. On the other hand, we could also use the same > mechanism for asyncIO. > > For users of FLIP-171, none of the options are exposed. So we could also > start with a simple solution (add `InterruptedException`) and later try to > add availability. Option 1+2 would also not require an additional FLIP; we > could add it as part of this FLIP. > > Best, > > Arvid > > [1] > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SplitFetcher.java#L258-L258 > On Thu, Jun 10, 2021 at 10:09 AM Hausmann, Steffen > wrote: > >> Hey Piotrek, >> >> Thanks for your comments on the FLIP. I'll address your second question >> first, as I think it's more central to this FLIP. 
Just looking at the AWS >> ecosystem, there are several sinks with overlapping functionality. I've >> chosen AWS sinks here because I'm most familiar with those, but a similar >> argument applies more generically for destination that support async ingest. >> >> There is, for instance, a sink for Amazon Kinesis Data Streams that is >> part of Apache Flink [1], a sink for Amazon Kinesis Data Firehose [2], a >> sink for Amazon DynamoDB [3], and a sink for Amazon Timestream [4]. All >> these sinks have implemented their own mechanisms for batching, persisting, >> and retrying events. And I'm not sure if all of them properly participate >> in checkpointing. [3] even seems to closely mirror [1] as it contains >> references to the Kinesis Producer Library, which is unrelated to Amazon >> DynamoDB. >> >> These sinks predate FLIP-143. But as batching, persisting, and retrying >> capabilities do not seem to be part of FLIP-143, I'd argue that we would >> end up with similar duplication, even if these sinks were rewritten today >> based on FLIP-143. And that's the idea of FLIP-171: abstract away these >> commonly re
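Option 2 from the quoted list (a blocking queue inside `write`, with interrupts swallowed/translated so the API is unchanged) might look roughly as follows. The class and method names are illustrative stand-ins, not the real `SinkWriter` interface:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Option 2 sketch: blocking write(), interrupts handled internally. */
class BlockingWriteSketch<T> {
    private final BlockingQueue<T> pending;
    private volatile boolean cancelled;

    BlockingWriteSketch(int wipLimit) {
        this.pending = new ArrayBlockingQueue<>(wipLimit);
    }

    /** Blocks at the WIP limit; no InterruptedException in the signature. */
    void write(T element) {
        boolean interrupted = false;
        try {
            while (!cancelled) {
                try {
                    pending.put(element); // blocks while the queue is full
                    return;
                } catch (InterruptedException e) {
                    // Swallow and retry -- this is why cancellation is a tad
                    // harder here: interrupts alone no longer stop the writer.
                    interrupted = true;
                }
            }
            throw new IllegalStateException("writer cancelled");
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag
            }
        }
    }

    /** A real implementation would also interrupt a parked task thread here. */
    void cancel() {
        cancelled = true;
    }

    /** Drained by the async client as requests are sent. */
    T poll() {
        return pending.poll();
    }
}
```

The queue capacity plays the role of the WIP limit discussed above: `write` only blocks once that many requests are in flight.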
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
Thanks for the feedback, Till! Actually, we cannot give users any resolution for this issue, as there is no API for DataStream users to influence the edge types at the moment. The edge types are currently fixed based on the job's mode (batch or streaming). a) I think it might not confuse users much, as the behavior has never been documented or guaranteed to be unchanged. b) Thanks for your illustration. I agree that added complexity can make other feature development harder in the future. However, I think this might not introduce much complexity. In this case, we construct an all-edges-blocking job graph, which has existed since 1.11 and should have been taken into account by subsequent features. I admit we cannot assume the all-edges-blocking job graph will exist forever in Flink, but AFAIK there is no foreseeable feature that intends to deprecate it. WDYT? Best, Yangze Guo On Mon, Jun 21, 2021 at 6:10 PM Till Rohrmann wrote: > > I would be more in favor of what Zhu Zhu proposed to throw an exception > with a meaningful and understandable explanation that also includes how to > resolve this problem. I do understand the reasoning behind automatically > switching the edge types in order to make things easier to use but a) this > can also be confusing if the user does not expect this to happen and b) it > can add some complexity which makes other feature development harder in the > future because users might rely on it. An example of such a case I stumbled > upon rather recently is that we adjust the maximum parallelism wrt the > given savepoint if it has not been explicitly configured. On the paper this > sounds like a good usability improvement, however, for the > AdaptiveScheduler it posed a quite annoying complexity. If instead, we said > that we fail the job submission if the max parallelism does not equal the > max parallelism of the savepoint, it would have been a lot easier. 
> > Cheers, > Till > > On Mon, Jun 21, 2021 at 9:36 AM Yangze Guo wrote: > > > Thanks, I append it to the known limitations of this FLIP. > > > > Best, > > Yangze Guo > > > > On Mon, Jun 21, 2021 at 3:20 PM Zhu Zhu wrote: > > > > > > Thanks for the quick response Yangze. > > > The proposal sounds good to me. > > > > > > Thanks, > > > Zhu > > > > > > Yangze Guo 于2021年6月21日周一 下午3:01写道: > > >> > > >> Thanks for the comments, Zhu! > > >> > > >> Yes, it is a known limitation for fine-grained resource management. We > > >> also have filed this issue in FLINK-20865 when we proposed FLIP-156. > > >> > > >> As a first step, I agree that we can mark batch jobs with PIPELINED > > >> edges as an invalid case for this feature. However, just throwing an > > >> exception, in that case, might confuse users who do not understand the > > >> concept of pipeline region. Maybe we can force all the edges in this > > >> scenario to BLOCKING in compiling stage and well document it. So that, > > >> common users will not be interrupted while the expert users can > > >> understand the cost of that usage and make their decision. WDYT? > > >> > > >> Best, > > >> Yangze Guo > > >> > > >> On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: > > >> > > > >> > Thanks for proposing this @Yangze Guo and sorry for joining the > > discussion so late. > > >> > The proposal generally looks good to me. But I find one problem that > > batch job with PIPELINED edges might hang if enabling fine-grained > > resources. see "Resource Deadlocks could still happen in certain Cases" > > section in > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > > >> > However, this problem may happen only in batch cases with PIPELINED > > edges, because > > >> > 1. streaming jobs would always require all resource requirements to > > be fulfilled at the same time. > > >> > 2. 
batch jobs without PIPELINED edges consist of multiple single > > vertex regions and thus each slot can be individually used and returned > > >> > So maybe in the first step, let's mark batch jobs with PIPELINED > > edges as an invalid case for fine-grained resources and throw exception for > > it in early compiling stage? > > >> > > > >> > Thanks, > > >> > Zhu > > >> > > > >> > Yangze Guo 于2021年6月15日周二 下午4:57写道: > > >> >> > > >> >> Thanks for the supplement, Arvid and Yun. I've annotated these two > > >> >> points in the FLIP. > > >> >> The vote is now started in [1]. > > >> >> > > >> >> [1] > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-169-DataStream-API-for-Fine-Grained-Resource-Requirements-td51381.html > > >> >> > > >> >> Best, > > >> >> Yangze Guo > > >> >> > > >> >> On Fri, Jun 11, 2021 at 2:50 PM Yun Gao > > wrote: > > >> >> > > > >> >> > Hi, > > >> >> > > > >> >> > Very thanks @Yangze for bringing up this discuss. Overall +1 for > > >> >> > exposing the fine-grained resource requirements in the DataStream > > API. > > >> >> > > > >> >> > One similar issue as Arvid has pointed out is that users may also > > creating > > >> >> > different
[jira] [Created] (FLINK-23072) Add benchmarks for SQL internal and external serializers
Timo Walther created FLINK-23072: Summary: Add benchmarks for SQL internal and external serializers Key: FLINK-23072 URL: https://issues.apache.org/jira/browse/FLINK-23072 Project: Flink Issue Type: Improvement Components: Benchmarks Reporter: Timo Walther Currently, we don't benchmark any of the serializers of the SQL layer. We should test {{RowData}} with at least a field of each logical type. Also {{ExternalTypeInfo}} might be interesting to monitor because it is used between Table API and DataStream API.
Re: [DISCUSS] Dashboard/HistoryServer authentication
Hi All, We see that adding any kind of specific authentication raises more questions than answers. What if a generic API were added instead, without any real authentication logic? That way every provider could add its own protocol implementation as an additional jar. BR, G On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi all, > > Sorry to be joining the conversation late. I'm also on the side of > Konstantin, generally, in that this seems to not be a core goal of Flink as > a project and adds a maintenance burden. > > Would another con of Kerberos be that it is likely a fading project in terms > of network security? (serious question, please correct me if there is > reason to believe it is gaining adoption) > > The point about Kerberos being independent of infrastructure is a good one > but is something that is also solved by modern sidecar proxies + service > meshes that can run across Kubernetes and bare-metal. These solutions also > handle certificate provisioning, rotation, etc. in addition to higher-level > authorization policies. Some examples of projects with this "universal > infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and > Istio[2] (Google). > > Wondering out loud: has anyone tried to run Flink on top of cilium[3], > which also provides zero-trust networking at the kernel level without > needing to instrument applications? This currently only runs on Kubernetes > on Linux, so that's a major limitation, but solves many of the request > forging concerns at all levels. > > Thanks, > Austin > > [1]: https://kuma.io/docs/1.1.6/quickstart/universal/ > [2]: https://istio.io/latest/docs/setup/install/virtual-machine/ > [3]: https://cilium.io/ > > On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann > wrote: > > > I left some comments in the Google document. It would be great if > > someone from the community with security experience could also take a look > > at it. 
Maybe Eron you have an opinion on the topic. > > > > Cheers, > > Till > > > > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann > > wrote: > > > > > Hi Gabor, > > > > > > I haven't found time to look into the updated FLIP yet. I'll try to do > it > > > asap. > > > > > > Cheers, > > > Till > > > > > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf > > > wrote: > > > > > >> Hi Gabor, > > >> > > >> > However representing Kerberos as completely new feature is not true > > >> because > > >> it's already in since Flink makes authentication at least with HDFS > and > > >> Hbase through Kerberos. > > >> > > >> True, that is one way to look at it, but there are differences, too: > > >> Control Plane vs Data Plane, Core vs Connectors. > > >> > > >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys > > just > > >> raised. Why exactly these? If you think this would be beneficial we > can > > >> discuss it in detail > > >> > > >> That's exactly my point. Once we start adding authx support, we will > > >> sooner or later discuss other options besides Kerberos, too. A user > who > > >> would like to use OAuth can not easily use Kerberos, right? > > >> That is one of the reasons I am skeptical about adding initial authx > > >> support. > > >> > > >> > Related authorization you've mentioned it can be complicated over > > time. > > >> Can > > >> you show us an example? We've knowledge with couple of open source > > >> components > > >> but authorization was never a horror complex story. I personally have > > the > > >> most experience with Spark which I think is quite simple and stable. > > Users > > >> can be viewers/admins > > >> and jobs started by others can't be modified. If you can share an > > example > > >> over-complication we can discuss on facts. > > >> > > >> Authorization is a new aspect that needs to be considered for every > > >> addition to the REST API. In the future users might ask for additional > > >> roles (e.g. 
an editor), user-defined roles and you've already > mentioned > > >> job-level permissions yourself. And keep in mind that there might also > > be > > >> larger additions in the future like the flink-sql-gateway. > Contributions > > >> like this become more expensive the more aspects we need to consider. > > >> > > >> In general, I believe, it is important that the community focuses its > > >> efforts where we can generate the most value to the user and - > > personally - > > >> I don't think there is much to gain by extending Flink's scope in that > > >> direction. Of course, this is not black and white and there are other > > valid > > >> opinions. > > >> > > >> Thanks, > > >> > > >> Konstantin > > >> > > >> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi < > > gabor.g.somo...@gmail.com> > > >> wrote: > > >> > > >>> Hi Konstantin, > > >>> > > >>> Thanks for the response. Related new feature introduction in case of > > >>> Basic > > >>> auth I tend to agree, anything else can be chosen. > > >>> > > >>> However representing Kerberos as completely new
Re: [VOTE] FLIP-129 Register sources/sinks in Table API
+1 (binding) Thanks for driving this. Regards, Timo On 21.06.21 13:24, Ingo Bürk wrote: Hi everyone, thanks for all the feedback so far. Based on the discussion[1] we seem to have consensus, so I would like to start a vote on FLIP-129 for which the FLIP has now also been updated[2]. The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) unless there is an objection or insufficient votes. Thanks Ingo [1] https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API
Re: [VOTE] FLIP-129 Register sources/sinks in Table API
+1 (binding) Best, Jark On Mon, 21 Jun 2021 at 22:51, Timo Walther wrote: > +1 (binding) > > Thanks for driving this. > > Regards, > Timo > > On 21.06.21 13:24, Ingo Bürk wrote: > > Hi everyone, > > > > thanks for all the feedback so far. Based on the discussion[1] we seem to > > have consensus, so I would like to start a vote on FLIP-129 for which the > > FLIP has now also been updated[2]. > > > > The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) unless > > there is an objection or insufficient votes. > > > > > > Thanks > > Ingo > > > > [1] > > > https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API > > > >
Re: [DISCUSS] Dashboard/HistoryServer authentication
Hi team, Thank you for your input. Based on this discussion I agree with G that selecting and standardizing on a specific strong authentication mechanism is more challenging than the whole rest of the scope of this authentication story. :-) I suggest that G and I go back to the drawing board and come up with an API that can support multiple authentication mechanisms, and we would only merge said API to Flink. Specific implementations of it can be maintained outside of the project. This way we tackle the main challenge in a truly minimal way. Best, Marton On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi wrote: > Hi All, > > We see that adding any kind of specific authentication raises more > questions than answers. > What would be if a generic API would be added without any real > authentication logic? > That way every provider can add its own protocol implementation as > additional jar. > > BR, > G > > > On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi all, >> >> Sorry to be joining the conversation late. I'm also on the side of >> Konstantin, generally, in that this seems to not be a core goal of Flink >> as >> a project and adds a maintenance burden. >> >> Would another con of Kerberos be that is likely a fading project in terms >> of network security? (serious question, please correct me if there is >> reason to believe it is gaining adoption) >> >> The point about Kerberos being independent of infrastructure is a good one >> but is something that is also solved by modern sidecar proxies + service >> meshes that can run across Kubernetes and bare-metal. These solutions also >> handle certificate provisioning, rotation, etc. in addition to >> higher-level >> authorization policies. Some examples of projects with this "universal >> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and >> Istio[2] (Google). 
>> >> Wondering out loud: has anyone tried to run Flink on top of cilium[3], >> which also provides zero-trust networking at the kernel level without >> needing to instrument applications? This currently only runs on Kubernetes >> on Linux, so that's a major limitation, but solves many of the request >> forging concerns at all levels. >> >> Thanks, >> Austin >> >> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/ >> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/ >> [3]: https://cilium.io/ >> >> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann >> wrote: >> >> > I left some comments in the Google document. It would be great if >> > someone from the community with security experience could also take a >> look >> > at it. Maybe Eron you have an opinion on the topic. >> > >> > Cheers, >> > Till >> > >> > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann >> > wrote: >> > >> > > Hi Gabor, >> > > >> > > I haven't found time to look into the updated FLIP yet. I'll try to >> do it >> > > asap. >> > > >> > > Cheers, >> > > Till >> > > >> > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf >> > > wrote: >> > > >> > >> Hi Gabor, >> > >> >> > >> > However representing Kerberos as completely new feature is not true >> > >> because >> > >> it's already in since Flink makes authentication at least with HDFS >> and >> > >> Hbase through Kerberos. >> > >> >> > >> True, that is one way to look at it, but there are differences, too: >> > >> Control Plane vs Data Plane, Core vs Connectors. >> > >> >> > >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys >> > just >> > >> raised. Why exactly these? If you think this would be beneficial we >> can >> > >> discuss it in detail >> > >> >> > >> That's exactly my point. Once we start adding authx support, we will >> > >> sooner or later discuss other options besides Kerberos, too. A user >> who >> > >> would like to use OAuth can not easily use Kerberos, right? 
>> > >> That is one of the reasons I am skeptical about adding initial authx >> > >> support. >> > >> >> > >> > Related authorization you've mentioned it can be complicated over >> > time. >> > >> Can >> > >> you show us an example? We've knowledge with couple of open source >> > >> components >> > >> but authorization was never a horror complex story. I personally have >> > the >> > >> most experience with Spark which I think is quite simple and stable. >> > Users >> > >> can be viewers/admins >> > >> and jobs started by others can't be modified. If you can share an >> > example >> > >> over-complication we can discuss on facts. >> > >> >> > >> Authorization is a new aspect that needs to be considered for every >> > >> addition to the REST API. In the future users might ask for >> additional >> > >> roles (e.g. an editor), user-defined roles and you've already >> mentioned >> > >> job-level permissions yourself. And keep in mind that there might >> also >> > be >> > >> larger additions in the future like the flink-sql-gateway. >> Contributions >> > >> like this become more expensive the more aspects we need to consid
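The "generic API without any real authentication logic" proposed in this thread could be as small as a single hook that the framework defines and that provider jars implement (e.g. discovered via `ServiceLoader`). A purely illustrative sketch; none of these names exist in Flink:

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical pluggable authentication SPI: the framework would only define
 * this interface, while concrete protocols (Kerberos, OIDC, OAuth2, ...)
 * would ship as separate, externally maintained jars.
 */
interface RestAuthenticator {
    /** Returns the authenticated principal, or empty to reject the request. */
    Optional<String> authenticate(Map<String, String> requestHeaders);
}

/** Toy provider: trusts a (fictional) header. For illustration only. */
class HeaderAuthenticator implements RestAuthenticator {
    @Override
    public Optional<String> authenticate(Map<String, String> requestHeaders) {
        return Optional.ofNullable(requestHeaders.get("X-Authenticated-User"));
    }
}
```

The design choice here mirrors the thread's conclusion: keeping only the interface in the core project sidesteps the Kerberos-vs-OIDC debate and the associated maintenance burden.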
PR: "Propagate watermarks to sink API"
Would someone be willing and able to review the PR which adds watermarks to the sink API? https://github.com/apache/flink/pull/15950 Thanks! Eron
[jira] [Created] (FLINK-23073) Fix space handling in CSV timestamp parser
Seth Wiesman created FLINK-23073: Summary: Fix space handling in CSV timestamp parser Key: FLINK-23073 URL: https://issues.apache.org/jira/browse/FLINK-23073 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.14.0, 1.13.2 Reporter: Seth Wiesman Assignee: Seth Wiesman FLINK-21947 added support for TIMESTAMP_LTZ in the CSV format by replacing java.sql.Timestamp.valueOf with java.time.LocalDateTime.parse. Timestamp.valueOf internally calls `trim()` on the string before parsing, while LocalDateTime.parse does not. This caused a breaking change: the CSV format can no longer parse timestamps in CSV files with spaces after the delimiter. We should manually re-add the call to trim to revert to the previous behavior.
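The regression is easy to reproduce with plain JDK classes, outside the CSV format itself (the timestamp pattern below is just an example, not Flink's actual formatter):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Demonstrates the trim() behavior difference described in FLINK-23073. */
class CsvTimestampTrim {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Old code path: java.sql.Timestamp.valueOf trims the input internally. */
    static LocalDateTime parseViaTimestamp(String s) {
        return Timestamp.valueOf(s).toLocalDateTime();
    }

    /** New code path: LocalDateTime.parse rejects untrimmed input. */
    static LocalDateTime parseViaLocalDateTime(String s) {
        return LocalDateTime.parse(s, FMT);
    }

    /** Proposed fix: manually re-add the trim() call before parsing. */
    static LocalDateTime parseWithFix(String s) {
        return LocalDateTime.parse(s.trim(), FMT);
    }
}
```

With `" 2021-06-22 10:15:30"` (note the leading space, as after a CSV delimiter), the first and third methods succeed while the second throws a `DateTimeParseException`.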
Re: [DISCUSS]FLIP-150: Introduce Hybrid Source
Here is a summary of where we are at with the PR: * Added capability to construct sources at switch time through a factory interface. This can support all previously discussed scenarios. The simple case (sources with fixed start position) is still simple, but for scenarios that require deferred instantiation, sources can now be created through their respective builders at switch time with access to the previous enumerator. This is a modification of option 3 described previously. * There is now unit test coverage for reader and enumerator. * Ideas such as a universal interface for exchange of start positions can be added on top of the current implementation. However, I would like to keep that as an exercise for the future and keep the scope of this initial work contained. * The FLIP page will be updated to reflect the changes made since it was originally created. Nicholas volunteered to take this up and also send a VOTE thread. Thanks all, and especially Arvid, for taking the time to review and discuss. Thomas On Tue, Jun 15, 2021 at 11:01 AM Thomas Weise wrote: > > Hi Arvid, > > Thanks for your reply --> > > On Mon, Jun 14, 2021 at 2:55 PM Arvid Heise wrote: > > > > Hi Thomas, > > > > Thanks for bringing this up. I think this is a tough nut to crack :/. > > Imho 1 and 3 or 1+3 can work but it is ofc a pity if the source implementor > > is not aware of HybridSource. I'm also worried that we may not have a > > universal interface to specify start offset/time. > > I guess it also would be much easier if we would have an abstract base > > source class where we could implement some basic support. > > > > When I initially looked at the issue I was thinking that sources should > > always be immutable (we have some bad experiences with mutable life-cycles > > in operator implementations) and the only modifiable thing should be the > > builder. 
That would mean that a HybridSource actually just gets a list of > > source builders and creates the sources when needed with the correct > > start/end offset. However, we neither have base builders (something that > > I'd like to change) nor are any of the builders serializable. We could > > convert sources back to builders, update the start offset, and convert to > > sources again but this also seems overly complicated. So I'm assuming that > > we should go with modifiable sources as also expressed in the FLIP draft. > > The need to set a start position at runtime indicates that sources > should not be immutable. I think it would be better to have a setter > on the source that clearly describes the mutation. > > Regarding deferred construction of the sources (supplier pattern): > This is actually a very interesting idea that would also help in > situations where the exact sequence of sources isn't known upfront. > However, Source is also the factory for split and enumerator > checkpoint serializers. If we were to instantiate the source at switch > time, we would also need to distribute the serializers at switch time. > This would lead to even more complexity and move us further away from > the original goal of having a relatively simple implementation for the > basic scenarios. > > > If we could assume that we are always switching by time, we could also > > change Source(Enumerator)#start to take the start time as a parameter. Can > > we deduce the end time by the record timestamp? But I guess that has all > > been discussed already, so sorry if I derail the discussion. > > This actually hasn't been discussed. The original proposal left the > type of the start position open, which also makes it less attractive > (user still has to supply a converter). > > For initial internal usage of the hybrid source, we are planning to > use a timestamp. But there may be use cases where the start position > could be encoded in other ways, such as based on Kafka offsets. 
> > > I'm also leaning towards extending the Source interface to include these > > methods (with defaults) to make it harder for implementers to miss. > > It would be possible to introduce an optional interface as a follow-up > task. It can be implemented as the default of option 3. > > > > > > > On Fri, Jun 11, 2021 at 7:02 PM Thomas Weise wrote: > > > > > Thanks for the suggestions and feedback on the PR. > > > > > > A variation of hybrid source that can switch back and forth was > > > brought up before and it is something that will be eventually > > > required. It was also suggested by Stephan that in the future there > > > may be more than one implementation of hybrid source for different > > > requirements. > > > > > > I want to bring back the topic of how enumerator end state can be > > > converted into start position from the PR [1]. We started in the FLIP > > > page with "switchable" interfaces, the prototype had checkpoint > > > conversion and now the PR has a function that allows to augment the > > > source. Each of these has pros and cons but we will need to converge. > > > > > > 1. Switchable interfaces > > > * u
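As a rough standalone sketch of the switch-time factory approach summarized at the top of this thread: a serializable factory receives the previous enumerator at switch time, so a fixed-start source stays trivial while a deferred one can derive its start position from the old source's end state. All names here (SourceFactory, FileEnumerator, KafkaLikeSource) are hypothetical stand-ins, not the interfaces from the actual PR.

```java
import java.io.Serializable;

public class SwitchDemo {
    // Hypothetical factory: invoked at switch time with the previous enumerator,
    // so the next source can derive its start position from the old source's end state.
    interface SourceFactory<PrevEnumT, SourceT> extends Serializable {
        SourceT create(PrevEnumT previousEnumerator);
    }

    // Stand-ins for a real split enumerator and a real source.
    static class FileEnumerator {
        long endTimestamp() { return 1_624_000_000_000L; }
    }

    static class KafkaLikeSource {
        final long startTimestamp;
        KafkaLikeSource(long startTimestamp) { this.startTimestamp = startTimestamp; }
    }

    public static void main(String[] args) {
        // Fixed-start case stays simple: the factory ignores the previous enumerator.
        SourceFactory<FileEnumerator, KafkaLikeSource> fixed = prev -> new KafkaLikeSource(0L);

        // Deferred instantiation: the source is only built at switch time.
        SourceFactory<FileEnumerator, KafkaLikeSource> deferred =
                prev -> new KafkaLikeSource(prev.endTimestamp());

        System.out.println(fixed.create(new FileEnumerator()).startTimestamp);    // 0
        System.out.println(deferred.create(new FileEnumerator()).startTimestamp); // 1624000000000
    }
}
```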
trying (and failing) to update pyflink-walkthrough for Flink 1.13
I've been trying to upgrade the pyflink-walkthrough to Flink 1.13.1, but without any success. Unless I give it a lot of resources the data generator times out trying to connect to Kafka. If I give it 6 cores and 11GB (which is about all I can offer it) it does manage to connect, but then fails trying to write to kafka. Not sure what's wrong? Any suggestions? See [1] to review what I tried. Best, David [1] https://github.com/alpinegizmo/flink-playgrounds/commit/777274355ba04de6d8c8f1308b24be99ec86a0d6 21:40 $ docker-compose logs -f generator Attaching to pyflink-walkthrough_generator_1 generator_1 | Connecting to Kafka brokers generator_1 | Waiting for brokers to become available generator_1 | Waiting for brokers to become available generator_1 | Connected to Kafka generator_1 | Traceback (most recent call last): generator_1 | File "./generate_source_data.py", line 61, in generator_1 | write_data(producer) generator_1 | File "./generate_source_data.py", line 42, in write_data generator_1 | producer.send(topic, value=cur_data) generator_1 | File "/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 576, in send generator_1 | self._wait_on_metadata(topic, self.config['max_block_ms'] / 1000.0) generator_1 | File "/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 703, in _wait_on_metadata generator_1 | "Failed to update metadata after %.1f secs." % (max_wait,)) generator_1 | kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.
Re: PR: "Propagate watermarks to sink API"
Hi Eron, I will check it tomorrow. Sorry for the delay. If someone beats me to that, please go ahead. On Mon, Jun 21, 2021 at 8:18 PM Eron Wright wrote: > Would someone be willing and able to review the PR which adds watermarks to > the sink API? > > https://github.com/apache/flink/pull/15950 > > Thanks! > Eron >
Re: [DISCUSS] Dashboard/HistoryServer authentication
Hi Gabor + Marton, I don't believe that the issue with this proposal is the specific mechanism proposed (Kerberos), but rather that Flink is not the right level at which to implement it. I'm just one voice, so please take this with a grain of salt. In the other solutions previously noted there is no need to instrument Flink which, in addition to reducing the maintenance burden, provides a better, decoupled end result. IMO we should not add any new API in Flink for this use case. I think it is unfortunate and sympathize with the work that has already been done on this feature – perhaps we could brainstorm ways to run this alongside Flink in your setup. Again, I'm not saying that the proposed agnostic API would not work, nor that it is a bad idea, but it is not one that will make Flink more compatible with the modern solutions to this problem. Best, Austin On Mon, Jun 21, 2021 at 2:18 PM Márton Balassi wrote: > Hi team, > > Thank you for your input. Based on this discussion I agree with G that > selecting and standardizing on a specific strong authentication mechanism > is more challenging than the whole rest of the scope of this authentication > story. :-) I suggest that G and I go back to the drawing board and come up > with an API that can support multiple authentication mechanisms, and we > would only merge said API to Flink. Specific implementations of it can be > maintained outside of the project. This way we tackle the main challenge in > a truly minimal way. > > Best, > Marton > > On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi > wrote: > > > Hi All, > > > > We see that adding any kind of specific authentication raises more > > questions than answers. > > What if a generic API were added without any real > > authentication logic? > > That way every provider can add its own protocol implementation as > > additional jar.
> > > > BR, > > G > > > > > > On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards < > > austin.caw...@gmail.com> wrote: > > > >> Hi all, > >> > >> Sorry to be joining the conversation late. I'm also on the side of > >> Konstantin, generally, in that this seems to not be a core goal of Flink > >> as > >> a project and adds a maintenance burden. > >> > >> Would another con of Kerberos be that is likely a fading project in > terms > >> of network security? (serious question, please correct me if there is > >> reason to believe it is gaining adoption) > >> > >> The point about Kerberos being independent of infrastructure is a good > one > >> but is something that is also solved by modern sidecar proxies + service > >> meshes that can run across Kubernetes and bare-metal. These solutions > also > >> handle certificate provisioning, rotation, etc. in addition to > >> higher-level > >> authorization policies. Some examples of projects with this "universal > >> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and > >> Istio[2] (Google). > >> > >> Wondering out loud: has anyone tried to run Flink on top of cilium[3], > >> which also provides zero-trust networking at the kernel level without > >> needing to instrument applications? This currently only runs on > Kubernetes > >> on Linux, so that's a major limitation, but solves many of the request > >> forging concerns at all levels. > >> > >> Thanks, > >> Austin > >> > >> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/ > >> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/ > >> [3]: https://cilium.io/ > >> > >> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann > >> wrote: > >> > >> > I left some comments in the Google document. It would be great if > >> > someone from the community with security experience could also take a > >> look > >> > at it. Maybe Eron you have an opinion on the topic. 
> >> > > >> > Cheers, > >> > Till > >> > > >> > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann > >> > wrote: > >> > > >> > > Hi Gabor, > >> > > > >> > > I haven't found time to look into the updated FLIP yet. I'll try to > >> do it > >> > > asap. > >> > > > >> > > Cheers, > >> > > Till > >> > > > >> > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf > > >> > > wrote: > >> > > > >> > >> Hi Gabor, > >> > >> > >> > >> > However representing Kerberos as completely new feature is not > true > >> > >> because > >> > >> it's already in since Flink makes authentication at least with HDFS > >> and > >> > >> Hbase through Kerberos. > >> > >> > >> > >> True, that is one way to look at it, but there are differences, > too: > >> > >> Control Plane vs Data Plane, Core vs Connectors. > >> > >> > >> > >> > Adding OIDC or OAuth2 has the exact same concerns what you've > guys > >> > just > >> > >> raised. Why exactly these? If you think this would be beneficial we > >> can > >> > >> discuss it in detail > >> > >> > >> > >> That's exactly my point. Once we start adding authx support, we > will > >> > >> sooner or later discuss other options besides Kerberos, too. A user > >> who > >> > >> would like to use OAuth can not e
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
+1 to the release. It would be great if we could get FLINK-23073 into 1.13.2. There's already an open PR and it unblocks upgrading the table API walkthrough in apache/flink-playgrounds to 1.13. Seth On Mon, Jun 21, 2021 at 6:28 AM Yun Tang wrote: > Hi Dawid, > > Thanks for driving this discussion, I am willing to volunteer as the > release manager for these versions. > > > Best > Yun Tang > > From: Konstantin Knauf > Sent: Friday, June 18, 2021 22:35 > To: dev > Subject: Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2 > > Hi Dawid, > > Thank you for starting the discussion. I'd like to add > https://issues.apache.org/jira/browse/FLINK-23025 to the list for Flink > 1.13.2. > > Cheers, > > Konstantin > > On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz > wrote: > > > Hi devs, > > > > Quite recently we pushed, in our opinion, quite an important fix[1] for > > unaligned checkpoints which disables UC for broadcast partitioning. > > Without the fix there might be some broadcast state corruption. > > Therefore we think it would be beneficial to release it soonish. What do > > you think? Do you have other issues in mind you'd like to have included > > in these versions. > > > > Would someone be willing to volunteer to help with the releases as a > > release manager? I guess there is a couple of spots to fill in here ;) > > > > Best, > > > > Dawid > > > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
Thanks for driving this, Dawid. +1 to the release. I would also like to volunteer as the release manager for these versions. Best Godfrey On Tue, Jun 22, 2021 at 8:39 AM, Seth Wiesman wrote: > +1 to the release. > > It would be great if we could get FLINK-23073 into 1.13.2. There's already > an open PR and it unblocks upgrading the table API walkthrough in > apache/flink-playgrounds to 1.13. > > Seth > > On Mon, Jun 21, 2021 at 6:28 AM Yun Tang wrote: > > Hi Dawid, > > > > Thanks for driving this discussion, I am willing to volunteer as the > > release manager for these versions. > > > > > > Best > > Yun Tang > > > > From: Konstantin Knauf > > Sent: Friday, June 18, 2021 22:35 > > To: dev > > Subject: Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2 > > > > Hi Dawid, > > > > Thank you for starting the discussion. I'd like to add > > https://issues.apache.org/jira/browse/FLINK-23025 to the list for Flink > > 1.13.2. > > > > Cheers, > > > > Konstantin > > > > On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz > > > wrote: > > > > > Hi devs, > > > > > > Quite recently we pushed, in our opinion, quite an important fix[1] for > > > unaligned checkpoints which disables UC for broadcast partitioning. > > > Without the fix there might be some broadcast state corruption. > > > Therefore we think it would be beneficial to release it soonish. What > do > > > you think? Do you have other issues in mind you'd like to have included > > > in these versions. > > > > > > Would someone be willing to volunteer to help with the releases as a > > > release manager? I guess there is a couple of spots to fill in here ;) > > > > > > Best, > > > > > > Dawid > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > > > > > > > > > > > -- > > > > Konstantin Knauf > > > > https://twitter.com/snntrable > > > > https://github.com/knaufk > > >
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
+1 to the release. Thanks Dawid for driving this discussion. I am willing to volunteer as the release manager too. Best, Jingsong On Tue, Jun 22, 2021 at 9:58 AM godfrey he wrote: > Thanks for driving this, Dawid. +1 to the release. > I would also like to volunteer as the release manager for these versions. > > > Best > Godfrey > > On Tue, Jun 22, 2021 at 8:39 AM, Seth Wiesman wrote: > > +1 to the release. > > > > It would be great if we could get FLINK-23073 into 1.13.2. There's > already > > an open PR and it unblocks upgrading the table API walkthrough in > > apache/flink-playgrounds to 1.13. > > > > Seth > > > > On Mon, Jun 21, 2021 at 6:28 AM Yun Tang wrote: > > > > > Hi Dawid, > > > > > > Thanks for driving this discussion, I am willing to volunteer as the > > > release manager for these versions. > > > > > > > > > Best > > > Yun Tang > > > > > > From: Konstantin Knauf > > > Sent: Friday, June 18, 2021 22:35 > > > To: dev > > > Subject: Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2 > > > > > > Hi Dawid, > > > > > > Thank you for starting the discussion. I'd like to add > > > https://issues.apache.org/jira/browse/FLINK-23025 to the list for > Flink > > > 1.13.2. > > > > > > Cheers, > > > > > > Konstantin > > > > > > On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz < > dwysakow...@apache.org > > > > > > wrote: > > > > > > > Hi devs, > > > > > > > > Quite recently we pushed, in our opinion, quite an important fix[1] > for > > > > unaligned checkpoints which disables UC for broadcast partitioning. > > > > Without the fix there might be some broadcast state corruption. > > > > Therefore we think it would be beneficial to release it soonish. What > do > > > > you think? Do you have other issues in mind you'd like to have > included > > > > in these versions. > > > > > > > > Would someone be willing to volunteer to help with the releases as a > > > > release manager?
I guess there is a couple of spots to fill in here > ;) > > > > > > > > Best, > > > > > > > > Dawid > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > > > > > > > > > > > > > > > > -- > > > > > > Konstantin Knauf > > > > > > https://twitter.com/snntrable > > > > > > https://github.com/knaufk > > > > > > -- Best, Jingsong Lee
Re: [VOTE] FLIP-129 Register sources/sinks in Table API
+1 (binding) Thanks for driving. Best, Jingsong On Tue, Jun 22, 2021 at 12:07 AM Jark Wu wrote: > +1 (binding) > > Best, > Jark > > On Mon, 21 Jun 2021 at 22:51, Timo Walther wrote: > > > +1 (binding) > > > > Thanks for driving this. > > > > Regards, > > Timo > > > > On 21.06.21 13:24, Ingo Bürk wrote: > > > Hi everyone, > > > > > > thanks for all the feedback so far. Based on the discussion[1] we seem > to > > > have consensus, so I would like to start a vote on FLIP-129 for which > the > > > FLIP has now also been updated[2]. > > > > > > The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) > unless > > > there is an objection or insufficient votes. > > > > > > > > > Thanks > > > Ingo > > > > > > [1] > > > > > > https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E > > > [2] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API > > > > > > > > -- Best, Jingsong Lee
Re: [VOTE] FLIP-129 Register sources/sinks in Table API
+1 (binding) Best regards, JING ZHANG On Tue, Jun 22, 2021 at 10:11 AM, Jingsong Li wrote: > +1 (binding) > > Thanks for driving. > > Best, > Jingsong > > On Tue, Jun 22, 2021 at 12:07 AM Jark Wu wrote: > > > +1 (binding) > > > > Best, > > Jark > > > > On Mon, 21 Jun 2021 at 22:51, Timo Walther wrote: > > > > > +1 (binding) > > > > > > Thanks for driving this. > > > > > > Regards, > > > Timo > > > > > > On 21.06.21 13:24, Ingo Bürk wrote: > > > > Hi everyone, > > > > > > > > thanks for all the feedback so far. Based on the discussion[1] we > seem > > to > > > > have consensus, so I would like to start a vote on FLIP-129 for which > > the > > > > FLIP has now also been updated[2]. > > > > > > > > The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) > > unless > > > > there is an objection or insufficient votes. > > > > > > > > > > > > Thanks > > > > Ingo > > > > > > > > [1] > > > > > > > > > > https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E > > > > [2] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API > > > > > > > > > > > > > > > -- > Best, Jingsong Lee >
Re: [VOTE] FLIP-129 Register sources/sinks in Table API
+1 Thanks Ingo for picking up this FLIP. Best, Leonard > On Jun 22, 2021, at 10:16, JING ZHANG wrote: > > +1 (binding) > > Best regards, > JING ZHANG > > On Tue, Jun 22, 2021 at 10:11 AM, Jingsong Li wrote: > >> +1 (binding) >> >> Thanks for driving. >> >> Best, >> Jingsong >> >> On Tue, Jun 22, 2021 at 12:07 AM Jark Wu wrote: >> >>> +1 (binding) >>> >>> Best, >>> Jark >>> >>> On Mon, 21 Jun 2021 at 22:51, Timo Walther wrote: >>> +1 (binding) Thanks for driving this. Regards, Timo On 21.06.21 13:24, Ingo Bürk wrote: > Hi everyone, > > thanks for all the feedback so far. Based on the discussion[1] we >> seem >>> to > have consensus, so I would like to start a vote on FLIP-129 for which >>> the > FLIP has now also been updated[2]. > > The vote will last for at least 72 hours (Thu, Jun 24th 12:00 GMT) >>> unless > there is an objection or insufficient votes. > > > Thanks > Ingo > > [1] > >>> >> https://lists.apache.org/thread.html/rc75d64e889bf35592e9843dde86e82bdfea8fd4eb4c3df150112b305%40%3Cdev.flink.apache.org%3E > [2] > >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-129%3A+Refactor+Descriptor+API+to+register+connectors+in+Table+API > >>> >> >> >> -- >> Best, Jingsong Lee >>
Re: trying (and failing) to update pyflink-walkthrough for Flink 1.13
Thanks David for taking care of this. I will take a look at this issue. Regards, Dian > On Jun 22, 2021, at 4:06 AM, David Anderson wrote: > > I've been trying to upgrade the pyflink-walkthrough to Flink 1.13.1, but > without any success. > > Unless I give it a lot of resources the data generator times out trying to > connect to Kafka. If I give it 6 cores and 11GB (which is about all I can > offer it) it does manage to connect, but then fails trying to write to > kafka. > > Not sure what's wrong? Any suggestions? > > See [1] to review what I tried. > > Best, > David > > [1] > https://github.com/alpinegizmo/flink-playgrounds/commit/777274355ba04de6d8c8f1308b24be99ec86a0d6 > > 21:40 $ docker-compose logs -f generator > > Attaching to pyflink-walkthrough_generator_1 > > generator_1 | Connecting to Kafka brokers > > generator_1 | Waiting for brokers to become available > > generator_1 | Waiting for brokers to become available > > generator_1 | Connected to Kafka > > generator_1 | Traceback (most recent call last): > > generator_1 | File "./generate_source_data.py", line 61, in > > > generator_1 | write_data(producer) > > generator_1 | File "./generate_source_data.py", line 42, in > write_data > > generator_1 | producer.send(topic, value=cur_data) > > generator_1 | File > "/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 576, > in send > > generator_1 | self._wait_on_metadata(topic, > self.config['max_block_ms'] / 1000.0) > > generator_1 | File > "/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 703, > in _wait_on_metadata > > generator_1 | "Failed to update metadata after %.1f secs." > % (max_wait,)) > > generator_1 | kafka.errors.KafkaTimeoutError: > KafkaTimeoutError: Failed to update metadata after 60.0 secs.
[jira] [Created] (FLINK-23074) There is a class conflict between flink-connector-hive and flink-parquet
Ada Wong created FLINK-23074: Summary: There is a class conflict between flink-connector-hive and flink-parquet Key: FLINK-23074 URL: https://issues.apache.org/jira/browse/FLINK-23074 Project: Flink Issue Type: Improvement Components: Connectors / Hive Reporter: Ada Wong flink-connector-hive 2.3.6 includes parquet-hadoop version 1.8.1, but flink-parquet includes version 1.11.1. org.apache.parquet.hadoop.example.GroupWriteSupport is different between the two.
[jira] [Created] (FLINK-23075) Python API for enabling ChangelogStateBackend
Zakelly Lan created FLINK-23075: --- Summary: Python API for enabling ChangelogStateBackend Key: FLINK-23075 URL: https://issues.apache.org/jira/browse/FLINK-23075 Project: Flink Issue Type: Improvement Components: API / Python Affects Versions: 1.14.0 Reporter: Zakelly Lan After FLINK-22678, two APIs ```enableChangelogStateBackend``` and ```isChangelogStateBackendEnabled``` have been added. The corresponding interfaces should be added to the Python API.
Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2
Thanks Dawid for starting the discussion, and thanks Yun, Godfrey and Jingsong for volunteering as release managers. +1 for the releases, and +1 for the release managers. Thank you~ Xintong Song On Tue, Jun 22, 2021 at 10:15 AM Jingsong Li wrote: > +1 to the release. > > Thanks Dawid for driving this discussion. > > I am willing to volunteer as the release manager too. > > Best, > Jingsong > > On Tue, Jun 22, 2021 at 9:58 AM godfrey he wrote: > > > Thanks for driving this, Dawid. +1 to the release. > > I would also like to volunteer as the release manager for these versions. > > > > > > Best > > Godfrey > > > > On Tue, Jun 22, 2021 at 8:39 AM, Seth Wiesman wrote: > > > > > +1 to the release. > > > > > > It would be great if we could get FLINK-23073 into 1.13.2. There's > > already > > > an open PR and it unblocks upgrading the table API walkthrough in > > > apache/flink-playgrounds to 1.13. > > > > > > Seth > > > > > > On Mon, Jun 21, 2021 at 6:28 AM Yun Tang wrote: > > > > > > > Hi Dawid, > > > > > > > > Thanks for driving this discussion, I am willing to volunteer as the > > > > release manager for these versions. > > > > > > > > > > > > Best > > > > Yun Tang > > > > > > > > From: Konstantin Knauf > > > > Sent: Friday, June 18, 2021 22:35 > > > > To: dev > > > > Subject: Re: [DISCUSS] Releasing Flink 1.11.4, 1.12.5, 1.13.2 > > > > > > > > Hi Dawid, > > > > > > > > Thank you for starting the discussion. I'd like to add > > > > https://issues.apache.org/jira/browse/FLINK-23025 to the list for > > Flink > > > > 1.13.2. > > > > > > > > Cheers, > > > > > > > > Konstantin > > > > > > > > On Fri, Jun 18, 2021 at 3:26 PM Dawid Wysakowicz < > > dwysakow...@apache.org > > > > > > > > wrote: > > > > > > > > > Hi devs, > > > > > > > > > > Quite recently we pushed, in our opinion, quite an important fix[1] > > for > > > > > unaligned checkpoints which disables UC for broadcast partitioning. > > > > > Without the fix there might be some broadcast state corruption.
> > > > > Therefore we think it would be beneficial to release it soonish. > What > > > do > > > > > you think? Do you have other issues in mind you'd like to have > > included > > > > > in these versions. > > > > > > > > > > Would someone be willing to volunteer to help with the releases as > a > > > > > release manager? I guess there is a couple of spots to fill in here > > ;) > > > > > > > > > > Best, > > > > > > > > > > Dawid > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22815 > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Konstantin Knauf > > > > > > > > https://twitter.com/snntrable > > > > > > > > https://github.com/knaufk > > > > > > > > > > > > -- > Best, Jingsong Lee >
[jira] [Created] (FLINK-23076) DispatcherTest.testWaitingForJobMasterLeadership fails on azure
Xintong Song created FLINK-23076: Summary: DispatcherTest.testWaitingForJobMasterLeadership fails on azure Key: FLINK-23076 URL: https://issues.apache.org/jira/browse/FLINK-23076 Project: Flink Issue Type: Bug Affects Versions: 1.12.4 Reporter: Xintong Song Fix For: 1.12.5 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19265&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=19336553-69ec-5b03-471a-791a483cced6&l=6511 {code} [ERROR] Failures: [ERROR] DispatcherTest.testWaitingForJobMasterLeadership:672 Expected: is but: was {code}
[jira] [Created] (FLINK-23077) Running nexmark q5 with 1.13.1 of pipeline.object-reuse=true, the taskmanager will be killed and produce failover.
xiaojin.wy created FLINK-23077: -- Summary: Running nexmark q5 with 1.13.1 of pipeline.object-reuse=true, the taskmanager will be killed and produce failover. Key: FLINK-23077 URL: https://issues.apache.org/jira/browse/FLINK-23077 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.13.1, 1.13.0 Reporter: xiaojin.wy Running nexmark with Flink 1.13.0 and 1.13.1, q5 cannot succeed.
*The conf is:* pipeline.object-reuse=true
*The sql is:*
CREATE TABLE discard_sink (
  auction BIGINT,
  num BIGINT
) WITH (
  'connector' = 'blackhole'
);
INSERT INTO discard_sink
SELECT AuctionBids.auction, AuctionBids.num
FROM (
  SELECT
    B1.auction,
    count(*) AS num,
    HOP_START(B1.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND) AS starttime,
    HOP_END(B1.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND) AS endtime
  FROM bid B1
  GROUP BY B1.auction, HOP(B1.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
) AS AuctionBids
JOIN (
  SELECT
    max(CountBids.num) AS maxn,
    CountBids.starttime,
    CountBids.endtime
  FROM (
    SELECT
      count(*) AS num,
      HOP_START(B2.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND) AS starttime,
      HOP_END(B2.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND) AS endtime
    FROM bid B2
    GROUP BY B2.auction, HOP(B2.dateTime, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
  ) AS CountBids
  GROUP BY CountBids.starttime, CountBids.endtime
) AS MaxBids
ON AuctionBids.starttime = MaxBids.starttime
  AND AuctionBids.endtime = MaxBids.endtime
  AND AuctionBids.num >= MaxBids.maxn;
*The error is:* !image-2021-06-22-11-30-58-022.png!
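As background on why pipeline.object-reuse=true can surface failures that do not occur otherwise: with object reuse enabled, operators may repeatedly receive the same mutable record instance, so anything buffered across invocations must be copied. The plain-Java sketch below illustrates this general hazard class only; it is not a claim about the actual root cause of this ticket.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseHazard {
    // A mutable record, standing in for a row the runtime reuses when object reuse is on.
    static class Record {
        long auction;
    }

    public static void main(String[] args) {
        Record reused = new Record();
        List<Record> buffered = new ArrayList<>();

        for (long id = 1; id <= 3; id++) {
            reused.auction = id;   // the "runtime" overwrites the same instance in place
            buffered.add(reused);  // an operator that buffers the reference instead of a copy
        }

        // Every buffered entry now reflects only the last value; the earlier ones are lost.
        System.out.println(buffered.get(0).auction); // 3
        // Safe variant: buffer a copy, e.g. a new Record with the field values duplicated.
    }
}
```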
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
I second Zhu and Till's opinion. Failing with an exception that also includes how to resolve the problem sounds better, in terms of making it explicit to users that pipelined edges are replaced with blocking edges. Concerning the absence of knobs for tuning the edge types, we can introduce a configuration option. Since currently the edge types are fixed based on the job execution mode and are not exposed to users, I'd suggest introducing a configuration option that only affects fine-grained resource management use cases. To be specific, we can have something like 'fine-grained.xxx.all-blocking'. The default value should be false, and we can suggest that users set it to true in the error message. When set to true, this should take effect only when fine-grained resource requirements are detected. Thus, it should not affect the default execution-mode-based edge type strategy for non-fine-grained use cases. Thank you~ Xintong Song On Mon, Jun 21, 2021 at 8:59 PM Yangze Guo wrote: > Thanks for the feedback, Till! > > Actually, we cannot give users any resolution for this issue as there > is no API for DataStream users to influence the edge types at the > moment. The edge types are currently fixed based on the jobs' mode > (batch or streaming). > a) I think it might not confuse the user a lot as the behavior has > never been documented or guaranteed to be unchanged. > b) Thanks for your illustration. I agree that added complexity can make > other feature development harder in the future. However, I think this > might not introduce much complexity. In this case, we construct an > all-edges-blocking job graph, which has already existed since 1.11 and > should have been considered by the following features. I admit we > cannot assume the all-edges-blocking job graph will exist forever in > Flink, but AFAIK there is no foreseeable feature that will intend to > deprecate it. > > WDYT?
> > > > Best, > Yangze Guo > > On Mon, Jun 21, 2021 at 6:10 PM Till Rohrmann > wrote: > > > > I would be more in favor of what Zhu Zhu proposed to throw an exception > > with a meaningful and understandable explanation that also includes how to > > resolve this problem. I do understand the reasoning behind automatically > > switching the edge types in order to make things easier to use but a) this > > can also be confusing if the user does not expect this to happen and b) it > > can add some complexity which makes other feature development harder in the > > future because users might rely on it. An example of such a case I stumbled > > upon rather recently is that we adjust the maximum parallelism wrt the > > given savepoint if it has not been explicitly configured. On paper this > > sounds like a good usability improvement, however, for the > > AdaptiveScheduler it posed a quite annoying complexity. If instead, we said > > that we fail the job submission if the max parallelism does not equal the > > max parallelism of the savepoint, it would have been a lot easier. > > > > Cheers, > > Till > > > > On Mon, Jun 21, 2021 at 9:36 AM Yangze Guo wrote: > > > > > Thanks, I append it to the known limitations of this FLIP. > > > > > > Best, > > > Yangze Guo > > > > > > On Mon, Jun 21, 2021 at 3:20 PM Zhu Zhu wrote: > > > > > > > > Thanks for the quick response Yangze. > > > > The proposal sounds good to me. > > > > > > > > Thanks, > > > > Zhu > > > > > > > > On Mon, Jun 21, 2021 at 3:01 PM, Yangze Guo wrote: > > > >> > > > >> Thanks for the comments, Zhu! > > > >> > > > >> Yes, it is a known limitation for fine-grained resource management. We > > > >> also have filed this issue in FLINK-20865 when we proposed FLIP-156. > > > >> > > > >> As a first step, I agree that we can mark batch jobs with PIPELINED > > > >> edges as an invalid case for this feature.
However, just throwing an > > > >> exception, in that case, might confuse users who do not understand > the > > > >> concept of pipeline region. Maybe we can force all the edges in this > > > >> scenario to BLOCKING in compiling stage and well document it. So > that, > > > >> common users will not be interrupted while the expert users can > > > >> understand the cost of that usage and make their decision. WDYT? > > > >> > > > >> Best, > > > >> Yangze Guo > > > >> > > > >> On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote: > > > >> > > > > >> > Thanks for proposing this @Yangze Guo and sorry for joining the > > > discussion so late. > > > >> > The proposal generally looks good to me. But I find one problem > that > > > batch job with PIPELINED edges might hang if enabling fine-grained > > > resources. see "Resource Deadlocks could still happen in certain Cases" > > > section in > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling > > > >> > However, this problem may happen only in batch cases with > PIPELINED > > > edges, because > > > >> > 1. streaming jobs would always require all resource requirements > to > > > be fulfilled at the same time. > > > >> > 2. b
[jira] [Created] (FLINK-23078) Scheduler Benchmarks not compiling
Piotr Nowojski created FLINK-23078:
--
Summary: Scheduler Benchmarks not compiling
Key: FLINK-23078
URL: https://issues.apache.org/jira/browse/FLINK-23078
Project: Flink
Issue Type: Bug
Components: Benchmarks, Runtime / Coordination
Affects Versions: 1.14.0
Reporter: Piotr Nowojski
Fix For: 1.14.0

{code:java}
07:46:50 [ERROR] /home/jenkins/workspace/flink-master-benchmarks/flink-benchmarks/src/main/java/org/apache/flink/scheduler/benchmark/SchedulerBenchmarkBase.java:21:44: error: cannot find symbol
{code}

CC [~chesnay] [~Thesharing]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements
Thanks for the comment, Xintong. I used to wonder whether it was reasonable
or worthwhile to introduce a configuration like "table.exec.shuffle-mode"
for the DataStream API. Narrowing down the scope of effect sounds good to me.

Best,
Yangze Guo

On Tue, Jun 22, 2021 at 2:08 PM Xintong Song wrote:
>
> I second Zhu and Till's opinion.
>
> Failing with an exception that also includes how to resolve the problem
> sounds better, in terms of making it explicit to users that pipelined
> edges are replaced with blocking edges.
>
> Concerning the absence of knobs for tuning the edge types, we can
> introduce a configuration option. Since currently the edge types are fixed
> based on the job execution mode and are not exposed to users, I'd suggest
> introducing a configuration option that only affects fine-grained resource
> management use cases. To be specific, we can have something like
> 'fine-grained.xxx.all-blocking'. The default value should be false, and we
> can suggest users to set it to true in the error message. When set to
> true, this should take effect only when fine-grained resource requirements
> are detected. Thus, it should not affect the default execution-mode based
> edge type strategy for non fine-grained use cases.
>
> Thank you~
>
> Xintong Song
>
> On Mon, Jun 21, 2021 at 8:59 PM Yangze Guo wrote:
> >
> > Thanks for the feedback, Till!
> >
> > Actually, we cannot give the user any resolution for this issue, as
> > there is no API for DataStream users to influence the edge types at the
> > moment. The edge types are currently fixed based on the job's mode
> > (batch or streaming).
> > a) I think it might not confuse the user a lot, as the behavior has
> > never been documented or guaranteed to be unchanged.
> > b) Thanks for your illustration. I agree that added complexity can make
> > other feature development harder in the future. However, I think this
> > might not introduce much complexity.
> > In this case, we construct an all-edges-blocking job graph, which has
> > already existed since 1.11 and should have been considered by the
> > following features. I admit we cannot assume the all-edges-blocking job
> > graph will exist forever in Flink, but AFAIK there is no foreseeable
> > feature that will intend to deprecate it.
> >
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jun 21, 2021 at 6:10 PM Till Rohrmann wrote:
> > >
> > > I would be more in favor of what Zhu Zhu proposed to throw an
> > > exception with a meaningful and understandable explanation that also
> > > includes how to resolve this problem. I do understand the reasoning
> > > behind automatically switching the edge types in order to make things
> > > easier to use, but a) this can also be confusing if the user does not
> > > expect this to happen, and b) it can add some complexity which makes
> > > other feature development harder in the future because users might
> > > rely on it. An example of such a case I stumbled upon rather recently
> > > is that we adjust the maximum parallelism wrt the given savepoint if
> > > it has not been explicitly configured. On paper this sounds like a
> > > good usability improvement; however, for the AdaptiveScheduler it
> > > posed a quite annoying complexity. If instead we said that we fail the
> > > job submission if the max parallelism does not equal the max
> > > parallelism of the savepoint, it would have been a lot easier.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Jun 21, 2021 at 9:36 AM Yangze Guo wrote:
> > > >
> > > > Thanks, I append it to the known limitations of this FLIP.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Jun 21, 2021 at 3:20 PM Zhu Zhu wrote:
> > > > >
> > > > > Thanks for the quick response Yangze.
> > > > > The proposal sounds good to me.
> > > > >
> > > > > Thanks,
> > > > > Zhu
> > > > >
> > > > > Yangze Guo wrote on Mon, Jun 21, 2021 at 3:01 PM:
> > > > > >
> > > > > > Thanks for the comments, Zhu!
> > > > > >
> > > > > > Yes, it is a known limitation for fine-grained resource
> > > > > > management. We also have filed this issue in FLINK-20865 when we
> > > > > > proposed FLIP-156.
> > > > > >
> > > > > > As a first step, I agree that we can mark batch jobs with
> > > > > > PIPELINED edges as an invalid case for this feature. However,
> > > > > > just throwing an exception in that case might confuse users who
> > > > > > do not understand the concept of pipeline region. Maybe we can
> > > > > > force all the edges in this scenario to BLOCKING in the compiling
> > > > > > stage and document it well, so that common users will not be
> > > > > > interrupted while expert users can understand the cost of that
> > > > > > usage and make their decision. WDYT?
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Mon, Jun 21, 2021 at 2:24 PM Zhu Zhu wrote:
> > > > > > >
> > > > > > > Thanks for proposing this @Yangze Guo and sorry for joining the
> > > > > > > discussion so late.
> > > > > > > The proposal generally looks good to me. But I find one problem
> > > > > > > that batch jobs with PIPELINED edges might hang if enabling
> > > > > > > fine-grained resources. See the "Resource Deadlocks could still
> > > > > > > happen in certain Cases" section in
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-119+Pipelined+Region+Scheduling
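[Editor's note] The option-gated behavior Xintong proposes in this thread can be sketched in a few lines. This is a hypothetical model, not Flink code: `EdgePolicy` and `decide` are made-up names, and 'fine-grained.xxx.all-blocking' is the placeholder option name from the discussion itself.

```java
// Hypothetical sketch of the proposed decision logic: the opt-in flag takes
// effect only when fine-grained resource requirements are detected, and the
// default behavior is to fail with an actionable hint rather than silently
// rewriting edge types.
public class EdgePolicy {

    static String decide(boolean fineGrainedRequirements,
                         boolean hasPipelinedEdges,
                         boolean allBlockingFlag) {
        if (!fineGrainedRequirements || !hasPipelinedEdges) {
            // Non fine-grained jobs keep the execution-mode based edge types.
            return "KEEP";
        }
        if (allBlockingFlag) {
            // Expert opt-in: rewrite PIPELINED edges to BLOCKING at compile time.
            return "FORCE_BLOCKING";
        }
        // Default: fail fast, with the resolution included in the message.
        throw new IllegalStateException(
                "Batch jobs with PIPELINED edges cannot use fine-grained resource "
                        + "requirements (risk of resource deadlock, see FLIP-119). "
                        + "Set the 'fine-grained.xxx.all-blocking' style option to "
                        + "true to force all edges to BLOCKING, or remove the "
                        + "fine-grained requirements.");
    }
}
```

This keeps the behavior Till asked for (explicit failure by default) while still giving expert users the escape hatch Yangze wanted, without touching non fine-grained jobs at all.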
[jira] [Created] (FLINK-23079) HiveTableSinkITCase fails
JING ZHANG created FLINK-23079:
--
Summary: HiveTableSinkITCase fails
Key: FLINK-23079
URL: https://issues.apache.org/jira/browse/FLINK-23079
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Reporter: JING ZHANG

Four tests in HiveTableSinkITCase fail:
testBatchAppend, testPartStreamingMrWrite,
testHiveTableSinkWithParallelismInStreaming,
testStreamingSinkWithTimestampLtzWatermark

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19241&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=a6e0f756-5bb9-5ea8-a468-5f60db442a29
[jira] [Created] (FLINK-23080) Add finish method to the SinkFunction
Dawid Wysakowicz created FLINK-23080:
--
Summary: Add finish method to the SinkFunction
Key: FLINK-23080
URL: https://issues.apache.org/jira/browse/FLINK-23080
Project: Flink
Issue Type: Sub-task
Components: Runtime / Checkpointing
Reporter: Dawid Wysakowicz
Fix For: 1.14.0