Re: Help. Who can add permission in FLIP.
Hey, I gave you edit permissions in the Flink wiki! On Mon, May 17, 2021 at 3:30 AM wrote: > Hi, I want to write a FLIP in [confluence](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals). Who can help? Thx. > My username is wangwj. My email is wangw...@sina.cn.
[jira] [Created] (FLINK-22678) Fix Loading Changelog Statebackend with configs set in job-level and cluster-level separately
Yuan Mei created FLINK-22678: Summary: Fix Loading Changelog Statebackend with configs set in job-level and cluster-level separately Key: FLINK-22678 URL: https://issues.apache.org/jira/browse/FLINK-22678 Project: Flink Issue Type: Bug Reporter: Yuan Mei -- This message was sent by Atlassian Jira (v8.3.4#803005)
Status of a savepoint operation returns Completed but an error was thrown
Hi guys, We developed some scripts to improve the rolling updates in our pipelines. One of the tasks is to trigger a savepoint and wait for the response until the status is Completed or until the retry limit is reached. We noticed that sometimes the response has the status Completed even though the request failed: { "status": { "id": "COMPLETED" }, "operation": { "failure-cause": { "class": "java.util.concurrent.CompletionException", "stack-trace": "java.util.concurrent.CompletionException: )\n\t... 47 more\n", "serialized-throwable": "..." } } } An easy way to reproduce the issue is to put the job in a restart loop and trigger a savepoint. Shouldn't the status still be in-progress in that case?
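For scripts that poll this endpoint, one possible workaround is to look at the "operation" body as well as the status id. The sketch below is only an illustration based on the response shape quoted above: the function name `savepoint_outcome` and the three outcome labels are made up for this example, and it assumes that a response without a "failure-cause" entry means the savepoint succeeded.

```python
def savepoint_outcome(response):
    """Classify a savepoint status response (already parsed from JSON).

    Returns "in_progress", "failed", or "succeeded". The labels are
    hypothetical names for this sketch, not values defined by Flink.
    """
    status_id = response.get("status", {}).get("id")
    if status_id != "COMPLETED":
        # Anything other than COMPLETED is treated as "keep polling".
        return "in_progress"

    # COMPLETED only says the operation is over; the quoted response shows
    # that it may still carry a failure cause, so check for it explicitly.
    operation = response.get("operation") or {}
    if "failure-cause" in operation:
        return "failed"

    # Assumption: no failure cause means the savepoint was written.
    return "succeeded"
```

A polling loop would then stop retrying on either "failed" or "succeeded" and only continue while the outcome is "in_progress".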
[jira] [Created] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
Jin Xing created FLINK-22677: Summary: Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion Key: FLINK-22677 URL: https://issues.apache.org/jira/browse/FLINK-22677 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Jin Xing The current scheduler enforces synchronous registration even though ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a remote shuffle service, communication between the ShuffleMaster and the remote cluster tends to be expensive. Synchronous registration risks blocking the main thread and may cause negative side effects such as heartbeat timeouts. Additionally, expensive synchronous calls to the remote cluster could bottleneck the throughput of requesting shuffle resources, especially for batch jobs with complicated DAGs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
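The difference between the two calling styles described in this issue can be illustrated outside of Flink. The following is a language-neutral sketch in Python, not Flink code: `register_partition` is a hypothetical stand-in for a slow remote registration call, and the thread pool stands in for whatever executor performs the remote communication. It only contrasts blocking on each future with attaching a completion callback.

```python
import concurrent.futures
import time


def register_partition(partition_id):
    """Hypothetical stand-in for an expensive remote registration call."""
    time.sleep(0.05)  # simulated network round trip
    return f"descriptor-{partition_id}"


def register_blocking(executor, partition_ids):
    # Synchronous style: a future is returned, but the caller waits on it
    # right away, so the calling thread is blocked for every registration.
    return [executor.submit(register_partition, p).result() for p in partition_ids]


def register_async(executor, partition_ids, on_registered):
    # Asynchronous style: submit every registration, attach a callback that
    # runs on completion, and return to the caller without waiting.
    futures = []
    for p in partition_ids:
        future = executor.submit(register_partition, p)
        future.add_done_callback(lambda done: on_registered(done.result()))
        futures.append(future)
    return futures
```

In the blocking variant the total wait grows with the number of partitions, while in the callback variant the caller stays free to do other work (such as answering heartbeats) while registrations are in flight.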
[jira] [Created] (FLINK-22676) The partition tracker should support remote shuffle properly
Jin Xing created FLINK-22676: Summary: The partition tracker should support remote shuffle properly Key: FLINK-22676 URL: https://issues.apache.org/jira/browse/FLINK-22676 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Jin Xing In current Flink, a data partition is bound to the ResourceID of the TM in Execution#startTrackingPartitions, and the partition tracker stops tracking the corresponding partitions when a TM disconnects (JobMaster#disconnectTaskManager), i.e. the lifecycle of shuffle data is bound to the computing resource (TM). This works fine for the internal shuffle service, but not for a remote shuffle service. Because the shuffle data is stored on the remote cluster, the lifecycle of a completed partition can be decoupled from the TM: a TM can safely be released when no computing task is running on it, and further shuffle read requests can be directed to the remote shuffle cluster. In addition, when a TM is lost, its completed data partitions on the remote shuffle cluster do not need to be reproduced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22675) Add an interface method ShuffleMaster#close
Jin Xing created FLINK-22675: Summary: Add an interface method ShuffleMaster#close Key: FLINK-22675 URL: https://issues.apache.org/jira/browse/FLINK-22675 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Jin Xing When a remote shuffle service is built on the 'pluggable shuffle service', the ShuffleMaster talks to the remote cluster over a network connection. This Jira proposes to add an interface method, ShuffleMaster#close, which implementations can override to do cleanup work and which will be called when the Flink application is closed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22673) Add document about add jar related commands
Shengkai Fang created FLINK-22673: Summary: Add document about add jar related commands Key: FLINK-22673 URL: https://issues.apache.org/jira/browse/FLINK-22673 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Shengkai Fang Fix For: 1.14.0 Including {{ADD JAR}}, {{SHOW JAR}}, {{REMOVE JAR}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22674) Provide JobID when apply shuffle resource by ShuffleMaster
Jin Xing created FLINK-22674: Summary: Provide JobID when apply shuffle resource by ShuffleMaster Key: FLINK-22674 URL: https://issues.apache.org/jira/browse/FLINK-22674 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Jin Xing In the current Flink 'pluggable shuffle service' framework, only PartitionDescriptor and ProducerDescriptor are passed as parameters to ShuffleMaster#registerPartitionWithProducer. But when building a remote shuffle service on the 'pluggable shuffle service', the JobID is also needed when requesting shuffle resources from the remote cluster. It can be used as an identifier to link shuffle resources with the corresponding job: 1. The remote shuffle cluster can isolate shuffle resources between jobs or apply capacity control to them. 2. The remote shuffle cluster can use the JobID to clean up shuffle data when a job is lost, and thus avoid file leaks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22672) Some enhancements for pluggable shuffle service framework
Jin Xing created FLINK-22672: Summary: Some enhancements for pluggable shuffle service framework Key: FLINK-22672 URL: https://issues.apache.org/jira/browse/FLINK-22672 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Jin Xing The "pluggable shuffle service" in Flink provides an architecture that is unified for both streaming and batch jobs, allowing users to customize the process of data transfer between shuffle stages according to their scenarios. There are already a number of implementations of a "remote shuffle service" on Spark, such as [1][2][3]. Remote shuffle makes it possible to shuffle data from/to a remote cluster and brings benefits such as: 1. The lifecycle of computing resources can be decoupled from shuffle data: once a computing task is finished, idle computing nodes can be released while their completed shuffle data is stored on the remote shuffle cluster. 2. There is no need to reserve disk capacity for shuffle on computing nodes. The remote shuffle cluster serves shuffle requests with better scalability and alleviates local disk pressure on computing nodes when data is skewed. Based on the "pluggable shuffle service", we built our own "remote shuffle service" on Flink, Lattice, which aims to provide functionality and improve performance for batch processing jobs. Basically it works as follows: 1. The Lattice cluster works as an independent service for shuffle requests. 2. LatticeShuffleMaster extends ShuffleMaster, works inside the JM, and talks to the remote Lattice cluster to request shuffle resources and manage the shuffle data lifecycle. 3. LatticeShuffleEnvironment extends ShuffleEnvironment, works inside the TM, and provides an environment for shuffling data from/to the remote Lattice cluster. While building Lattice we found some potential enhancements to the "pluggable shuffle service". I will enumerate them and create some sub-JIRAs under this umbrella.
[1] [https://www.alibabacloud.com/blog/emr-remote-shuffle-service-a-powerful-elastic-tool-of-serverless-spark_597728] [2] [https://bestoreo.github.io/post/cosco/cosco/] [3] [https://github.com/uber/RemoteShuffleService] -- This message was sent by Atlassian Jira (v8.3.4#803005)
Help. Who can add permission in FLIP.
Hi, I want to write a FLIP in [confluence](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals). Who can help? Thx. My username is wangwj. My email is wangw...@sina.cn.
Re: [VOTE] Release 1.12.4, release candidate #1
+1 (binding) * Verified checksums and signatures * Checked no significant version changes compared to 1.12.3 (one new test scope dependency) * Checked no changes to the NOTICE files * Built from sources * Ran an example using the binary 2.12 distribution * Verified that a random class in flink-scala_2.11 and _2.12 was compiled with the correct Scala version Best, Dawid On 10/05/2021 23:34, Arvid Heise wrote: > Hi everyone, > > Please review and vote on the release candidate #1 for the version 1.12.4, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > The complete staging area is available for your review, which includes: > * JIRA release notes [1], > * the official Apache source release and binary convenience releases to be > deployed to dist.apache.org [2], which are signed with the key with > fingerprint 476DAA5D1FF08189 [3], > * all artifacts to be deployed to the Maven Central Repository [4], > * source code tag "release-1.12.4-rc1" [5], > * website pull request listing the new release and adding announcement blog > post [6]. > > The vote will be open for at least 72 hours. It is adopted by majority > approval, with at least 3 PMC affirmative votes. > > Thanks, > Your friendly release manager Arvid > > [1] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12350110 > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.4-rc1/ > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > [4] https://repository.apache.org/content/repositories/orgapacheflink-1421 > [5] https://github.com/apache/flink/releases/tag/release-1.12.4-rc1 > [6] https://github.com/apache/flink-web/pull/446
[jira] [Created] (FLINK-22671) xxx
王彬 created FLINK-22671: Summary: xxx Key: FLINK-22671 URL: https://issues.apache.org/jira/browse/FLINK-22671 Project: Flink Issue Type: Bug Reporter: 王彬 -- This message was sent by Atlassian Jira (v8.3.4#803005)