[VOTE] FLIP-150: Introduce Hybrid Source
Hi everyone, Thanks for all the feedback on Hybrid Source so far. Based on the discussion[1] we seem to have consensus, so I would like to start a vote on FLIP-150, for which the FLIP has also been updated[2]. The vote will last for at least 72 hours (until Sun, Jul 4th 12:00 GMT) unless there is an objection or insufficient votes. Thanks, Nicholas Jiang [1] https://lists.apache.org/thread.html/r94057d19f0df2a211695820375502d60cddeeab5ad27057c1ca988d6%40%3Cdev.flink.apache.org%3E [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
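For readers new to the FLIP: the core idea of a hybrid source is to read a bounded source (e.g. historical files) to completion and then switch over to a follow-up source (e.g. Kafka). A minimal, self-contained sketch of that switching behavior; the class and method names here are illustrative only and are not the FLIP-150 API:

```java
import java.util.Iterator;

/** Toy model of the hybrid-source idea voted on above: read a bounded
 *  source to completion, then switch to a follow-up source. Names are
 *  illustrative only, not the FLIP-150 API. */
public class HybridRead {

    /** Drains the bounded iterator first, then continues with the follow-up. */
    public static <T> Iterator<T> hybrid(Iterator<T> bounded, Iterator<T> followUp) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return bounded.hasNext() || followUp.hasNext();
            }

            @Override
            public T next() {
                // Switch-over happens implicitly once the bounded part is exhausted.
                return bounded.hasNext() ? bounded.next() : followUp.next();
            }
        };
    }
}
```

The real proposal additionally has to hand over state (e.g. the end position of the bounded part) at the switch point, which this sketch omits.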
[jira] [Created] (FLINK-23198) An old interface method is used in this section of [Passing Options Factory to RocksDB].
Carl created FLINK-23198: Summary: An old interface method is used in this section of [Passing Options Factory to RocksDB]. Key: FLINK-23198 URL: https://issues.apache.org/jira/browse/FLINK-23198 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.12.0 Reporter: Carl Attachments: image-2021-07-01-11-30-57-676.png, image-2021-07-01-11-32-25-200.png [https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/state/state_backends.html] !image-2021-07-01-11-30-57-676.png! In version 1.12 of Flink, this method has been replaced by the following one: !image-2021-07-01-11-32-25-200.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23197) Erroneous description of 'TIMESTAMP WITH TIME ZONE' data type
wangqinghuan created FLINK-23197: Summary: Erroneous description of 'TIMESTAMP WITH TIME ZONE' data type Key: FLINK-23197 URL: https://issues.apache.org/jira/browse/FLINK-23197 Project: Flink Issue Type: Bug Components: Documentation Reporter: wangqinghuan Currently, Flink doesn't support the 'TIMESTAMP WITH TIME ZONE' type. We should add a note or remove the 'TIMESTAMP WITH TIME ZONE' type description in the docs.[1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/types/#date-and-time -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23196) JobMasterITCase fails on Azure due to BindException
Xintong Song created FLINK-23196: Summary: JobMasterITCase fail on azure due to BindException Key: FLINK-23196 URL: https://issues.apache.org/jira/browse/FLINK-23196 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.14.0 Reporter: Xintong Song Fix For: 1.14.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19753=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56=4251 {code} Jul 01 00:00:27 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.272 s <<< FAILURE! - in org.apache.flink.runtime.jobmaster.JobMasterITCase Jul 01 00:00:27 [ERROR] testRejectionOfEmptyJobGraphs(org.apache.flink.runtime.jobmaster.JobMasterITCase) Time elapsed: 3.009 s <<< ERROR! Jul 01 00:00:27 org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent. Jul 01 00:00:27 at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:275) Jul 01 00:00:27 at org.apache.flink.runtime.minicluster.MiniCluster.createDispatcherResourceManagerComponents(MiniCluster.java:470) Jul 01 00:00:27 at org.apache.flink.runtime.minicluster.MiniCluster.setupDispatcherResourceManagerComponents(MiniCluster.java:429) Jul 01 00:00:27 at org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:373) Jul 01 00:00:27 at org.apache.flink.runtime.jobmaster.JobMasterITCase.testRejectionOfEmptyJobGraphs(JobMasterITCase.java:56) Jul 01 00:00:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Jul 01 00:00:27 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Jul 01 00:00:27 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Jul 01 00:00:27 at java.lang.reflect.Method.invoke(Method.java:498) Jul 01 00:00:27 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) Jul 
01 00:00:27 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) Jul 01 00:00:27 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) Jul 01 00:00:27 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) Jul 01 00:00:27 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Jul 01 00:00:27 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) Jul 01 00:00:27 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) Jul 01 00:00:27 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) Jul 01 00:00:27 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) Jul 01 00:00:27 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) Jul 01 00:00:27 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) Jul 01 00:00:27 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) Jul 01 00:00:27 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) Jul 01 00:00:27 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) Jul 01 00:00:27 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) Jul 01 00:00:27 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) Jul 01 00:00:27 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) Jul 01 00:00:27 at org.junit.runners.ParentRunner.run(ParentRunner.java:413) Jul 01 00:00:27 at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) Jul 01 00:00:27 at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) Jul 01 00:00:27 at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) Jul 01 00:00:27 at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) Jul 01 00:00:27 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) Jul 01 00:00:27 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) Jul 01 00:00:27 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) Jul 01 00:00:27 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) Jul 01 00:00:27 Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8081 Jul 01 00:00:27 at
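The root cause above is a fixed port (8081) already being in use on the CI machine. A common remedy for such tests is to bind to port 0 and let the OS pick a free ephemeral port; in Flink this typically corresponds to configuring `rest.bind-port` to `0` or a port range for the MiniCluster. A generic sketch of the ephemeral-port trick (an illustration, not Flink code):

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Sketch of the usual remedy for BindException in tests: instead of a
 *  fixed port such as 8081, bind to port 0 and let the OS choose a free
 *  ephemeral port. Generic illustration, not Flink code. */
public class FreePort {

    /** Returns an OS-assigned port that was free at the time of the call. */
    public static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new RuntimeException("no free port available", e);
        }
    }
}
```

Note there is still a small race between closing the probe socket and the server binding the port, which is why binding the server itself to port 0 (or a range) is preferable where the framework supports it.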
[jira] [Created] (FLINK-23195) No value for Kafka metrics in Flink Web UI
MOBIN created FLINK-23195: - Summary: No value for Kafka metrics in Flink Web UI Key: FLINK-23195 URL: https://issues.apache.org/jira/browse/FLINK-23195 Project: Flink Issue Type: Bug Components: Connectors / Kafka, Runtime / Metrics Affects Versions: 1.11.2 Reporter: MOBIN Attachments: image-2021-07-01-10-02-04-483.png, image-2021-07-01-10-03-02-214.png The Kafka metrics in the Flink Web UI have no value, but the other metrics are displayed normally. !image-2021-07-01-10-02-04-483.png|width=698,height=445! !image-2021-07-01-10-03-02-214.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Dashboard/HistoryServer authentication
> > * Even if there is a major breaking version, Flink releases major versions > too where it could be added > Netty framework locking is true but AFAIK there was a discussion to > rewrite the Netty stuff to a more sexy thing but there was no agreement to > do that. > Flink major releases seem to happen even less frequently than Netty releases :( It would be unfortunate if a breaking Netty API change ended up in the FLINK-3957[1] catch-all 2.0 changes. All in all I would agree on making it experimental. > Thus I am happy with this compromise, thank you :) This would simply restrict use-cases where order is not important. Limiting > devs in such an odd way is a no-go. > I think the only case to be made for imposing limitations would be to encourage devs to only use this API in very specific situations, otherwise to solve this in another way, and to revisit the API if these limitations are met and alternatives do not work. That said, I am still trying to understand this specific Cloudera case – anything you can say about the limitations of its Flink setup (i.e., difficulty spawning sidecar processes (because of YARN?)) would be very helpful to me and others without this bit of context. But I think the proposed priority function that you've added is a nice compromise as well, so +1 from my side on the proposal. I would only further suggest that we document the other options for this problem as the preferred approach, where possible. Thanks, Austin [1]: https://issues.apache.org/jira/browse/FLINK-3957 On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi wrote: > Answered here because the text started to be crowded. > > > It also locks Flink into the current major version of Netty (and the > Netty framework itself) for the foreseeable future.
> It's not doing any Netty version locking because: > * Netty will not necessarily add breaking changes in major versions, the > API is quite stable > * Even if there is a major breaking version, Flink releases major versions > too where it could be added > Netty framework locking is true but AFAIK there was a discussion to > rewrite the Netty stuff to a more sexy thing but there was no agreement to > do that. > All in all I would agree on making it experimental. > > > why not restrict the service loader to only allow one? > This would simply restrict use-cases where order is not important. > Limiting devs in such an odd way is a no-go. > I think the ordering came up in multiple places, which I think is a good > reason to fill this gap with a priority function. > I've updated the doc and added it... > > BR, > G > > > On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi Gabor, >> >> Thanks for your answers. I appreciate the explanations. Please see my >> responses + further questions below. >> >> >> * What stability semantics do you envision for this API? >>> As I foresee the API will be as stable as Netty API. Since there is >>> guarantee on no breaking changes between minor versions we can give the >>> same guarantee. >>> If for whatever reason we need to break it we can do it in major version >>> like every other open source project does. >>> >> >> * Does Flink expose dependencies’ APIs in other places? Since this exposes the Netty API, will this make it difficult to upgrade Netty? >>> I don't expect breaking changes between minor versions so such cases >>> there will be no issues. If there is a breaking change in major version >>> we need to wait Flink major version too. >>> >> >> To clarify, you are proposing this new API to have the same stability >> guarantees as @Public currently does? 
>> >> If this is the case, I think this puts the community in a tough position >> where we are forced to maintain compatibility with something that we do not >> have control over. It also locks Flink into the current major version of >> Netty (and the Netty framework itself) for the foreseeable future. >> >> I am saying we should not do this, perhaps this is the best solution to >> finding a good compromise here, but I am trying to discover + acknowledge >> the full implications of this proposal so they can be discussed. >> >> What do you think about marking this API as @Experimental and not >> guaranteeing stability between versions? Then, if we do decide we need to >> upgrade Netty (or move away from it), we can do so. >> >> * I share Till's concern about multiple factories – other HTTP middleware frameworks commonly support chaining middlewares. Since the proposed API does not include these features/guarantee ordering, do you see any reason to allow more than one factory? >>> I personally can't come up with a use-case where ordering is a must. I'm >>> not telling that this is not a valid use-case but adding a feature w/o >>> business rationale would include the maintenance cost (though I'm open to
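The priority function discussed in this thread can be illustrated with a small sketch: when several handler factories are discovered (e.g. via ServiceLoader), an explicit priority decides the order in which they are chained, so that, say, authentication can be forced to run before authorization. The interface and names below are hypothetical and are not the API proposed in the design doc:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of the priority function discussed above: order discovered
 *  factories by an explicit priority before building the handler chain.
 *  The interface and names are hypothetical, not the proposed Flink API. */
public class HandlerChain {

    public interface HandlerFactory {
        String name();

        /** Lower value runs earlier, e.g. authentication before authorization. */
        int priority();
    }

    /** Returns handler names in the order the chain would invoke them. */
    public static List<String> buildChain(List<HandlerFactory> discovered) {
        return discovered.stream()
                .sorted(Comparator.comparingInt(HandlerFactory::priority))
                .map(HandlerFactory::name)
                .collect(Collectors.toList());
    }

    /** Convenience constructor for the sketch. */
    public static HandlerFactory factory(String name, int priority) {
        return new HandlerFactory() {
            public String name() { return name; }
            public int priority() { return priority; }
        };
    }
}
```

With this shape, the Google-OAuth-before-internal-authorization scenario from the thread is expressed by giving the OAuth factory the lower priority value, regardless of discovery order.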
Re: Job Recovery Time on TM Lost
Thanks Gen! cc flink-dev to collect more input. Best Lu On Wed, Jun 30, 2021 at 12:55 AM Gen Luo wrote: > I'm also wondering here. > > In my opinion, it's because the JM cannot confirm whether the TM is lost > or it's a temporary network problem that will recover soon, since I can see > in the log that Akka has got a Connection refused but the JM still sends > heartbeat requests to the lost TM until it reaches the heartbeat timeout. But > I'm not sure if it's indeed designed like this. > > I would really appreciate it if anyone who knows more details could > answer. Thanks. >
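The behavior described above (the JM keeps sending heartbeat requests until the heartbeat timeout elapses, rather than declaring the TM lost on the first refused connection) can be sketched as a minimal timeout monitor. This is only an illustration of the pattern, not Flink's actual heartbeat implementation:

```java
/** Minimal sketch of heartbeat-timeout detection as described above: a
 *  single connection refusal does not mark the peer dead; the peer is
 *  declared lost only once no heartbeat has been seen for timeoutMs.
 *  Illustration of the pattern only, not Flink's heartbeat services. */
public class HeartbeatMonitor {
    private final long timeoutMs;
    private long lastHeartbeatMs;

    public HeartbeatMonitor(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastHeartbeatMs = nowMs;
    }

    /** Called whenever a heartbeat response arrives. */
    public void onHeartbeat(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    /** A transient "Connection refused" can occur long before this turns true. */
    public boolean isLost(long nowMs) {
        return nowMs - lastHeartbeatMs > timeoutMs;
    }
}
```

The trade-off the thread is circling around is visible here: a short timeout recovers faster from real TM loss but misclassifies transient network trouble as failure.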
[jira] [Created] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelerate the process of createTaskExecutorLaunchContext on YARN
zlzhang0122 created FLINK-23194: --- Summary: Cache and reuse the ContainerLaunchContext and accelerate the process of createTaskExecutorLaunchContext on YARN Key: FLINK-23194 URL: https://issues.apache.org/jira/browse/FLINK-23194 Project: Flink Issue Type: Improvement Components: Deployment / YARN Affects Versions: 1.12.4, 1.13.1 Reporter: zlzhang0122 Fix For: 1.14.0 When starting TaskExecutors in containers on YARN, Flink creates a ContainerLaunchContext n times (n being the number of TaskManagers). When I examined this creation process, I found that most of it is common to all TaskManagers and has nothing to do with the particular TaskManager except the launchCommand. We can create the ContainerLaunchContext once and reuse it; only the launchCommand needs to be created separately for each particular TaskManager. So I propose that we cache and reuse the ContainerLaunchContext object to accelerate this creation process. I think this can have some benefits: # it can accelerate the creation of the ContainerLaunchContext and thus the start of TaskExecutors, especially with a massive number of TaskManagers. # it can decrease the pressure on HDFS, etc. # it can also reduce exposure to sudden failures of HDFS or YARN, etc. We have implemented this in our production environment. So far there has been no problem and it has brought a good benefit. Let me know if there's any point that I haven't considered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
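The proposed caching can be sketched as follows: build the TaskManager-independent part of the context once, and compute only the launch command per TaskManager. The types below are simplified stand-ins (a map-shaped context) rather than YARN's real ContainerLaunchContext:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the caching idea above: build the TM-independent part of the
 *  launch context once and reuse it; only the per-TaskManager launch
 *  command is computed on each request. Simplified stand-ins, not the
 *  actual Flink/YARN classes. */
public class LaunchContextCache {
    private Map<String, String> cachedCommonPart; // shared template, built once
    private int templateBuilds = 0;

    private Map<String, String> commonPart() {
        if (cachedCommonPart == null) {
            templateBuilds++; // the expensive work happens only here
            cachedCommonPart = new HashMap<>();
            cachedCommonPart.put("classpath", "flink-dist.jar"); // illustrative entry
        }
        return cachedCommonPart;
    }

    /** Returns the full context: shared template plus a per-TM command. */
    public Map<String, String> createFor(String taskManagerId) {
        Map<String, String> ctx = new HashMap<>(commonPart());
        ctx.put("launchCommand", "start-taskmanager --id " + taskManagerId);
        return ctx;
    }

    public int templateBuilds() {
        return templateBuilds;
    }
}
```

The invariant worth testing in the real patch is the same as in the sketch: n requests, one template build.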
[VOTE] Migrating Test Framework to JUnit 5
Dear devs, As discussed in the thread[1], I’d like to start a vote on migrating the test framework of the Flink project to JUnit 5. JUnit 5[2] provides many exciting new features that can simplify the development of test cases and make our test structure cleaner, such as a pluggable extension model (replacing rules such as TestLogger/MiniCluster), improved parameterized tests, annotation support, and nested/dynamic tests. The migration path towards JUnit 5 would be: 1. Remove the JUnit 4 dependency and introduce junit5-vintage-engine in the project. The vintage engine will keep all existing JUnit4-style cases valid. Since classes of JUnit 4 and 5 are located under different packages, there won’t be a conflict having JUnit 4 cases in the project. 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules). 3. Migrate all existing tests to JUnit 5. This would be a giant commit similar to code reformatting and could be merged after cutting the 1.14 release branch. There are many migration examples and experiences to refer to, and IntelliJ IDEA provides tools for refactoring. 4. Ban JUnit 4 imports in CheckStyle. Some modules like Testcontainers still require JUnit 4 in the classpath, and JUnit 4 could still appear as a transitive dependency. Banning JUnit 4 imports can prevent developers from mistakenly using JUnit 4 and from splitting the project into 4 & 5 again. 5. Remove the vintage runner and do some cleanup. This vote will last for at least 72 hours, following the consensus voting process. Thanks! [1] https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E [2] https://junit.org/junit5 -- Best Regards, *Qingsheng Ren* Email: renqs...@gmail.com
Re: [DISCUSS] Moving to JUnit5
Thanks Arvid for the feedback! I think merging the giant commit after cutting the 1.14 release branch would be a good idea. Since no one has objections in the discussion, I’d like to move a step forward and raise a vote on the migration. Looking forward to having JUnit 5 in the 1.14 release cycle! -- Best Regards, Qingsheng Ren Email: renqs...@gmail.com On Jun 29, 2021, 9:20 PM +0800, Arvid Heise , wrote: > Hi Qingsheng, > > I like the idea of enforcing JUnit5 tests with checkstyle. I'm assuming > JUnit4 will bleed from time to time into the test classpath. > > Obviously, we can only do that after all tests are migrated and we are > confident that no small change would require a contributor to do the > migration of a test in an unrelated change. > > For the big commit, I'd propose to have it after the branch cut for the 1.14 > release. So for 1.14 we would just have the coexistence PR with the vintage > engine. In that way, the least possible number of contributors should be > affected. Of course, the big commit can be prepared beforehand. > > On Tue, Jun 29, 2021 at 11:44 AM Qingsheng Ren wrote: > > > Thanks for wrapping things up and for your effort on the migration, Arvid! > > > > I’m +1 for the migration plan. > > > > To summarize the migration path proposed by Arvid: > > > > 1. Remove JUnit 4 dependency and introduce junit5-vintage-engine in the > > project (all existing cases will still work) > > 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules) > > 3. Migrate all existing tests to JUnit 5 (This is a giant commit similar > > to code formatting) > > 4. Remove vintage runner and some cleanup > > > > One issue is that we cannot totally get rid of the JUnit 4 dependency from the > > project because: > > > > 1. Testcontainers. As mentioned in their official docs[1], Testcontainers > > still depends on JUnit 4, and the problem might not be solved until > > Testcontainers 2, which still has no roadmap[2]. > > 2. 
It’s still possible for JUnit 4 to appear as a transitive dependency > > > > Since classes of JUnit 4 and 5 are under different packages, there won’t > > be conflict having JUnit 4 in the project. To prevent the project splitting > > into 4 & 5 again, we can ban JUnit 4 imports in CheckStyle to prevent > > developers from writing test cases in JUnit 4 style intentionally or mistakenly. > > > > I’m happy and willing to take over the migration work. This migration > > indeed takes some effort, but it will help with test case development in > > the future. > > > > [1] https://www.testcontainers.org/test_framework_integration/junit_5/ > > [2] > > https://github.com/testcontainers/testcontainers-java/issues/970#issuecomment-437273363 > > > > -- > > Best Regards, > > > > Qingsheng Ren > > Email: renqs...@gmail.com > > On Jun 16, 2021, 3:13 AM +0800, Arvid Heise , wrote: > > > Sorry for following up so late. A while ago, I spiked a junit 5 > > migration. > > > > > > To recap: here is the migration plan. > > > > > > 0. (There is a way to use JUnit4 + 5 at the same time in a project - > > you'd > > > > use a specific JUnit4 runner to execute JUnit5. I'd like to skip this > > > > step as it would slow down migration significantly) > > > > 1. Use JUnit5 with vintage runner. JUnit4 tests run mostly out of the > > > > box. The most important difference is that only 3 base rules are > > supported > > > > and the remainder needs to be migrated. Luckily, most of our rules > > derive > > > > from the supported ExternalResource. So in this step, we would need to > > > > migrate the rules. > > > > 2. Implement new tests in JUnit5. > > > > 3. Soft-migrate old tests in JUnit5. This is mostly a renaming of > > > > annotations (@Before -> @BeforeEach, etc.). Adjust parameterized tests > > > > (~400), replace rule usages (~670) with extensions, exception handling > > > > (~1600 tests), and timeouts (~200). This can be done on a test class by > > > > test class basis and there is no hurry. > > > > 4. 
Remove vintage runner, once most tests are migrated by doing a final > > > > push for lesser used modules. > > > > > > > > > > Here are my insights: > > > 0. works but I don't see the benefit > > > 1. works well [1] with a small diff [2]. Note that the branch is based > > on a > > > quite old master. > > > 2. works well as well [3]. > > > 2a. However, we should be aware that we need to port quite a few rules to > > > extensions before we can implement more complex JUnit5 tests, especially > > > ITCases (I'd probably skip junit-jupiter-migrationsupport that allows us > > to > > > reuse _some_ rules using specific base classes). We have ~10-15 rules > > that > > > need to be ported. > > > 3. Soft migration will take forever and probably never finish. Many tests > > > can be automatically ported with some (I used 8) simple regexes. I'd > > rather > > > do a hard migration of all tests at a particular point (no freeze) and > > have > > > that git commit excluded from blame, similar to the spotless commit. > > > 3a. A huge chunk of changes (>90%) comes from
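The regex-driven bulk migration mentioned above (a handful of simple regexes covering most cases, e.g. @Before -> @BeforeEach) can be sketched as a small rewriter. The mapping below covers only the common 1:1 annotation renames and is an illustration, not the actual script used:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the regex-driven JUnit 4 -> 5 migration discussed above:
 *  most annotations map 1:1 onto JUnit 5 names and can be rewritten
 *  mechanically. Common cases only; illustration, not the real script. */
public class JUnit4To5 {
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        // \b prevents "@Before" from also matching "@BeforeClass".
        RENAMES.put("@Before\\b", "@BeforeEach");
        RENAMES.put("@After\\b", "@AfterEach");
        RENAMES.put("@BeforeClass\\b", "@BeforeAll");
        RENAMES.put("@AfterClass\\b", "@AfterAll");
        RENAMES.put("@Ignore\\b", "@Disabled");
        RENAMES.put("org\\.junit\\.Test\\b", "org.junit.jupiter.api.Test");
    }

    /** Applies each rename to the given source text and returns the result. */
    public static String migrate(String source) {
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            source = source.replaceAll(e.getKey(), e.getValue());
        }
        return source;
    }
}
```

Parameterized tests, rules, and expected-exception handling are the cases such regexes cannot cover, which matches the manual-effort estimates quoted in the thread.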
Re: [DISCUSS] Dashboard/HistoryServer authentication
Answered here because the text started to be crowded. > It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future. It's not doing any Netty version locking because: * Netty will not necessarily add breaking changes in major versions, the API is quite stable * Even if there is a major breaking version, Flink releases major versions too where it could be added Netty framework locking is true but AFAIK there was a discussion to rewrite the Netty stuff to a more sexy thing but there was no agreement to do that. All in all I would agree on making it experimental. > why not restrict the service loader to only allow one? This would simply restrict use-cases where order is not important. Limiting devs in such an odd way is a no-go. I think the ordering came up in multiple places, which I think is a good reason to fill this gap with a priority function. I've updated the doc and added it... BR, G On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi Gabor, > > Thanks for your answers. I appreciate the explanations. Please see my > responses + further questions below. > > > * What stability semantics do you envision for this API? >>> >> As I foresee the API will be as stable as Netty API. Since there is >> guarantee on no breaking changes between minor versions we can give the >> same guarantee. >> If for whatever reason we need to break it we can do it in major version >> like every other open source project does. >> > > * Does Flink expose dependencies’ APIs in other places? Since this exposes >>> the Netty API, will this make it difficult to upgrade Netty? >>> >> I don't expect breaking changes between minor versions so such cases >> there will be no issues. If there is a breaking change in major version >> we need to wait Flink major version too. >> > > To clarify, you are proposing this new API to have the same stability > guarantees as @Public currently does? 
Where we will not introduce breaking > changes unless absolutely necessary (and requiring a FLIP, etc.)? > > If this is the case, I think this puts the community in a tough position > where we are forced to maintain compatibility with something that we do not > have control over. It also locks Flink into the current major version of > Netty (and the Netty framework itself) for the foreseeable future. > > I am saying we should not do this, perhaps this is the best solution to > finding a good compromise here, but I am trying to discover + acknowledge > the full implications of this proposal so they can be discussed. > > What do you think about marking this API as @Experimental and not > guaranteeing stability between versions? Then, if we do decide we need to > upgrade Netty (or move away from it), we can do so. > > * I share Till's concern about multiple factories – other HTTP middleware >>> frameworks commonly support chaining middlewares. Since the proposed API >>> does not include these features/guarantee ordering, do you see any reason >>> to allow more than one factory? >>> >> I personally can't come up with a use-case where ordering is a must. I'm >> not telling that this is not a valid use-case but adding a feature w/o >> business rationale would include the maintenance cost (though I'm open to >> add). >> As I've seen Till also can't give example for that (please see the doc >> comments). If you have anything in mind please share it and we can add >> priority to the API. >> There is another option too, namely we can be defensive and we can add >> the priority right now. I would do this only if everybody states in mail >> that it would be the best option, >> otherwise I would stick to the original plan. 
>> > > Let me try to come up with a use case: > * Someone creates an authentication module for integrating with Google's > OAuth and publishes it to flink-packages > * Another person in another org wants to use Google OAuth and then add > internal authorization based on the user > * In this scenario, *Google OAuth must come before the internal > authorization* > * They place their module and the Google OAuth module to be picked up by > the service loader > * What happens? > > I do not think that the current proposal has a way to handle this, besides > having the implementor of the internal authorization module bundle > everything into one, as you have suggested. Since this is the only way to > achieve order, why not restrict the service loader to only allow one? This > way the API is explicit in what it supports. > > > Let me know what you think, > Austin > > > On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi > wrote: > >> Hi Austin, >> >> Please see my answers embedded down below. >> >> BR, >> G >> >> >> >> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards < >> austin.caw...@gmail.com> wrote: >> >>> Hi all, >>> >>> Thanks for the updated proposal. I have a few questions about the API, >>> please see below. >>> >>> * What stability semantics do you
Re: [DISCUSS] Dashboard/HistoryServer authentication
Small correction: I am *not* saying we should not do this; perhaps this is the best solution to finding a good compromise here, but I am trying to discover + acknowledge the full implications of this proposal so they can be discussed. Sorry :)
> > What do you think about marking this API as @Experimental and not > guaranteeing stability between versions? Then, if we do decide we need to > upgrade Netty (or move away from it), we can do so. > > * I share Till's concern about multiple factories – other HTTP middleware >>> frameworks commonly support chaining middlewares. Since the proposed API >>> does not include these features/guarantee ordering, do you see any reason >>> to allow more than one factory? >>> >> I personally can't come up with a use-case where ordering is a must. I'm >> not telling that this is not a valid use-case but adding a feature w/o >> business rationale would include the maintenance cost (though I'm open to >> add). >> As I've seen Till also can't give example for that (please see the doc >> comments). If you have anything in mind please share it and we can add >> priority to the API. >> There is another option too, namely we can be defensive and we can add >> the priority right now. I would do this only if everybody states in mail >> that it would be the best option, >> otherwise I would stick to the original plan. >> > > Let me try to come up with a use case: > * Someone creates an authentication module for integrating with Google's > OAuth and publishes it to flink-packages > * Another person in another org wants to use Google OAuth and then add > internal authorization based on the user > * In this scenario, *Google OAuth must come before the internal > authorization* > * They place their module and the Google OAuth module to be picked up by > the service loader > * What happens? > > I do not think that the current proposal has a way to handle this, besides > having the implementor of the internal authorization module bundle > everything into one, as you have suggested. Since this is the only way to > achieve order, why not restrict the service loader to only allow one? This > way the API is explicit in what it supports. 
> > > Let me know what you think, > Austin > > > On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi > wrote: > >> Hi Austin, >> >> Please see my answers embedded down below. >> >> BR, >> G >> >> >> >> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards < >> austin.caw...@gmail.com> wrote: >> >>> Hi all, >>> >>> Thanks for the updated proposal. I have a few questions about the API, >>> please see below. >>> >>> * What stability semantics do you envision for this API? >>> >> As I foresee the API will be as stable as Netty API. Since there is >> guarantee on no breaking changes between minor versions we can give the >> same guarantee. >> If for whatever reason we need to break it we can do it in major version >> like every other open source project does. >> >> >>> * Does Flink expose dependencies’ APIs in other places? Since this >>> exposes the Netty API, will this make it difficult to upgrade Netty? >>> >> I don't expect breaking changes between minor versions so such cases >> there will be no issues. If there is a breaking change in major version >> we need to wait Flink major version too. >> >> >>> * I share Till's concern about multiple
Re: [DISCUSS] Dashboard/HistoryServer authentication
Hi Gabor, Thanks for your answers. I appreciate the explanations. Please see my responses + further questions below. * What stability semantics do you envision for this API? >> > As I foresee the API will be as stable as Netty API. Since there is > guarantee on no breaking changes between minor versions we can give the > same guarantee. > If for whatever reason we need to break it we can do it in major version > like every other open source project does. > * Does Flink expose dependencies’ APIs in other places? Since this exposes >> the Netty API, will this make it difficult to upgrade Netty? >> > I don't expect breaking changes between minor versions so such cases there > will be no issues. If there is a breaking change in major version > we need to wait Flink major version too. > To clarify, you are proposing this new API to have the same stability guarantees as @Public currently does? Where we will not introduce breaking changes unless absolutely necessary (and requiring a FLIP, etc.)? If this is the case, I think this puts the community in a tough position where we are forced to maintain compatibility with something that we do not have control over. It also locks Flink into the current major version of Netty (and the Netty framework itself) for the foreseeable future. I am saying we should not do this, perhaps this is the best solution to finding a good compromise here, but I am trying to discover + acknowledge the full implications of this proposal so they can be discussed. What do you think about marking this API as @Experimental and not guaranteeing stability between versions? Then, if we do decide we need to upgrade Netty (or move away from it), we can do so. * I share Till's concern about multiple factories – other HTTP middleware >> frameworks commonly support chaining middlewares. Since the proposed API >> does not include these features/guarantee ordering, do you see any reason >> to allow more than one factory? 
>> > I personally can't come up with a use-case where ordering is a must. I'm > not telling that this is not a valid use-case but adding a feature w/o > business rationale would include the maintenance cost (though I'm open to > add). > As I've seen Till also can't give example for that (please see the doc > comments). If you have anything in mind please share it and we can add > priority to the API. > There is another option too, namely we can be defensive and we can add the > priority right now. I would do this only if everybody states in mail that > it would be the best option, > otherwise I would stick to the original plan. > Let me try to come up with a use case: * Someone creates an authentication module for integrating with Google's OAuth and publishes it to flink-packages * Another person in another org wants to use Google OAuth and then add internal authorization based on the user * In this scenario, *Google OAuth must come before the internal authorization* * They place their module and the Google OAuth module to be picked up by the service loader * What happens? I do not think that the current proposal has a way to handle this, besides having the implementor of the internal authorization module bundle everything into one, as you have suggested. Since this is the only way to achieve order, why not restrict the service loader to only allow one? This way the API is explicit in what it supports. Let me know what you think, Austin On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi wrote: > Hi Austin, > > Please see my answers embedded down below. > > BR, > G > > > > On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi all, >> >> Thanks for the updated proposal. I have a few questions about the API, >> please see below. >> >> * What stability semantics do you envision for this API? >> > As I foresee the API will be as stable as Netty API. 
Since there is > guarantee on no breaking changes between minor versions we can give the > same guarantee. > If for whatever reason we need to break it we can do it in major version > like every other open source project does. > > >> * Does Flink expose dependencies’ APIs in other places? Since this >> exposes the Netty API, will this make it difficult to upgrade Netty? >> > I don't expect breaking changes between minor versions so such cases there > will be no issues. If there is a breaking change in major version > we need to wait Flink major version too. > > >> * I share Till's concern about multiple factories – other HTTP middleware >> frameworks commonly support chaining middlewares. Since the proposed API >> does not include these features/guarantee ordering, do you see any reason >> to allow more than one factory? >> > I personally can't come up with a use-case where ordering is a must. I'm > not telling that this is not a valid use-case but adding a feature w/o > business rationale would include the maintenance cost (though I'm open to > add). > As I've seen Till also can't give
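The ordering question debated above can be made concrete. If a priority were added to the API, an explicit, sortable chain is straightforward. The following is a minimal, hypothetical sketch — the `InboundHandlerFactory` interface, its `priority()` method, and the module names are invented for illustration and are not part of the actual proposal — showing how two independently packaged modules (OAuth authentication and internal authorization) could be ordered deterministically, which a plain `ServiceLoader` lookup does not guarantee.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical factory interface; a real version would also create the
// Netty handler itself. Only the ordering mechanics are sketched here.
interface InboundHandlerFactory {
    int priority(); // lower value = earlier in the handler chain

    String name();
}

public class PriorityChainDemo {

    // Sort discovered factories explicitly, since ServiceLoader iteration
    // order is unspecified and must not be relied upon for ordering.
    static List<String> orderedNames(List<InboundHandlerFactory> discovered) {
        return discovered.stream()
                .sorted(Comparator.comparingInt(InboundHandlerFactory::priority))
                .map(InboundHandlerFactory::name)
                .collect(Collectors.toList());
    }

    static InboundHandlerFactory factory(String name, int priority) {
        return new InboundHandlerFactory() {
            public int priority() { return priority; }
            public String name() { return name; }
        };
    }

    public static void main(String[] args) {
        // Simulate service-loader discovery returning the modules in the
        // "wrong" order: authorization first, authentication second.
        List<InboundHandlerFactory> discovered = List.of(
                factory("internal-authz", 1),
                factory("google-oauth", 0));
        System.out.println(orderedNames(discovered));
    }
}
```

Without such an explicit priority, the single-factory restriction discussed above is the only way to make the API's ordering semantics unambiguous.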
[jira] [Created] (FLINK-23193) Support to display Options when using desc table statement
Fangliang Liu created FLINK-23193: - Summary: Support to display Options when using desc table statement Key: FLINK-23193 URL: https://issues.apache.org/jira/browse/FLINK-23193 Project: Flink Issue Type: New Feature Components: Table SQL / Client Reporter: Fangliang Liu

I use the following statement to create a table.
{code:java}
CREATE TABLE datagen (
  f_sequence INT,
  f_key1 INT,
  f_key2 INT,
  f_random_str STRING,
  log_ts TIMESTAMP(3),
  WATERMARK FOR log_ts AS log_ts
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10',
  'fields.f_sequence.kind' = 'sequence',
  'fields.f_sequence.start' = '1',
  'fields.f_sequence.end' = '1',
  'fields.f_key1.min' = '1',
  'fields.f_key1.max' = '20',
  'fields.f_key2.min' = '1',
  'fields.f_key2.max' = '20',
  'fields.f_random_str.length' = '5'
);
{code}
When I run `desc datagen` to view the table, I get the following result.
{code:java}
+--------------+------------------------+------+-----+--------+-----------+
|         name |                   type | null | key | extras | watermark |
+--------------+------------------------+------+-----+--------+-----------+
|   f_sequence |                    INT | true |     |        |           |
|       f_key1 |                    INT | true |     |        |           |
|       f_key2 |                    INT | true |     |        |           |
| f_random_str |                 STRING | true |     |        |           |
|       log_ts | TIMESTAMP(3) *ROWTIME* | true |     |        |  `log_ts` |
+--------------+------------------------+------+-----+--------+-----------+
5 rows in set
{code}
It does not display the options given in the WITH clause. I think the following information is also necessary to show when the desc statement is executed.
{code:java}
'connector' = 'datagen',
'rows-per-second' = '10',
'fields.f_sequence.kind' = 'sequence',
'fields.f_sequence.start' = '1',
'fields.f_sequence.end' = '1',
'fields.f_key1.min' = '1',
'fields.f_key1.max' = '20',
'fields.f_key2.min' = '1',
'fields.f_key2.max' = '20',
'fields.f_random_str.length' = '5'
{code}
[~jark], what do you think? Looking forward to your reply. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[DISCUSS] Releasing Flink 1.11.4
Hi devs, As discussed in [1], I would like to start a discussion for releasing Flink 1.11.4. I would like to volunteer as the release manager for 1.11.4, and will start the release process next Wednesday (July 7th). There are 75 issues that have been closed or resolved [2], and no blocker issues left [3] so far. If any issues need to be marked as blockers for 1.11.4, please let me know in this thread! Best, Godfrey [1] https://lists.apache.org/thread.html/r40a541027c6a04519f37c61f2a6f3dabdb821b3760cda9cc6ebe6ce9%40%3Cdev.flink.apache.org%3E [2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.11.4%20AND%20status%20in%20(Closed%2C%20Resolved) [3] https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.11.4%20AND%20status%20not%20in%20(Closed%2C%20Resolved)%20ORDER%20BY%20priority%20DESC
[jira] [Created] (FLINK-23192) Move connector/format option classes into a common package
Ingo Bürk created FLINK-23192: - Summary: Move connector/format option classes into a common package Key: FLINK-23192 URL: https://issues.apache.org/jira/browse/FLINK-23192 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Ingo Bürk -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Feedback Collection Jira Bot
I agree that we shouldn't discourage contributions. For me the main idea of the bot is not to clean up the JIRA but to improve our communication and expectation management with the community. There are many things we could do, but for a lot of things we don't have the time and capacity. Then to say at some point that we won't do something is just being honest. This also shows up when looking at the JIRA numbers of the merged commits. We very rarely resolve tickets which are older than x days, and if we do, then we usually create a new ticket for the problem. The fact that we see some tickets with available pull requests go stale is a symptom that we don't consider them important enough or don't allocate enough time for external contributions, imo. Otherwise, they would have gotten the required attention and been merged. In such a case, raising awareness by pinging the watchers of the respective ticket is probably better than silently ignoring the PR. Also, adding labels to filter for these PRs should help to get them the required attention. But also here, it happens very rarely that we actually merge a PR that is older than y days. Ideally we avoid this situation altogether by only assigning contributors to tickets for which a committer has review capacity. However, this does not seem to always work. In some sense, the JIRA bot shows us the things that fall through the cracks more explicitly (which is probably no different than before). Of course we should try to find the time periods for when to ping or de-prioritize tickets that work best for the community. +1 for the proposed changes (extended time periods, "Not a Priority", default priority and fixVersion). @Piotr, I think we have the priorities defined here [1]. Maybe it is enough to share the link so that everyone can check whether their assumptions are correct. 
[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process Cheers, Till On Wed, Jun 30, 2021 at 10:59 AM Piotr Nowojski wrote: > > * Introduce "Not a Priority" priority and stop closing tickets. > > +1 for this one (I also like the name you proposed for this Konstantin) > > I also have no objections to other proposals that you summarised. Just a > remark, that independently of this discussion we might want to revisit or > reconfirm the priorities and their definition/interpretation across all > contributors. > > Best, > Piotrek > > śr., 30 cze 2021 o 10:15 Konstantin Knauf napisał(a): > > > Hi everyone, > > > > Thank you for the additional comments and suggestions. > > > > @Stephan, Kurt: I agree that we shouldn't discourage or dishearten > > contributors, and probably 14 days until a ticket becomes > "stale-assigned" > > are too few. That's why I've already proposed to increase that to 30 > days. > > Similarly the times for Major/Critical tickets can be increased. From my > > perspective, the root causes are the following: > > > > * tickets are opened with the wrong priority (see > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities > > ). > > Here it might help to change the default priority. > > * committers don't have the time to review tickets or don't bring > community > > contributions to a resolution. The Jira bot makes this fact more visible. > > Without the Jira Bot no external contributor would get more attention, > and > > no external contribution would be merged faster. Ideally, it'd be the > > opposite and committers would actively monitor tickets with labels > > "stale-assigned" and "pull-request-available" in order to review those > with > > priority. That's also why I am not a fan of excluding tickets with > > "pull-request-available" from the bot. The bot can help to make these > > tickets visible to reviewers. > > > > @Jing Zhang: That's a problem. 
We should try to change the permissions > > accordingly or need to find a different solution. > > > > @Piotr, Kurt: Instead of closing tickets, we could introduce an > additional > > priority like "Not a Priority" to which we move tickets. No ticket would > be > > closed automatically. > > > > Overall, the following changes could resolve most of the concerns, I > > believe: > > > > * Ignore tickets with a fixVersion for all rules but the stale-unassigned > > role. > > > > * We change the time intervals as follows, accepting reality a bit more > ;) > > > > * stale-assigned only after 30 days (instead of 14 days) > > * stale-critical only after 14 days (instead of 7 days) > > * stale-major only after 60 days (instead of 30 days) > > > > * Introduce "Not a Priority" priority and stop closing tickets. > > > > * Change default priority for new tickets of Flink project to "Minor" > > > > The measures, I proposed above, still try to achieve the goals mentioned > > and agreed upon in the original discussion thread, which were the > > following: > > > > > >- > > > >clearer communication and expectation management with the community > >-
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I second Tison's opinion. Similar to how we skip docs_404_check for PRs that do not touch the documentation, we can skip other stages for PRs that only contain documentation changes. In addition to making merging documentation PRs easier, we can also reduce the workload on CI workers. Especially during the last days of a release cycle, which is usually the most busy time for the CI workers, and is also where most documentation efforts take place. Thank you~ Xintong Song On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann wrote: > I think you are right Tison that docs are a special case and they only > require flink-docs to pass. What I am wondering is how much of a problem > this will be (assuming that we have a decent build stability). The more > exceptions we add, the harder it will be to properly follow the guidelines. > Maybe we can observe how many docs PRs get delayed/not merged because of > this and then revisit this discussion if needed. > > Cheers, > Till > > On Wed, Jun 30, 2021 at 8:30 AM tison wrote: > > > Hi, > > > > There are a number of PRs modifying docs only, but we still require all > > tests passed on that. > > > > It is a good proposal we avoid merge PR with "unrelated" failure, but can > > we improve the case where the contributor only works for docs? > > > > For example, base on the file change set, run doc tests only. > > > > Best, > > tison. > > > > > > godfrey he 于2021年6月30日周三 下午2:17写道: > > > > > +1 for the proposal. Thanks Xintong! > > > > > > Best, > > > Godfrey > > > > > > > > > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > > > > > Thanks Xintong. +1 to the proposal. > > > > > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 > > wrote: > > > > > > > > > +1 for the proposal. Since the test time is long and environment > may > > > > vary, > > > > > unstable tests are really annoying for developers. The solution is > > > > welcome. 
> > > > > > > > > > Best > > > > > liujiangang > > > > > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > > > > > Best, > > > > > > Jingsong > > > > > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < > > trohrm...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > > > > > Cheers, > > > > > > > Till > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo < > karma...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > > > > > > > Best, > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < > > beyond1...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > > > > > The best practice for handling test failure is very > detailed, > > > > it's > > > > > a > > > > > > > good > > > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 > 下午4:07写道: > > > > > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new > > guidelines > > > > > [1], > > > > > > > as a > > > > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions > > discussed > > > > and > > > > > > > > consensus > > > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > > > > > Looking forward to your feedback. 
> > > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > > > pnowoj...@apache.org> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > > > written. > > > > > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > > > trohrm...@apache.org > > > > > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have > anything > > > > > against > > > > > > > > builds > > > > > > > > > > > > running on your
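The docs-only skipping that tison and Xintong propose earlier in this thread boils down to a predicate over the PR's changed file paths: if every touched file is documentation, run only the docs stages. A minimal sketch, assuming a hypothetical `docs/` path prefix as the marker for documentation files (the real CI scripts would have to define which paths count as docs-only):

```java
import java.util.List;

public class DocsOnlyCheck {

    // True if the change set is non-empty and touches only documentation
    // paths. Such a PR could run the docs build alone and skip the full
    // test stages. The "docs/" prefix is an assumption for illustration.
    static boolean isDocsOnly(List<String> changedFiles) {
        return !changedFiles.isEmpty()
                && changedFiles.stream().allMatch(f -> f.startsWith("docs/"));
    }

    public static void main(String[] args) {
        // Docs-only PR: the heavy CI stages could be skipped.
        System.out.println(isDocsOnly(
                List.of("docs/content/docs/ops/state/state_backends.md")));
        // Mixed PR: full pipeline still required.
        System.out.println(isDocsOnly(
                List.of("docs/README.md", "flink-core/src/main/java/Foo.java")));
    }
}
```

In practice the changed-file list would come from the CI system (e.g. a `git diff --name-only` against the base branch), mirroring how the existing docs_404_check is already gated.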
Re: [DISCUSS] Dashboard/HistoryServer authentication
Hi Austin, Please see my answers embedded down below. BR, G On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi all, > > Thanks for the updated proposal. I have a few questions about the API, > please see below. > > * What stability semantics do you envision for this API? > As I foresee it, the API will be as stable as the Netty API. Since there is a guarantee of no breaking changes between minor versions, we can give the same guarantee. If for whatever reason we need to break it, we can do it in a major version, like every other open source project does. > * Does Flink expose dependencies’ APIs in other places? Since this exposes > the Netty API, will this make it difficult to upgrade Netty? > I don't expect breaking changes between minor versions, so in such cases there will be no issues. If there is a breaking change in a major version, we need to wait for a Flink major version too. > * I share Till's concern about multiple factories – other HTTP middleware > frameworks commonly support chaining middlewares. Since the proposed API > does not include these features/guarantee ordering, do you see any reason > to allow more than one factory? > I personally can't come up with a use-case where ordering is a must. I'm not telling that this is not a valid use-case, but adding a feature w/o a business rationale would incur maintenance cost (though I'm open to adding it). As I've seen, Till also can't give an example of that (please see the doc comments). If you have anything in mind please share it and we can add priority to the API. There is another option too, namely we can be defensive and add the priority right now. I would do this only if everybody states in mail that it would be the best option, otherwise I would stick to the original plan. > > Best, > Austin > > On Tue, Jun 29, 2021 at 8:55 AM Márton Balassi > wrote: > >> Hi all, >> >> I commend Konstantin and Till when it comes to standing up for the >> community values. 
>> >> Based on your feedback we are withdrawing the original proposal and >> attaching a more general custom netty handler API proposal [1] written by >> G. The change necessary to the Flink repository is approximately 500 lines >> of code. [2] >> >> Please let us focus on discussing the details of this API and whether it >> covers the necessary use cases. >> >> [1] >> >> https://docs.google.com/document/d/1Idnw8YauMK1x_14iv0rVF0Hqm58J6Dg-hi-hEuL6hwM/edit#heading=h.ijcbce3c5gip >> [2] >> >> https://github.com/gaborgsomogyi/flink/commit/942f23679ac21428bb87fc85557b9b443fcaf310 >> >> Thanks, >> Marton >> >> On Wed, Jun 23, 2021 at 9:36 PM Austin Cawley-Edwards < >> austin.caw...@gmail.com> wrote: >> >> > Hi all, >> > >> > Thanks, Konstantin and Till, for guiding the discussion. >> > >> > I was not aware of the results of the call with Konstantin and was >> > attempting to resolve the unanswered questions before more, potentially >> > fruitless, work was done. >> > >> > I am also looking forward to the coming proposal, as well as increasing >> my >> > understanding of this specific use case + its limitations! >> > >> > Best, >> > Austin >> > >> > On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann >> > wrote: >> > >> > > Hi everyone, >> > > >> > > I do like the idea of keeping the actual change outside of Flink but >> to >> > > enable Flink to support such a use case (different authentication >> > > mechanisms). I think this is a good compromise for the community that >> > > combines long-term maintainability with support for new use-cases. I >> am >> > > looking forward to your proposal. >> > > >> > > I also want to second Konstantin here that the tone of your last >> email, >> > > Marton, does not reflect the values and manners of the Flink community >> > and >> > > is not representative of how we conduct discussions. 
Especially, the >> more >> > > senior community members should know this and act accordingly in >> order to >> > > be good role models for others in the community. Technical discussions >> > > should not be decided by who wields presumably the greatest authority >> but >> > > by the soundness of arguments and by what is the best solution for a >> > > problem. >> > > >> > > Let us now try to find the best solution for the problem at hand! >> > > >> > > Cheers, >> > > Till >> > > >> > > On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf >> > > wrote: >> > > >> > > > Hi everyone, >> > > > >> > > > First, Marton and I had a brief conversation yesterday offline and >> > > > discussed exploring the approach of exposing the authentication >> > > > functionality via an API. So, I am looking forward to your proposal >> in >> > > that >> > > > direction. The benefit of such a solution would be that it is >> > extensible >> > > > for others and it does add a smaller maintenance (in particular >> > testing) >> > > > footprint to Apache Flink itself. If we end up going down this >> route, >> > > > flink-packages.org would be a great way to promote these third >>
Re: [DISCUSS] Feedback Collection Jira Bot
> * Introduce "Not a Priority" priority and stop closing tickets. +1 for this one (I also like the name you proposed for this Konstantin) I also have no objections to other proposals that you summarised. Just a remark, that independently of this discussion we might want to revisit or reconfirm the priorities and their definition/interpretation across all contributors. Best, Piotrek śr., 30 cze 2021 o 10:15 Konstantin Knauf napisał(a): > Hi everyone, > > Thank you for the additional comments and suggestions. > > @Stephan, Kurt: I agree that we shouldn't discourage or dishearten > contributors, and probably 14 days until a ticket becomes "stale-assigned" > are too few. That's why I've already proposed to increase that to 30 days. > Similarly the times for Major/Critical tickets can be increased. From my > perspective, the root causes are the following: > > * tickets are opened with the wrong priority (see > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities > ). > Here it might help to change the default priority. > * committers don't have the time to review tickets or don't bring community > contributions to a resolution. The Jira bot makes this fact more visible. > Without the Jira Bot no external contributor would get more attention, and > no external contribution would be merged faster. Ideally, it'd be the > opposite and committers would actively monitor tickets with labels > "stale-assigned" and "pull-request-available" in order to review those with > priority. That's also why I am not a fan of excluding tickets with > "pull-request-available" from the bot. The bot can help to make these > tickets visible to reviewers. > > @Jing Zhang: That's a problem. We should try to change the permissions > accordingly or need to find a different solution. > > @Piotr, Kurt: Instead of closing tickets, we could introduce an additional > priority like "Not a Priority" to which we move tickets. 
No ticket would be > closed automatically. > > Overall, the following changes could resolve most of the concerns, I > believe: > > * Ignore tickets with a fixVersion for all rules but the stale-unassigned > role. > > * We change the time intervals as follows, accepting reality a bit more ;) > > * stale-assigned only after 30 days (instead of 14 days) > * stale-critical only after 14 days (instead of 7 days) > * stale-major only after 60 days (instead of 30 days) > > * Introduce "Not a Priority" priority and stop closing tickets. > > * Change default priority for new tickets of Flink project to "Minor" > > The measures, I proposed above, still try to achieve the goals mentioned > and agreed upon in the original discussion thread, which were the > following: > > >- > >clearer communication and expectation management with the community >- > > a user or contributor should be able to judge the urgency of a ticket > by its priority > - > > if a ticket is assigned to someone the expectation that someone is > working on it should hold > - > >generally reduce noise in Jira >- > >reduce overhead of committers to ask about status updates of >contributions or bug reports >- > > “Are you still working on this?” > - > > “Are you still interested in this?” > - > > “Does this still happen on Flink 1.x?” > - > > “Are you still experiencing this issue?” > - > > “What is the status of the implementation”? > - > >while still encouraging users to add new tickets and to leave feedback >about existing tickets > > > Kurt, Stephan, if you'd like to change the bot to "just close very old > tickets", I suggest you start a separate discussion and subsequent voting > thread. > > Best, > > Konstantin > > > On Wed, Jun 30, 2021 at 9:01 AM Kurt Young wrote: > > > +1 to Stephan's opinion, with just one minor difference. For my > experience > > and a project > > as big as Flink, picking up an issue created 1-2 years ago seems normal > to > > me. 
To be more > > specific, during the blink planner merge, I created lots of clean up & > > refactor issues, trying > > to make the code be more clean. I haven't had a chance to resolve all > these > > but I think they are > > still good improvements. Thus I would propose we don't close any stall > > issues for at least 2 years. > > > > Best, > > Kurt > > > > > > On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen wrote: > > > > > Being a bit late to the party, and don't want to ask to change > > everything, > > > just maybe some observation. > > > > > > My main observation and concern is still that this puts pressure and > > > confusion on contributors, which are mostly blocked on committers for > > > reviews, or are taking tickets as multi-week projects. I think it is > not > > a > > > great experience for contributors, when they are already unsure why > their > > > work isn't getting the attention from committers that
Re: [DISCUSS] Feedback Collection Jira Bot
Hi everyone, Thank you for the additional comments and suggestions. @Stephan, Kurt: I agree that we shouldn't discourage or dishearten contributors, and probably 14 days until a ticket becomes "stale-assigned" are too few. That's why I've already proposed to increase that to 30 days. Similarly the times for Major/Critical tickets can be increased. From my perspective, the root causes are the following: * tickets are opened with the wrong priority (see https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities). Here it might help to change the default priority. * committers don't have the time to review tickets or don't bring community contributions to a resolution. The Jira bot makes this fact more visible. Without the Jira Bot no external contributor would get more attention, and no external contribution would be merged faster. Ideally, it'd be the opposite and committers would actively monitor tickets with labels "stale-assigned" and "pull-request-available" in order to review those with priority. That's also why I am not a fan of excluding tickets with "pull-request-available" from the bot. The bot can help to make these tickets visible to reviewers. @Jing Zhang: That's a problem. We should try to change the permissions accordingly or need to find a different solution. @Piotr, Kurt: Instead of closing tickets, we could introduce an additional priority like "Not a Priority" to which we move tickets. No ticket would be closed automatically. Overall, the following changes could resolve most of the concerns, I believe: * Ignore tickets with a fixVersion for all rules but the stale-unassigned role. * We change the time intervals as follows, accepting reality a bit more ;) * stale-assigned only after 30 days (instead of 14 days) * stale-critical only after 14 days (instead of 7 days) * stale-major only after 60 days (instead of 30 days) * Introduce "Not a Priority" priority and stop closing tickets. 
* Change the default priority for new tickets in the Flink project to "Minor".

The measures I proposed above still try to achieve the goals mentioned and agreed upon in the original discussion thread, which were the following:

- clearer communication and expectation management with the community
- a user or contributor should be able to judge the urgency of a ticket by its priority
- if a ticket is assigned to someone, the expectation that someone is working on it should hold
- generally reduce noise in Jira
- reduce the overhead for committers of asking about status updates of contributions or bug reports
  - "Are you still working on this?"
  - "Are you still interested in this?"
  - "Does this still happen on Flink 1.x?"
  - "Are you still experiencing this issue?"
  - "What is the status of the implementation?"
- while still encouraging users to add new tickets and to leave feedback about existing tickets

Kurt, Stephan, if you'd like to change the bot to "just close very old tickets", I suggest you start a separate discussion and subsequent voting thread.

Best,
Konstantin

On Wed, Jun 30, 2021 at 9:01 AM Kurt Young wrote:
> +1 to Stephan's opinion, with just one minor difference. From my experience,
> and for a project as big as Flink, picking up an issue created 1-2 years ago
> seems normal to me. To be more specific, during the Blink planner merge, I
> created lots of clean-up & refactor issues, trying to make the code cleaner.
> I haven't had a chance to resolve all of these, but I think they are still
> good improvements. Thus I would propose we don't close any stale issues for
> at least 2 years.
>
> Best,
> Kurt
>
> On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen wrote:
> > Being a bit late to the party, and don't want to ask to change everything,
> > just maybe some observation.
> > > > My main observation and concern is still that this puts pressure and > > confusion on contributors, which are mostly blocked on committers for > > reviews, or are taking tickets as multi-week projects. I think it is not > a > > great experience for contributors, when they are already unsure why their > > work isn't getting the attention from committers that they hoped for, to > > then see issues unassigned or deprioritized automatically. I think we > > really need to weigh this discouragement of contributors against the > desire > > for a tidy ticket system. > > I also think by now this isn't just a matter of phrasing the bot's > message > > correctly. Auto unassignment and deprioritization sends a subtle message > > that jira resolution is a more important goal than paying attention to > > contributors (at least I think that is how it will be perceived by many). > > > > Back to the original motivation, to not have issues lying around forever, > > ensuring there is closure eventually. > > For that, even much longer intervals
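The interval changes Konstantin proposes could be encoded as a small staleness check. Below is a hypothetical Java sketch of that rule table; the class name, rule labels, and structure are illustrative only, not the actual Jira bot's implementation (the proposed day counts are taken from the thread):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

class JiraBotRules {
    // Proposed new intervals from the discussion: 30/14/60 days
    // (previously 14/7/30 days).
    static final Map<String, Duration> STALE_AFTER = Map.of(
            "stale-assigned", Duration.ofDays(30),
            "stale-critical", Duration.ofDays(14),
            "stale-major", Duration.ofDays(60));

    // True if a ticket's last update is older than the given rule's interval.
    static boolean isStale(String rule, Instant lastUpdate, Instant now) {
        Duration threshold = STALE_AFTER.get(rule);
        return threshold != null
                && Duration.between(lastUpdate, now).compareTo(threshold) > 0;
    }
}
```

Encoding the thresholds as data rather than scattered conditionals would also make future interval adjustments a one-line change.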
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
I think you are right Tison that docs are a special case and they only require flink-docs to pass. What I am wondering is how much of a problem this will be (assuming that we have a decent build stability). The more exceptions we add, the harder it will be to properly follow the guidelines. Maybe we can observe how many docs PRs get delayed/not merged because of this and then revisit this discussion if needed. Cheers, Till On Wed, Jun 30, 2021 at 8:30 AM tison wrote: > Hi, > > There are a number of PRs modifying docs only, but we still require all > tests passed on that. > > It is a good proposal we avoid merge PR with "unrelated" failure, but can > we improve the case where the contributor only works for docs? > > For example, base on the file change set, run doc tests only. > > Best, > tison. > > > godfrey he 于2021年6月30日周三 下午2:17写道: > > > +1 for the proposal. Thanks Xintong! > > > > Best, > > Godfrey > > > > > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > > > Thanks Xintong. +1 to the proposal. > > > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 > wrote: > > > > > > > +1 for the proposal. Since the test time is long and environment may > > > vary, > > > > unstable tests are really annoying for developers. The solution is > > > welcome. > > > > > > > > Best > > > > liujiangang > > > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > > > Best, > > > > > Jingsong > > > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann < > trohrm...@apache.org> > > > > > wrote: > > > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > > > Cheers, > > > > > > Till > > > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > > > wrote: > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Thanks Xintong for drafting this doc. 
> > > > > > > > > > > > > > Best, > > > > > > > Yangze Guo > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG < > beyond1...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > > > The best practice for handling test failure is very detailed, > > > it's > > > > a > > > > > > good > > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new > guidelines > > > > [1], > > > > > > as a > > > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions > discussed > > > and > > > > > > > consensus > > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > > pnowoj...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > > written. 
> > > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > > trohrm...@apache.org > > > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > > > against > > > > > > > builds > > > > > > > > > > > running on your personal Azure account and this is not > > > what I > > > > > > > > > understood > > > > > > > > > > > under "local environment". For me "local environment" > > means > > > > > that > > > > > > > > > someone > > > > > > > > > > > runs the test locally on his machine and then says that > > the > > > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests > > if a > > > > PR > > > > > > > author > > > > > > > > > > > disables tests. Here I would argue that we don't have > > > > malignant > > > > > > > > > > committers > > > > > > > > > > > which means that every committer will probably first > > check > > > > the > > > > > > > > > respective > > > > > > > > > > > ticket for how often the test failed. Then I guess the > > next > > > > > step > > > > > > > would > > > > > > > > > be >
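tison's suggestion above, running only doc tests when the change set touches documentation alone, boils down to a simple predicate over the changed files. A hypothetical sketch, assuming docs live under a `docs/` prefix (the path conventions and class name are assumptions, not Flink's actual CI configuration):

```java
import java.util.List;

class CiRouting {
    // Paths assumed to be documentation-only; adjust to the real repo layout.
    private static final List<String> DOC_PREFIXES = List.of("docs/", "README");

    // A PR qualifies for the docs-only pipeline iff it changes at least one
    // file and every changed file falls under a documentation path.
    static boolean isDocsOnly(List<String> changedFiles) {
        return !changedFiles.isEmpty()
                && changedFiles.stream().allMatch(
                        f -> DOC_PREFIXES.stream().anyMatch(f::startsWith));
    }
}
```

The conservative default matters here: an empty or mixed change set falls through to the full test suite, which matches Till's concern about keeping the guidelines easy to follow.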
[jira] [Created] (FLINK-23191) Azure: Upload CI logs to S3 as well
Robert Metzger created FLINK-23191: -- Summary: Azure: Upload CI logs to S3 as well Key: FLINK-23191 URL: https://issues.apache.org/jira/browse/FLINK-23191 Project: Flink Issue Type: Improvement Components: Build System / Azure Pipelines Reporter: Robert Metzger We are currently uploading the CI logs to Azure as artifacts. The maximum retention (which we've configured) is 60 days. Afterwards, the logs are gone. For rarely occurring test failures, the logs might be lost by the time we start looking into them. Therefore, we should store the CI logs somewhere permanent, such as S3 (similarly to how we stored them when we were using Travis). -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Feedback Collection Jira Bot
+1 to Stephan's opinion, with just one minor difference. From my experience, and for a project as big as Flink, picking up an issue created 1-2 years ago seems normal to me. To be more specific, during the Blink planner merge, I created lots of clean-up & refactor issues, trying to make the code cleaner. I haven't had a chance to resolve all of these, but I think they are still good improvements. Thus I would propose we don't close any stale issues for at least 2 years.

Best,
Kurt

On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen wrote:
> Being a bit late to the party, and don't want to ask to change everything,
> just maybe some observation.
>
> My main observation and concern is still that this puts pressure and
> confusion on contributors, which are mostly blocked on committers for
> reviews, or are taking tickets as multi-week projects. I think it is not a
> great experience for contributors, when they are already unsure why their
> work isn't getting the attention from committers that they hoped for, to
> then see issues unassigned or deprioritized automatically. I think we
> really need to weigh this discouragement of contributors against the desire
> for a tidy ticket system.
> I also think by now this isn't just a matter of phrasing the bot's message
> correctly. Auto unassignment and deprioritization sends a subtle message
> that jira resolution is a more important goal than paying attention to
> contributors (at least I think that is how it will be perceived by many).
>
> Back to the original motivation, to not have issues lying around forever,
> ensuring there is closure eventually.
> For that, even much longer intervals would be fine. Like pinging every
> three months, closing after three pings - would resolve most tickets in a
> year, which is not too bad in the scope of a project like Flink. Many
> features/wishes easily move to two releases in the future, which is almost
> a year.
We would get rid of long dead tickets and interfere little with > current tickets. Contributors can probably understand ticket closing after > a year of inactivity. > > I am curious if a very simple bot that really just looks at stale issues > (no comment/change in three months), pings the > issue/reporter/assignee/watchers and closes it after three pings would do > the job. > We would get out of the un-assigning business (which can send very tricky > signals) and would rely on reporters/assignees/watchers to unassign if they > see that the contributor abandoned the issue. With a cadence of three > months for pinging, this isn't much work for the ones that get pinged. > > Issues where we rely on faster handling are probably the ones where > committers have a stake in getting those into an upcoming release, so these > tend to be watched anyways. > > On Wed, Jun 23, 2021 at 2:39 PM JING ZHANG wrote: > > > Hi Konstantin, Chesnay, > > > > > I would like it to not unassign people if a PR is open. These are > > > usually blocked by the reviewer, not the assignee, and having the > > > assignees now additionally having to update JIRA periodically is a bit > > > like rubbing salt into the wound. > > > > I agree with Chesnay about not un-assign an issue if a PR is open. > > Besides, Could assignees remove the "stale-assigned" tag by themself? It > > seems assignees have no permission to delete the tag if the issue is not > > created by themselves. > > > > Best regards, > > JING ZHANG > > > > Konstantin Knauf 于2021年6月23日周三 下午4:17写道: > > > > > > I agree there are such tickets, but I don't see how this is > addressing > > my > > > concerns. There are also tickets that just shouldn't be closed as I > > > described above. Why do you think that duplicating tickets and losing > > > discussions/knowledge is a good solution? > > > > > > I don't understand why we are necessarily losing discussion/knowledge. 
> > The > > > tickets are still there, just in "Closed" state, which are included in > > > default Jira search. We could of course just add a label, but closing > > seems > > > clearer to me given that likely this ticket will not get comitter > > attention > > > in the foreseeable future. > > > > > > > I would like to avoid having to constantly fight against the bot. > It's > > > already responsible for the majority of my daily emails, with quite > > little > > > benefit for me personally. I initially thought that after some period > of > > > time it will settle down, but now I'm afraid it won't happen. > > > > > > Can you elaborate which rules you are running into mostly? I'd rather > > like > > > to understand how we work right now and where this conflicts with the > > Jira > > > bot vs slowly disabling the jira bot via labels. > > > > > > On Wed, Jun 23, 2021 at 10:00 AM Piotr Nowojski > > > wrote: > > > > > > > Hi Konstantin, > > > > > > > > > In my opinion it is important that we close tickets eventually. > There > > > are > > > > a > > > > > lot of tickets (bugs, improvements, tech debt) that over time > became > > > > >
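The "very simple bot" Stephan sketches earlier in this thread (ping an issue after three months of inactivity, close it after three unanswered pings) reduces to a small decision function. A hypothetical Java sketch; the names, types, and exact thresholds are illustrative, not an actual bot implementation:

```java
import java.time.Duration;
import java.time.Instant;

class StaleIssueBot {
    static final Duration STALE_AFTER = Duration.ofDays(90); // ~3 months
    static final int PINGS_BEFORE_CLOSE = 3;

    enum Action { NONE, PING, CLOSE }

    // Decide what to do with an issue given its last activity and the
    // number of pings already sent without a response.
    static Action decide(Instant lastActivity, int unansweredPings, Instant now) {
        if (Duration.between(lastActivity, now).compareTo(STALE_AFTER) < 0) {
            return Action.NONE; // still active, leave it alone
        }
        return unansweredPings >= PINGS_BEFORE_CLOSE ? Action.PING == null ? Action.CLOSE : Action.CLOSE : Action.PING;
    }
}
```

Because any comment resets `lastActivity`, a single reply from the reporter, assignee, or a watcher is enough to keep a ticket open, which is exactly the low-touch behavior Stephan argues for.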
Re: [DISCUSS] FLIP-171: Async Sink
Thanks for addressing this issue :)

Best,
Piotrek

On Tue, Jun 29, 2021 at 17:58, Hausmann, Steffen wrote:
> Hey Piotr,
>
> I've just adapted the FLIP and changed the signature for the
> `submitRequestEntries` method:
>
> protected abstract void submitRequestEntries(List requestEntries, ResultFuture requestResult);
>
> In addition, we are likely to use an AtomicLong to track the number of
> outstanding requests, as you have proposed in 2b). I've already indicated
> this in the FLIP, but it's not fully fleshed out. But as you have said,
> that seems to be an implementation detail, and the important part is the
> change of the `submitRequestEntries` signature.
>
> Thanks for your feedback!
>
> Cheers, Steffen
>
> On 25.06.21, 17:05, "Hausmann, Steffen" wrote:
>
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender and
> know the content is safe.
>
> Hi Piotr,
>
> I'm happy to take your guidance on this. I need to think through your
> proposals and I'll follow up on Monday with some more context so that we
> can close the discussion on these details. But for now, I'll close the vote.
>
> Thanks, Steffen
>
> From: Piotr Nowojski
> Date: Friday, 25. June 2021 at 14:48
> To: Till Rohrmann
> Cc: Steffen Hausmann, "dev@flink.apache.org" <dev@flink.apache.org>, Arvid Heise
> Subject: RE: [EXTERNAL] [DISCUSS] FLIP-171: Async Sink
>
> Hey,
>
> I've just synced with Arvid about a couple more remarks from my
> side, and he shares my concerns.
>
> 1.
> I would very strongly recommend ditching `CompletableFuture` from the
> `protected abstract CompletableFuture submitRequestEntries(List requestEntries);`
> in favor of something like the
> `org.apache.flink.streaming.api.functions.async.ResultFuture` interface.
> `CompletableFuture` would partially make the threading model of the
> `AsyncSinkWriter` part of the public API, and it would tie our hands.
> Regardless of how `CompletableFuture` is used, it imposes a performance
> overhead because of the synchronisation/volatile accesses inside of it.
> On the other hand, something like:
>
> protected abstract void submitRequestEntries(List requestEntries, ResultFuture requestResult);
>
> would allow us to implement the threading model as we wish.
> `ResultFuture` could be backed by a `CompletableFuture` underneath, but
> it could also be something more efficient. I will explain what I have in
> mind in a second.
>
> 2. It looks to me that the proposed `AsyncSinkWriter` internals are not
> very efficient, and maybe the threading model hasn't been thought through?
> Especially the private fields:
>
> private final BlockingDeque bufferedRequestEntries;
> private BlockingDeque> inFlightRequests;
>
> are a bit strange to me. Why do we need two separate thread-safe
> collections? Why do we need a `BlockingDeque` of `CompletableFuture`s?
> If we are already using a fully synchronised collection, there should be no
> need for another layer of thread-safe `CompletableFuture`.
>
> As I understand it, the threading model of the `AsyncSinkWriter` is very
> similar to that of the `AsyncWaitOperator`, with very similar requirements
> for inducing backpressure. How I would see it implemented is, for example:
>
> a) Having a single lock that would encompass the whole
> `AsyncSinkWriter#flush()` method. `flush()` would be called from the task
> thread (mailbox). To induce backpressure, `#flush()` would just call
> `lock.wait()`.
> `ResultFuture#complete(...)`, called from an async thread,
> would also synchronize on the same lock, mark some of the in-flight
> requests as completed, and call `lock.notify()`.
>
> b) A more efficient solution. On the hot path we would have, for example,
> only an `AtomicLong numberOfInFlightRequests`. The task thread would be
> bumping it, and `ResultFuture#complete()` would be decreasing it. If the
> task thread, when bumping `numberOfInFlightRequests`, exceeds a threshold,
> it goes to sleep/wait on a lock or some `CompletableFuture`. If
> `ResultFuture#complete()`, when decreasing the count, goes below the
> threshold, it would wake up the task thread. Compared to option a),
> on the hot path option b) would have only the AtomicLong.increment overhead.
>
> c) We could use the mailbox, the same way as the AsyncWaitOperator is doing.
> In this case `ResultFuture#complete()` would be enqueuing a mailbox action,
> which is thread-safe on its own.
>
> Either of those options would be more efficient and simpler (from the
> threading model perspective) than having two `BlockingQueue`s and
> `CompletableFuture`. Also, as you can see, neither of those solutions
> requires the overhead of ` CompletableFuture > submitRequestEntries(List
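Piotr's two ideas above, the callback-style `submitRequestEntries` signature from point 1 and the `AtomicLong`-based backpressure of option b), can be sketched together. This is a minimal illustration of the design only; names such as `AsyncSinkWriterSketch` and `InFlightRequestLimiter`, and the threshold handling, are assumptions, not the actual FLIP-171 implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Callback interface replacing CompletableFuture in the public signature,
// so the writer's threading model stays an implementation detail.
interface ResultFuture<T> {
    void complete(List<T> failedEntries); // called from the async client thread
}

abstract class AsyncSinkWriterSketch<T> {
    // The signature proposed instead of returning a CompletableFuture.
    protected abstract void submitRequestEntries(
            List<T> requestEntries, ResultFuture<T> requestResult);
}

// Option b): a single AtomicLong on the hot path. The task thread blocks
// only once the in-flight count exceeds the threshold; the async
// completion path decrements the count and wakes the task thread up.
class InFlightRequestLimiter {
    private final AtomicLong numberOfInFlightRequests = new AtomicLong();
    private final long threshold;
    private final Object wakeUpLock = new Object();

    InFlightRequestLimiter(long threshold) {
        this.threshold = threshold;
    }

    // Task (mailbox) thread: register a request, waiting while over threshold.
    void registerRequest() throws InterruptedException {
        if (numberOfInFlightRequests.incrementAndGet() > threshold) {
            synchronized (wakeUpLock) {
                while (numberOfInFlightRequests.get() > threshold) {
                    wakeUpLock.wait();
                }
            }
        }
    }

    // Async thread (e.g. from ResultFuture#complete): release a slot.
    void completeRequest() {
        if (numberOfInFlightRequests.decrementAndGet() <= threshold) {
            synchronized (wakeUpLock) {
                wakeUpLock.notifyAll();
            }
        }
    }

    long inFlight() {
        return numberOfInFlightRequests.get();
    }
}
```

Note the `while` re-check inside the `synchronized` block: it guards against spurious wakeups and against the completion racing ahead of the task thread, which is the usual guarded-block pattern for `wait`/`notifyAll`.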
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Hi, There are a number of PRs modifying docs only, but we still require all tests passed on that. It is a good proposal we avoid merge PR with "unrelated" failure, but can we improve the case where the contributor only works for docs? For example, base on the file change set, run doc tests only. Best, tison. godfrey he 于2021年6月30日周三 下午2:17写道: > +1 for the proposal. Thanks Xintong! > > Best, > Godfrey > > > > Rui Li 于2021年6月30日周三 上午11:36写道: > > > Thanks Xintong. +1 to the proposal. > > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: > > > > > +1 for the proposal. Since the test time is long and environment may > > vary, > > > unstable tests are really annoying for developers. The solution is > > welcome. > > > > > > Best > > > liujiangang > > > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > > > +1 Thanks Xintong for the update! > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > > > > wrote: > > > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > > > Cheers, > > > > > Till > > > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > > wrote: > > > > > > > > > > > +1 > > > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > > > Best, > > > > > > Yangze Guo > > > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > > > > > wrote: > > > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > > > The best practice for handling test failure is very detailed, > > it's > > > a > > > > > good > > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines > > > [1], > > > > > as a > > > > > > > > potential replacement of the original wiki page [2]. 
> > > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions discussed > > and > > > > > > consensus > > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > > > Looking forward to your feedback. > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > > pnowoj...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > > written. > > > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann < > trohrm...@apache.org > > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > > against > > > > > > builds > > > > > > > > > > running on your personal Azure account and this is not > > what I > > > > > > > > understood > > > > > > > > > > under "local environment". For me "local environment" > means > > > > that > > > > > > > > someone > > > > > > > > > > runs the test locally on his machine and then says that > the > > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests > if a > > > PR > > > > > > author > > > > > > > > > > disables tests. 
Here I would argue that we don't have > > > malignant > > > > > > > > > committers > > > > > > > > > > which means that every committer will probably first > check > > > the > > > > > > > > respective > > > > > > > > > > ticket for how often the test failed. Then I guess the > next > > > > step > > > > > > would > > > > > > > > be > > > > > > > > > > to discuss on the ticket whether to disable it or not. > And > > > > > finally, > > > > > > > > after > > > > > > > > > > reaching a consensus, it will be disabled. If we see > > someone > > > > > > abusing > > > > > > > > this > > > > > > > > > > policy, then we can still think about how to guard > against > > > it. > > > > > But, > > > > > > > > > > honestly, I have very rarely seen such a case. I am also > ok > > > to > > > > > > pull in > > > > > > > > > the > > > > > > > > > > release manager to make the final call if this resolves > > > > concerns. > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > > > > pnowoj...@apache.org> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > +1
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for the proposal. Thanks Xintong! Best, Godfrey Rui Li 于2021年6月30日周三 上午11:36写道: > Thanks Xintong. +1 to the proposal. > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 wrote: > > > +1 for the proposal. Since the test time is long and environment may > vary, > > unstable tests are really annoying for developers. The solution is > welcome. > > > > Best > > liujiangang > > > > Jingsong Li 于2021年6月29日周二 上午10:31写道: > > > > > +1 Thanks Xintong for the update! > > > > > > Best, > > > Jingsong > > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann > > > wrote: > > > > > > > +1, thanks for updating the guidelines Xintong! > > > > > > > > Cheers, > > > > Till > > > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo > > wrote: > > > > > > > > > +1 > > > > > > > > > > Thanks Xintong for drafting this doc. > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG > > > wrote: > > > > > > > > > > > > Thanks Xintong for giving detailed documentation. > > > > > > > > > > > > The best practice for handling test failure is very detailed, > it's > > a > > > > good > > > > > > guidelines document with clear action steps. > > > > > > > > > > > > +1 to Xintong's proposal. > > > > > > > > > > > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > > > > > > > > > > > Thanks all for the discussion. > > > > > > > > > > > > > > Based on the opinions so far, I've drafted the new guidelines > > [1], > > > > as a > > > > > > > potential replacement of the original wiki page [2]. > > > > > > > > > > > > > > Hopefully this draft has covered the most opinions discussed > and > > > > > consensus > > > > > > > made in this discussion thread. > > > > > > > > > > > > > > Looking forward to your feedback. 
> > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski < > > > > pnowoj...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > Thanks for the clarification Till. +1 for what you have > > written. > > > > > > > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > One quick note for clarification. I don't have anything > > against > > > > > builds > > > > > > > > > running on your personal Azure account and this is not > what I > > > > > > > understood > > > > > > > > > under "local environment". For me "local environment" means > > > that > > > > > > > someone > > > > > > > > > runs the test locally on his machine and then says that the > > > > > > > > > tests have passed locally. > > > > > > > > > > > > > > > > > > I do agree that there might be a conflict of interests if a > > PR > > > > > author > > > > > > > > > disables tests. Here I would argue that we don't have > > malignant > > > > > > > > committers > > > > > > > > > which means that every committer will probably first check > > the > > > > > > > respective > > > > > > > > > ticket for how often the test failed. Then I guess the next > > > step > > > > > would > > > > > > > be > > > > > > > > > to discuss on the ticket whether to disable it or not. And > > > > finally, > > > > > > > after > > > > > > > > > reaching a consensus, it will be disabled. 
If we see > someone > > > > > abusing > > > > > > > this > > > > > > > > > policy, then we can still think about how to guard against > > it. > > > > But, > > > > > > > > > honestly, I have very rarely seen such a case. I am also ok > > to > > > > > pull in > > > > > > > > the > > > > > > > > > release manager to make the final call if this resolves > > > concerns. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Till > > > > > > > > > > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski < > > > > > pnowoj...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > +1 for the general idea, however I have concerns about a > > > couple > > > > > of > > > > > > > > > details. > > > > > > > > > > > > > > > > > > > > > I would first try to not introduce the exception for > > local > > > > > builds. > > > > > > > > > > > It makes it quite hard for others to verify the build > and > > > to > > > > > make > > > > > > > > sure > > > > > > > > > > that the right things were executed. > > > > > > > > > > > > > > > > > > > > I would counter Till's proposal to ignore local green > > builds. > > > > If > > > > > > > > > committer > > > > > > > > > > is merging and closing a PR, with official azure failure, > > but > > > > > there > > > > > > > > was a > > > > > > > > > > green build