[VOTE] FLIP-150: Introduce Hybrid Source

2021-06-30 Thread 蒋晓峰
Hi everyone,




Thanks for all the feedback on Hybrid Source so far. Based on the discussion[1] 
we seem to have consensus, so I would like to start a vote on FLIP-150, for 
which the FLIP has also been updated[2].




The vote will last for at least 72 hours (Sun, Jul 4th 12:00 GMT) unless there 
is an objection or insufficient votes.




Thanks,

Nicholas Jiang




[1] 
https://lists.apache.org/thread.html/r94057d19f0df2a211695820375502d60cddeeab5ad27057c1ca988d6%40%3Cdev.flink.apache.org%3E

[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

[jira] [Created] (FLINK-23198) An old interface method is used in this section of [Passing Options Factory to RocksDB].

2021-06-30 Thread Carl (Jira)
Carl created FLINK-23198:


 Summary: An old interface method is used in this section of 
[Passing Options Factory to RocksDB].
 Key: FLINK-23198
 URL: https://issues.apache.org/jira/browse/FLINK-23198
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.12.0
Reporter: Carl
 Attachments: image-2021-07-01-11-30-57-676.png, 
image-2021-07-01-11-32-25-200.png

[https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/state/state_backends.html]

 

!image-2021-07-01-11-30-57-676.png!

In version 1.12 of Flink, this method has been replaced by the following one:

!image-2021-07-01-11-32-25-200.png!

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23197) Erroneous description about 'TIMESTAMP WITH TIME ZONE' date type

2021-06-30 Thread wangqinghuan (Jira)
wangqinghuan created FLINK-23197:


 Summary: Erroneous description about 'TIMESTAMP WITH TIME ZONE' 
date type
 Key: FLINK-23197
 URL: https://issues.apache.org/jira/browse/FLINK-23197
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: wangqinghuan


Currently, Flink doesn't support the 'TIMESTAMP WITH TIME ZONE' type. We should 
add a note or remove the 'TIMESTAMP WITH TIME ZONE' type description in the docs.[1]

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/types/#date-and-time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23196) JobMasterITCase fails on Azure due to BindException

2021-06-30 Thread Xintong Song (Jira)
Xintong Song created FLINK-23196:


 Summary: JobMasterITCase fails on Azure due to BindException
 Key: FLINK-23196
 URL: https://issues.apache.org/jira/browse/FLINK-23196
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.0
Reporter: Xintong Song
 Fix For: 1.14.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19753&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56&l=4251

{code}
Jul 01 00:00:27 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 4.272 s <<< FAILURE! - in 
org.apache.flink.runtime.jobmaster.JobMasterITCase
Jul 01 00:00:27 [ERROR] 
testRejectionOfEmptyJobGraphs(org.apache.flink.runtime.jobmaster.JobMasterITCase)
  Time elapsed: 3.009 s  <<< ERROR!
Jul 01 00:00:27 org.apache.flink.util.FlinkException: Could not create the 
DispatcherResourceManagerComponent.
Jul 01 00:00:27 at 
org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:275)
Jul 01 00:00:27 at 
org.apache.flink.runtime.minicluster.MiniCluster.createDispatcherResourceManagerComponents(MiniCluster.java:470)
Jul 01 00:00:27 at 
org.apache.flink.runtime.minicluster.MiniCluster.setupDispatcherResourceManagerComponents(MiniCluster.java:429)
Jul 01 00:00:27 at 
org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:373)
Jul 01 00:00:27 at 
org.apache.flink.runtime.jobmaster.JobMasterITCase.testRejectionOfEmptyJobGraphs(JobMasterITCase.java:56)
Jul 01 00:00:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Jul 01 00:00:27 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 01 00:00:27 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 01 00:00:27 at java.lang.reflect.Method.invoke(Method.java:498)
Jul 01 00:00:27 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Jul 01 00:00:27 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Jul 01 00:00:27 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Jul 01 00:00:27 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Jul 01 00:00:27 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Jul 01 00:00:27 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 01 00:00:27 at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Jul 01 00:00:27 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Jul 01 00:00:27 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Jul 01 00:00:27 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Jul 01 00:00:27 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Jul 01 00:00:27 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Jul 01 00:00:27 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Jul 01 00:00:27 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Jul 01 00:00:27 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Jul 01 00:00:27 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Jul 01 00:00:27 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Jul 01 00:00:27 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Jul 01 00:00:27 Caused by: java.net.BindException: Could not start rest 
endpoint on any port in port range 8081
Jul 01 00:00:27 at 

[jira] [Created] (FLINK-23195) No value for Kafka metrics in Flink Web UI

2021-06-30 Thread MOBIN (Jira)
MOBIN created FLINK-23195:
-

 Summary: No value for Kafka metrics in Flink Web UI
 Key: FLINK-23195
 URL: https://issues.apache.org/jira/browse/FLINK-23195
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Runtime / Metrics
Affects Versions: 1.11.2
Reporter: MOBIN
 Attachments: image-2021-07-01-10-02-04-483.png, 
image-2021-07-01-10-03-02-214.png

The Kafka metrics in the Flink Web UI have no value, but the other metrics are 
displayed normally.

!image-2021-07-01-10-02-04-483.png|width=698,height=445!

!image-2021-07-01-10-03-02-214.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
>
> * Even if there is a major breaking version, Flink releases major versions
> too where it could be added
> Netty framework locking is true but AFAIK there was a discussion to
> rewrite the Netty stuff to a more sexy thing but there was no agreement to
> do that.
>

Flink major releases seem to happen even less frequently than Netty
releases :( It would be unfortunate if a breaking Netty API change ended up
in the FLINK-3957[1] catch-all 2.0 changes.

All in all I would agree on making it experimental.
>

Thus I am happy with this compromise, thank you :)

This would simply restrict use-cases where order is not important. Limiting
> devs in such an odd way is a no-go.
>

I think the only case to be made for imposing limitations would be to
encourage devs to only use this API in very specific situations, otherwise
to solve this in another way, and revisit the API if these limitations are
met and alternatives do not work. That said, I am still trying to
understand this specific Cloudera case – anything you can say about the
limitations of its Flink setup (i.e., difficult to spawn sidecar processes
(because of Yarn?)) would be greatly helpful to me and others without this
bit of context.

But I think the proposed priority function that you've added is a nice
compromise as well, so +1 from my side with the proposal. I would only
further suggest that we include the other options to this problem in the
docs as the preferred approach, where possible.

Thanks,
Austin


[1]: https://issues.apache.org/jira/browse/FLINK-3957

On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi 
wrote:

> Answered here because the text started to be crowded.
>
> > It also locks Flink into the current major version of Netty (and the
> Netty framework itself) for the foreseeable future.
> It's not doing any Netty version locking because:
> * Netty will not necessarily add breaking changes in major versions; the
> API is quite stable
> * Even if there is a major breaking version, Flink releases major versions
> too where it could be added
> Netty framework locking is true but AFAIK there was a discussion to
> rewrite the Netty stuff to a more sexy thing but there was no agreement to
> do that.
> All in all I would agree on making it experimental.
>
> > why not restrict the service loader to only allow one?
> This would simply restrict use-cases where order is not important.
> Limiting devs in such an odd way is a no-go.
> I think the ordering came up in multiple places, which I think is a good
> reason to fill this gap with a priority function.
> I've updated the doc and added it...
>
> BR,
> G
>
>
> On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi Gabor,
>>
>> Thanks for your answers. I appreciate the explanations. Please see my
>> responses + further questions below.
>>
>>
>> * What stability semantics do you envision for this API?

>>> As I foresee, the API will be as stable as the Netty API. Since there is a
>>> guarantee of no breaking changes between minor versions, we can give the
>>> same guarantee.
>>> If for whatever reason we need to break it, we can do it in a major version
>>> like every other open source project does.
>>>
>>
>> * Does Flink expose dependencies’ APIs in other places? Since this
 exposes the Netty API, will this make it difficult to upgrade Netty?

>>> I don't expect breaking changes between minor versions, so in such cases
>>> there will be no issues. If there is a breaking change in a major version,
>>> we need to wait for a Flink major version too.
>>>
>>
>> To clarify, you are proposing this new API to have the same stability
>> guarantees as @Public currently does? Where we will not introduce breaking
>> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>>
>> If this is the case, I think this puts the community in a tough position
>> where we are forced to maintain compatibility with something that we do not
>> have control over. It also locks Flink into the current major version of
>> Netty (and the Netty framework itself) for the foreseeable future.
>>
>> I am saying we should not do this, perhaps this is the best solution to
>> finding a good compromise here, but I am trying to discover + acknowledge
>> the full implications of this proposal so they can be discussed.
>>
>> What do you think about marking this API as @Experimental and not
>> guaranteeing stability between versions? Then, if we do decide we need to
>> upgrade Netty (or move away from it), we can do so.
>>
>> * I share Till's concern about multiple factories – other HTTP middleware
 frameworks commonly support chaining middlewares. Since the proposed API
 does not include these features/guarantee ordering, do you see any reason
 to allow more than one factory?

>>> I personally can't come up with a use-case where ordering is a must. I'm
>>> not saying that this is not a valid use-case, but adding a feature w/o a
>>> business rationale would incur maintenance cost (though I'm open to
Re: Job Recovery Time on TM Lost

2021-06-30 Thread Lu Niu
Thanks Gen! cc flink-dev to collect more inputs.

Best
Lu

On Wed, Jun 30, 2021 at 12:55 AM Gen Luo  wrote:

> I'm also wondering here.
>
> In my opinion, it's because the JM cannot confirm whether the TM is lost
> or whether it's temporary network trouble that will recover soon, since I
> can see in the log that Akka got a Connection refused but the JM still sends
> heartbeat requests to the lost TM until it reaches the heartbeat timeout.
> But I'm not sure if it's indeed designed like this.
>
> I would really appreciate it if anyone who knows more details could
> answer. Thanks.
>
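For context, the JM-side grace period described above is governed by Flink's heartbeat options. A sketch of the relevant flink-conf.yaml entries (the keys are the documented Flink options; the values shown are the documented defaults, included here only for illustration):

```yaml
# flink-conf.yaml
# Interval (ms) at which the JobManager requests heartbeats from TaskManagers.
heartbeat.interval: 10000
# Time (ms) without a heartbeat after which a TaskManager is declared lost.
heartbeat.timeout: 50000
```

Lowering heartbeat.timeout shortens the window in which the JM keeps probing a possibly-lost TM, at the cost of more false positives on flaky networks.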


[jira] [Created] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelerate the process of createTaskExecutorLaunchContext on YARN

2021-06-30 Thread zlzhang0122 (Jira)
zlzhang0122 created FLINK-23194:
---

 Summary: Cache and reuse the ContainerLaunchContext and accelerate 
the process of createTaskExecutorLaunchContext on YARN
 Key: FLINK-23194
 URL: https://issues.apache.org/jira/browse/FLINK-23194
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.12.4, 1.13.1
Reporter: zlzhang0122
 Fix For: 1.14.0


When starting TaskExecutors in containers on YARN, a ContainerLaunchContext is 
created n times (n is the number of TaskManagers).

When I examined this creation process, I found that most of it is common and 
has nothing to do with the particular TaskManager, except the launchCommand. We 
can create the ContainerLaunchContext once and reuse it; only the launchCommand 
needs to be created separately for each TaskManager.

So I propose that we cache and reuse the ContainerLaunchContext object to 
accelerate this creation process.

I think this can bring benefits like the ones below:
 # it accelerates the creation of the ContainerLaunchContext and thus the 
start of the TaskExecutor, especially with a massive number of TaskManagers.
 # it decreases the pressure on HDFS, etc.
 # it also reduces exposure to sudden failures of HDFS or YARN, etc.

We have implemented this in our production environment. So far there have been 
no problems and it has brought a good benefit. Let me know if there's any point 
that I haven't considered.
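The reuse pattern being proposed can be sketched as follows. This is a hypothetical illustration, not the actual YARN integration code: the LaunchContext class, the environment map, and the command string are stand-ins for YARN's ContainerLaunchContext and Flink's real setup logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the expensive common part of the launch context is
// built once and shared; only the launch command varies per TaskManager.
public class LaunchContextCache {

    // Stand-in for org.apache.hadoop.yarn.api.records.ContainerLaunchContext.
    static final class LaunchContext {
        final Map<String, String> environment;
        final String launchCommand;

        LaunchContext(Map<String, String> environment, String launchCommand) {
            this.environment = environment;
            this.launchCommand = launchCommand;
        }
    }

    static int expensiveBuilds = 0;
    private static Map<String, String> commonEnvironment;

    // Expensive common setup (classpath, local resources, tokens, ...),
    // identical for every TaskManager, so it is computed only once.
    private static Map<String, String> buildCommonEnvironment() {
        expensiveBuilds++;
        Map<String, String> env = new HashMap<>();
        env.put("FLINK_CLASSPATH", "flink-dist.jar");
        return env;
    }

    static LaunchContext createForTaskManager(String taskManagerId) {
        if (commonEnvironment == null) {
            commonEnvironment = buildCommonEnvironment();
        }
        // Only the launch command is TaskManager-specific.
        return new LaunchContext(
                commonEnvironment,
                "bin/taskmanager.sh start --id " + taskManagerId);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(createForTaskManager("tm-" + i).launchCommand);
        }
        // The common environment was built exactly once for all three TMs.
        System.out.println("expensive builds: " + expensiveBuilds);
    }
}
```

The trade-off to watch in the real implementation is mutability: the cached common part must either be immutable or defensively copied, otherwise one TaskManager's launch could leak state into another's.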



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] Migrating Test Framework to JUnit 5

2021-06-30 Thread Qingsheng Ren
Dear devs,


As discussed in the thread[1], I’d like to start a vote on migrating the
test framework of Flink project to JUnit 5.


JUnit 5[2] provides many exciting new features that can simplify the
development of test cases and make our test structure cleaner, such as
pluggable extension models (replacing rules such as
TestLogger/MiniCluster), improved parameterized tests, annotation support,
and nested/dynamic tests.


The migration path towards JUnit 5 would be:


1. Remove the JUnit 4 dependency and introduce junit-vintage-engine in the
project


The vintage engine will keep all existing JUnit4-style cases still
valid. Since classes of JUnit 4 and 5 are located under different packages,
there won’t be conflict having JUnit 4 cases in the project.
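In Maven terms, step 1 could look roughly like the following. The coordinates are the standard JUnit 5 ones; the version shown is illustrative, not necessarily what the Flink build would pin:

```xml
<!-- JUnit 5 (Jupiter) for new-style tests -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.7.2</version>
    <scope>test</scope>
</dependency>
<!-- Vintage engine keeps existing JUnit4-style tests running -->
<dependency>
    <groupId>org.junit.vintage</groupId>
    <artifactId>junit-vintage-engine</artifactId>
    <version>5.7.2</version>
    <scope>test</scope>
</dependency>
```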


2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)


3. Migrate all existing tests to JUnit 5


This would be a giant commit similar to code reformatting and could be
merged after cutting the 1.14 release branch. There are many migration
examples and experiences to refer to, also Intellij IDEA provides tools for
refactoring.


4. Ban JUnit 4 imports in CheckStyle


Some modules like Testcontainers still require JUnit 4 on the
classpath, and JUnit 4 could still appear as a transitive dependency. Banning
JUnit 4 imports prevents developers from mistakenly using JUnit 4 and splitting
the project into 4 & 5 again.


5. Remove vintage runner and some cleanup



This vote will last for at least 72 hours, following the consensus voting
process.


Thanks!


[1]
https://lists.apache.org/thread.html/r6c8047c7265b8a9f2cb3ef6d6153dd80b94d36ebb03daccf36ab4940%40%3Cdev.flink.apache.org%3E

[2] https://junit.org/junit5

-- 
Best Regards,

*Qingsheng Ren*

Email: renqs...@gmail.com


Re: [DISCUSS] Moving to JUnit5

2021-06-30 Thread Qingsheng Ren
Thanks Arvid for the feedback! I think merging the giant commit after cutting 
1.14 release branch would be a good idea.

Since no one has objections in the discussion, I’d like to move a step forward 
and raise a vote on the migration. Looking forward to having JUnit 5 in the 
1.14 release cycle!

--
Best Regards,

Qingsheng Ren
Email: renqs...@gmail.com
On Jun 29, 2021, 9:20 PM +0800, Arvid Heise , wrote:
> Hi Qingsheng,
>
> I like the idea of enforcing JUnit5 tests with checkstyle. I'm assuming
> JUnit4 will bleed from time to time into the test classpath.
>
> Obviously, we can only do that after all tests are migrated and we are
> confident that no small change would require a contributor to do the
> migration of a test in an unrelated change.
>
> For the big commit, I'd propose to have it after branch cut for 1.14
> release. So for 1.14 we would just have the coexistence PR with the vintage
> engine. In that way, the least possible number of contributors should be
> affected. Of course, the big commit can be prepared beforehand.
>
> On Tue, Jun 29, 2021 at 11:44 AM Qingsheng Ren  wrote:
>
> > Thanks for wrapping things up and effort on the migration Arvid!
> >
> > I’m +1 for the migration plan.
> >
> > To summarize the migration path proposed by Arvid:
> >
> > 1. Remove the JUnit 4 dependency and introduce junit-vintage-engine in the
> > project (all existing cases will still work)
> > 2. Rewrite JUnit 4 rules in JUnit 5 extension style (~10 rules)
> > 3. Migrate all existing tests to JUnit 5 (This is a giant commit similar
> > to code formatting)
> > 4. Remove vintage runner and some cleanup
> >
> > One issue is that we cannot totally get rid of JUnit 4 dependency from the
> > project because:
> >
> > 1. Testcontainers. As mentioned in their official docs[1], Testcontainers
> > still depends on JUnit 4, and the problem might not be solved until
> > Testcontainer 2, which still has no roadmap[2].
> > 2. It’s still possible to appear as transitive dependency
> >
> > Since classes of JUnit 4 and 5 are under different packages, there won’t
> > be conflict having JUnit 4 in the project. To prevent the project splitting
> > into 4 & 5 again, we can ban JUnit 4 imports in CheckStyle to prevent
> > developers from writing test cases in JUnit 4 style intentionally or mistakenly.
> >
> > I’m happy and willing to take over the migration work. This migration
> > indeed takes some effort, but it will help with test case development in
> > the future.
> >
> > [1] https://www.testcontainers.org/test_framework_integration/junit_5/
> > [2]
> > https://github.com/testcontainers/testcontainers-java/issues/970#issuecomment-437273363
> >
> > --
> > Best Regards,
> >
> > Qingsheng Ren
> > Email: renqs...@gmail.com
> > On Jun 16, 2021, 3:13 AM +0800, Arvid Heise , wrote:
> > > Sorry for following up so late. A while ago, I spiked a junit 5
> > migration.
> > >
> > > To recap: here is the migration plan.
> > >
> > > 0. (There is a way to use JUnit4 + 5 at the same time in a project -
> > you'd
> > > > use a specific JUnit4 runner to execute JUnit5. I'd like to skip this
> > > > step as it would slow down migration significantly)
> > > > 1. Use JUnit5 with vintage runner. JUnit4 tests run mostly out of the
> > > > box. The most important difference is that only 3 base rules are
> > supported
> > > > and the remainder needs to be migrated. Luckily, most of our rules
> > derive
> > > > from the supported ExternalResource. So in this step, we would need to
> > > > migrate the rules.
> > > > 2. Implement new tests in JUnit5.
> > > > 3. Soft-migrate old tests in JUnit5. This is mostly a renaming of
> > > > annotation (@Before -> @BeforeEach, etc.). Adjust parameterized tests
> > > > (~400), replace rule usages (~670) with extensions, exception handling
> > > > (~1600 tests), and timeouts (~200). This can be done on a test class by
> > > > test class base and there is no hurry.
> > > > 4. Remove vintage runner, once most tests are migrated by doing a final
> > > > push for lesser used modules.
> > > >
> > >
> > > Here are my insights:
> > > 0. works but I don't see the benefit
> > > 1. works well [1] with a small diff [2]. Note that the branch is based
> > on a
> > > quite old master.
> > > 2. works well as well [3].
> > > 2a. However, we should be aware that we need to port quite a few rules to
> > > extensions before we can implement more complex JUnit5 tests, especially
> > > ITCases (I'd probably skip junit-jupiter-migrationsupport that allows us
> > to
> > > reuse _some_ rules using specific base classes). We have ~10-15 rules
> > that
> > > need to be ported.
> > > 3. Soft migration will take forever and probably never finish. Many tests
> > > can be automatically ported with some (I used 8) simple regexes. I'd
> > rather
> > > do a hard migration of all tests at a particular point (no freeze) and
> > have
> > > that git commit excluded from blame, similar to the spotless commit.
> > > 3a. A huge chunk of changes (>90%) comes from 
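The regex-driven bulk migration mentioned in point 3 could be sketched like this. The rule set here is illustrative and far smaller than what a real migration needs (imports, rules/extensions, assertions, and timeouts all require extra handling):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of regex-driven JUnit 4 -> 5 annotation renaming.
public class Junit4To5Rewriter {

    // The \b word boundary keeps @Before from also matching @BeforeClass
    // or an already-migrated @BeforeEach.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("@BeforeClass\\b", "@BeforeAll");
        RULES.put("@AfterClass\\b", "@AfterAll");
        RULES.put("@Before\\b", "@BeforeEach");
        RULES.put("@After\\b", "@AfterEach");
        RULES.put("@Ignore\\b", "@Disabled");
        RULES.put("org\\.junit\\.Test\\b", "org.junit.jupiter.api.Test");
    }

    static String rewrite(String source) {
        String result = source;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            result = result.replaceAll(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("import org.junit.Test;"));
        System.out.println(rewrite("@Before\npublic void setUp() {}"));
    }
}
```

Running such rewrites over the whole tree in one commit, and excluding that commit from git blame, mirrors how the spotless reformatting was handled.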


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Gabor Somogyi
Answered here because the text started to be crowded.

> It also locks Flink into the current major version of Netty (and the
Netty framework itself) for the foreseeable future.
It's not doing any Netty version locking because:
* Netty will not necessarily add breaking changes in major versions; the
API is quite stable
* Even if there is a major breaking version, Flink releases major versions
too where it could be added
Netty framework locking is true but AFAIK there was a discussion to rewrite
the Netty stuff to a more sexy thing but there was no agreement to do that.
All in all I would agree on making it experimental.

> why not restrict the service loader to only allow one?
This would simply restrict use-cases where order is not important. Limiting
devs in such an odd way is a no-go.
I think the ordering came up in multiple places, which I think is a good reason
to fill this gap with a priority function.
I've updated the doc and added it...
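A minimal sketch of what such a priority function could look like. The interface and method names here are hypothetical, not the actual API from the design doc: factories discovered via the service loader would be sorted by priority before their handlers are installed into the Netty pipeline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: deterministic ordering of service-loaded factories.
public class PrioritizedFactories {

    interface InboundHandlerFactory {
        String name();
        int priority(); // lower value runs earlier in the chain
    }

    static List<String> installationOrder(List<InboundHandlerFactory> discovered) {
        List<InboundHandlerFactory> sorted = new ArrayList<>(discovered);
        // ServiceLoader gives no ordering guarantee; the priority makes it explicit.
        sorted.sort(Comparator.comparingInt(InboundHandlerFactory::priority));
        List<String> order = new ArrayList<>();
        for (InboundHandlerFactory f : sorted) {
            order.add(f.name());
        }
        return order;
    }

    static InboundHandlerFactory factory(String name, int priority) {
        return new InboundHandlerFactory() {
            public String name() { return name; }
            public int priority() { return priority; }
        };
    }

    public static void main(String[] args) {
        // Austin's example: authentication (Google OAuth) must run before
        // the org-internal authorization handler, regardless of discovery order.
        List<InboundHandlerFactory> discovered = List.of(
                factory("internal-authz", 200),
                factory("google-oauth", 100));
        System.out.println(installationOrder(discovered));
    }
}
```

With priorities, two independently published modules can coexist without the implementor having to bundle them into a single factory to control ordering.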

BR,
G


On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi Gabor,
>
> Thanks for your answers. I appreciate the explanations. Please see my
> responses + further questions below.
>
>
> * What stability semantics do you envision for this API?
>>>
>> As I foresee, the API will be as stable as the Netty API. Since there is a
>> guarantee of no breaking changes between minor versions, we can give the
>> same guarantee.
>> If for whatever reason we need to break it, we can do it in a major version
>> like every other open source project does.
>>
>
> * Does Flink expose dependencies’ APIs in other places? Since this exposes
>>> the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions, so in such cases
>> there will be no issues. If there is a breaking change in a major version,
>> we need to wait for a Flink major version too.
>>
>
> To clarify, you are proposing this new API to have the same stability
> guarantees as @Public currently does? Where we will not introduce breaking
> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>
> If this is the case, I think this puts the community in a tough position
> where we are forced to maintain compatibility with something that we do not
> have control over. It also locks Flink into the current major version of
> Netty (and the Netty framework itself) for the foreseeable future.
>
> I am saying we should not do this, perhaps this is the best solution to
> finding a good compromise here, but I am trying to discover + acknowledge
> the full implications of this proposal so they can be discussed.
>
> What do you think about marking this API as @Experimental and not
> guaranteeing stability between versions? Then, if we do decide we need to
> upgrade Netty (or move away from it), we can do so.
>
> * I share Till's concern about multiple factories – other HTTP middleware
>>> frameworks commonly support chaining middlewares. Since the proposed API
>>> does not include these features/guarantee ordering, do you see any reason
>>> to allow more than one factory?
>>>
>> I personally can't come up with a use-case where ordering is a must. I'm
>> not saying that this is not a valid use-case, but adding a feature w/o a
>> business rationale would incur maintenance cost (though I'm open to
>> adding it).
>> As I've seen, Till also can't give an example for that (please see the doc
>> comments). If you have anything in mind, please share it and we can add
>> priority to the API.
>> There is another option too: we can be defensive and add
>> the priority right now. I would do this only if everybody states in the
>> thread that it would be the best option;
>> otherwise I would stick to the original plan.
>>
>
> Let me try to come up with a use case:
> * Someone creates an authentication module for integrating with Google's
> OAuth and publishes it to flink-packages
> * Another person in another org wants to use Google OAuth and then add
> internal authorization based on the user
> * In this scenario, *Google OAuth must come before the internal
> authorization*
> * They place their module and the Google OAuth module to be picked up by
> the service loader
> * What happens?
>
> I do not think that the current proposal has a way to handle this, besides
> having the implementor of the internal authorization module bundle
> everything into one, as you have suggested. Since this is the only way to
> achieve order, why not restrict the service loader to only allow one? This
> way the API is explicit in what it supports.
>
>
> Let me know what you think,
> Austin
>
>
> On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
> wrote:
>
>> Hi Austin,
>>
>> Please see my answers embedded down below.
>>
>> BR,
>> G
>>
>>
>>
>> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the updated proposal. I have a few questions about the API,
>>> please see below.
>>>
>>> * What stability semantics do you 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
Small correction:

I am *not* saying we should not do this; perhaps this is the best solution
to finding a good compromise here, but I am trying to discover +
acknowledge the full implications of this proposal so they can be
discussed.

Sorry :)

On Wed, Jun 30, 2021 at 9:53 AM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi Gabor,
>
> Thanks for your answers. I appreciate the explanations. Please see my
> responses + further questions below.
>
>
> * What stability semantics do you envision for this API?
>>>
>> As I foresee the API will be as stable as Netty API. Since there is
>> guarantee on no breaking changes between minor versions we can give the
>> same guarantee.
>> If for whatever reason we need to break it we can do it in major version
>> like every other open source project does.
>>
>
> * Does Flink expose dependencies’ APIs in other places? Since this exposes
>>> the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions so such cases
>> there will be no issues. If there is a breaking change in major version
>> we need to wait Flink major version too.
>>
>
> To clarify, you are proposing this new API to have the same stability
> guarantees as @Public currently does? Where we will not introduce breaking
> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>
> If this is the case, I think this puts the community in a tough position
> where we are forced to maintain compatibility with something that we do not
> have control over. It also locks Flink into the current major version of
> Netty (and the Netty framework itself) for the foreseeable future.
>
> I am saying we should not do this, perhaps this is the best solution to
> finding a good compromise here, but I am trying to discover + acknowledge
> the full implications of this proposal so they can be discussed.
>
> What do you think about marking this API as @Experimental and not
> guaranteeing stability between versions? Then, if we do decide we need to
> upgrade Netty (or move away from it), we can do so.
>
> * I share Till's concern about multiple factories – other HTTP middleware
>>> frameworks commonly support chaining middlewares. Since the proposed API
>>> does not include these features/guarantee ordering, do you see any reason
>>> to allow more than one factory?
>>>
>> I personally can't come up with a use-case where ordering is a must. I'm
>> not telling that this is not a valid use-case but adding a feature w/o
>> business rationale would include the maintenance cost (though I'm open to
>> add).
>> As I've seen Till also can't give example for that (please see the doc
>> comments). If you have anything in mind please share it and we can add
>> priority to the API.
>> There is another option too, namely we can be defensive and we can add
>> the priority right now. I would do this only if everybody states in mail
>> that it would be the best option,
>> otherwise I would stick to the original plan.
>>
>
> Let me try to come up with a use case:
> * Someone creates an authentication module for integrating with Google's
> OAuth and publishes it to flink-packages
> * Another person in another org wants to use Google OAuth and then add
> internal authorization based on the user
> * In this scenario, *Google OAuth must come before the internal
> authorization*
> * They place their module and the Google OAuth module to be picked up by
> the service loader
> * What happens?
>
> I do not think that the current proposal has a way to handle this, besides
> having the implementor of the internal authorization module bundle
> everything into one, as you have suggested. Since this is the only way to
> achieve order, why not restrict the service loader to only allow one? This
> way the API is explicit in what it supports.
>
>
> Let me know what you think,
> Austin
>
>
> On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
> wrote:
>
>> Hi Austin,
>>
>> Please see my answers embedded down below.
>>
>> BR,
>> G
>>
>>
>>
>> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the updated proposal. I have a few questions about the API,
>>> please see below.
>>>
>>> * What stability semantics do you envision for this API?
>>>
>> As I foresee the API will be as stable as Netty API. Since there is
>> guarantee on no breaking changes between minor versions we can give the
>> same guarantee.
>> If for whatever reason we need to break it we can do it in major version
>> like every other open source project does.
>>
>>
>>> * Does Flink expose dependencies’ APIs in other places? Since this
>>> exposes the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions so such cases
>> there will be no issues. If there is a breaking change in major version
>> we need to wait Flink major version too.
>>
>>
>>> * I share Till's concern about multiple 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
Hi Gabor,

Thanks for your answers. I appreciate the explanations. Please see my
responses + further questions below.


* What stability semantics do you envision for this API?
>>
> As I foresee the API will be as stable as Netty API. Since there is
> guarantee on no breaking changes between minor versions we can give the
> same guarantee.
> If for whatever reason we need to break it we can do it in major version
> like every other open source project does.
>

* Does Flink expose dependencies’ APIs in other places? Since this exposes
>> the Netty API, will this make it difficult to upgrade Netty?
>>
> I don't expect breaking changes between minor versions so such cases there
> will be no issues. If there is a breaking change in major version
> we need to wait Flink major version too.
>

To clarify, you are proposing this new API to have the same stability
guarantees as @Public currently does? Where we will not introduce breaking
changes unless absolutely necessary (and requiring a FLIP, etc.)?

If this is the case, I think this puts the community in a tough position
where we are forced to maintain compatibility with something that we do not
have control over. It also locks Flink into the current major version of
Netty (and the Netty framework itself) for the foreseeable future.

I am saying we should not do this, perhaps this is the best solution to
finding a good compromise here, but I am trying to discover + acknowledge
the full implications of this proposal so they can be discussed.

What do you think about marking this API as @Experimental and not
guaranteeing stability between versions? Then, if we do decide we need to
upgrade Netty (or move away from it), we can do so.
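To make the decoupling concern concrete: one common way to avoid exposing a dependency's types is to put a small, project-owned abstraction in front of it. The sketch below is purely illustrative — none of these type or method names exist in Flink or are being proposed; it only shows the shape such an API could take, with the Netty-to-wrapper adapter left out.

```java
import java.util.Map;

// Purely illustrative: none of these types exist in Flink. Plugins code
// against a small project-owned request abstraction, and only an internal
// adapter (not shown) would translate from the underlying Netty objects,
// so a Netty upgrade need not break plugin authors.
public class OwnedApiSketch {

    /** Hypothetical Flink-owned view of an inbound HTTP request. */
    interface InboundRequest {
        String path();
        Map<String, String> headers();
    }

    /** Hypothetical decision returned by an authentication plugin. */
    enum Decision { CONTINUE, UNAUTHORIZED }

    /** The surface a plugin would implement; no Netty types appear here. */
    interface AuthenticationHandler {
        Decision authenticate(InboundRequest request);
    }

    /** Example plugin: require an Authorization header on every request. */
    static class RequireAuthHeader implements AuthenticationHandler {
        @Override
        public Decision authenticate(InboundRequest request) {
            return request.headers().containsKey("Authorization")
                    ? Decision.CONTINUE
                    : Decision.UNAUTHORIZED;
        }
    }

    /** Test helper that fakes a request without any HTTP stack. */
    static InboundRequest request(String path, Map<String, String> headers) {
        return new InboundRequest() {
            public String path() { return path; }
            public Map<String, String> headers() { return headers; }
        };
    }

    public static void main(String[] args) {
        AuthenticationHandler handler = new RequireAuthHeader();
        System.out.println(handler.authenticate(
                request("/overview", Map.of())));                      // UNAUTHORIZED
        System.out.println(handler.authenticate(
                request("/overview", Map.of("Authorization", "x"))));  // CONTINUE
    }
}
```

The trade-off is that the wrapper can only expose what the project chooses to expose, which is exactly the stability question being discussed.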

* I share Till's concern about multiple factories – other HTTP middleware
>> frameworks commonly support chaining middlewares. Since the proposed API
>> does not include these features/guarantee ordering, do you see any reason
>> to allow more than one factory?
>>
> I personally can't come up with a use-case where ordering is a must. I'm
> not telling that this is not a valid use-case but adding a feature w/o
> business rationale would include the maintenance cost (though I'm open to
> add).
> As I've seen Till also can't give example for that (please see the doc
> comments). If you have anything in mind please share it and we can add
> priority to the API.
> There is another option too, namely we can be defensive and we can add the
> priority right now. I would do this only if everybody states in mail that
> it would be the best option,
> otherwise I would stick to the original plan.
>

Let me try to come up with a use case:
* Someone creates an authentication module for integrating with Google's
OAuth and publishes it to flink-packages
* Another person in another org wants to use Google OAuth and then add
internal authorization based on the user
* In this scenario, *Google OAuth must come before the internal
authorization*
* They place their module and the Google OAuth module to be picked up by
the service loader
* What happens?

I do not think that the current proposal has a way to handle this, besides
having the implementor of the internal authorization module bundle
everything into one, as you have suggested. Since this is the only way to
achieve order, why not restrict the service loader to only allow one? This
way the API is explicit in what it supports.
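The scenario can be made concrete: Java's ServiceLoader iterates implementations in an unspecified order, so without an explicit priority the chain order is effectively luck. The sketch below is illustrative only — the HandlerFactory interface, its priority() method, and the factory names are invented for this example and are not part of any proposed API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: a handler factory interface with an explicit
// priority, so that multiple implementations (e.g. discovered via the
// ServiceLoader) can be chained in a well-defined order.
public class HandlerChainSketch {

    interface HandlerFactory {
        String name();
        int priority(); // lower value runs earlier in the chain
    }

    // Sorts the discovered factories by priority; without this, the
    // discovery order decides whether authentication runs before
    // authorization or the other way around.
    static List<String> chainOrder(List<HandlerFactory> discovered) {
        List<HandlerFactory> sorted = new ArrayList<>(discovered);
        sorted.sort(Comparator.comparingInt(HandlerFactory::priority));
        List<String> names = new ArrayList<>();
        for (HandlerFactory f : sorted) {
            names.add(f.name());
        }
        return names;
    }

    /** Small helper to build an inline factory for the demo. */
    static HandlerFactory factory(String name, int priority) {
        return new HandlerFactory() {
            public String name() { return name; }
            public int priority() { return priority; }
        };
    }

    public static void main(String[] args) {
        // Simulates a discovery order that differs from the desired order.
        List<String> order = chainOrder(List.of(
                factory("internal-authorization", 200),
                factory("google-oauth", 100)));
        System.out.println(order); // [google-oauth, internal-authorization]
    }
}
```

Restricting the loader to a single factory, as suggested, sidesteps the problem entirely; an explicit priority like the one above would be the defensive alternative.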


Let me know what you think,
Austin


On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
wrote:

> Hi Austin,
>
> Please see my answers embedded down below.
>
> BR,
> G
>
>
>
> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks for the updated proposal. I have a few questions about the API,
>> please see below.
>>
>> * What stability semantics do you envision for this API?
>>
> As I foresee the API will be as stable as Netty API. Since there is
> guarantee on no breaking changes between minor versions we can give the
> same guarantee.
> If for whatever reason we need to break it we can do it in major version
> like every other open source project does.
>
>
>> * Does Flink expose dependencies’ APIs in other places? Since this
>> exposes the Netty API, will this make it difficult to upgrade Netty?
>>
> I don't expect breaking changes between minor versions so such cases there
> will be no issues. If there is a breaking change in major version
> we need to wait Flink major version too.
>
>
>> * I share Till's concern about multiple factories – other HTTP middleware
>> frameworks commonly support chaining middlewares. Since the proposed API
>> does not include these features/guarantee ordering, do you see any reason
>> to allow more than one factory?
>>
> I personally can't come up with a use-case where ordering is a must. I'm
> not telling that this is not a valid use-case but adding a feature w/o
> business rationale would include the maintenance cost (though I'm open to
> add).
> As I've seen Till also can't give 

[jira] [Created] (FLINK-23193) Support to display Options when using desc table statement

2021-06-30 Thread Fangliang Liu (Jira)
Fangliang Liu created FLINK-23193:
-

 Summary: Support to display Options when using desc table statement
 Key: FLINK-23193
 URL: https://issues.apache.org/jira/browse/FLINK-23193
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Client
Reporter: Fangliang Liu


I use the following statement to create a table.
{code:java}
CREATE TABLE datagen (
  f_sequence INT,
  f_key1 INT,
  f_key2 INT,
  f_random_str STRING,
  log_ts  TIMESTAMP(3),
  WATERMARK FOR log_ts AS log_ts
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10',
  'fields.f_sequence.kind' = 'sequence',
  'fields.f_sequence.start' = '1',
  'fields.f_sequence.end' = '1',
  'fields.f_key1.min' = '1',
  'fields.f_key1.max' = '20',
  'fields.f_key2.min' = '1',
  'fields.f_key2.max' = '20',
  'fields.f_random_str.length' = '5'
);
{code}
When I use `desc datagen` to view the table, I get the following result.
{code:java}
+--------------+------------------------+------+-----+--------+-----------+
|         name |                   type | null | key | extras | watermark |
+--------------+------------------------+------+-----+--------+-----------+
|   f_sequence |                    INT | true |     |        |           |
|       f_key1 |                    INT | true |     |        |           |
|       f_key2 |                    INT | true |     |        |           |
| f_random_str |                 STRING | true |     |        |           |
|       log_ts | TIMESTAMP(3) *ROWTIME* | true |     |        |  `log_ts` |
+--------------+------------------------+------+-----+--------+-----------+
5 rows in set
{code}
The information from the WITH clause is not displayed.

I think the following information should also be shown when the desc
statement is executed.

 
{code:java}
'connector' = 'datagen', 
'rows-per-second' = '10', 
'fields.f_sequence.kind' = 'sequence', 
'fields.f_sequence.start' = '1', 
'fields.f_sequence.end' = '1', 
'fields.f_key1.min' = '1', 
'fields.f_key1.max' = '20', 
'fields.f_key2.min' = '1', 
'fields.f_key2.max' = '20', 
'fields.f_random_str.length' = '5'
{code}
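For what it's worth, the WITH-clause options are plain key/value pairs, so rendering them back in the style above is straightforward. The sketch below is illustrative only — the class and method names are invented for this example, and this is not existing Flink behavior:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: formats a table's options map in WITH-clause style,
// as a describe-style command might render it. Class/method names invented.
public class OptionsRenderer {

    static String render(Map<String, String> options) {
        // One "'key' = 'value'" line per option, comma-separated.
        return options.entrySet().stream()
                .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in declaration order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "datagen");
        options.put("rows-per-second", "10");
        System.out.println(render(options));
        // 'connector' = 'datagen',
        // 'rows-per-second' = '10'
    }
}
```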
 

[~jark], what do you think? Looking forward to your reply.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Releasing Flink 1.11.4

2021-06-30 Thread godfrey he
Hi devs,

As discussed in [1], I would like to start a discussion for releasing Flink
1.11.4.

I would like to volunteer as the release manager for 1.11.4, and will start
the release process on the next Wednesday (July 7th).

There are 75 issues that have been closed or resolved [2],
and no blocker issues left [3] so far.

If any issues need to be marked as blocker for 1.11.4, please let me know
in this thread!

Best,
Godfrey


[1]
https://lists.apache.org/thread.html/r40a541027c6a04519f37c61f2a6f3dabdb821b3760cda9cc6ebe6ce9%40%3Cdev.flink.apache.org%3E
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.11.4%20AND%20status%20in%20(Closed%2C%20Resolved)
[3]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.11.4%20AND%20status%20not%20in%20(Closed%2C%20Resolved)%20ORDER%20BY%20priority%20DESC


[jira] [Created] (FLINK-23192) Move connector/format option classes into a common package

2021-06-30 Thread Jira
Ingo Bürk created FLINK-23192:
-

 Summary: Move connector/format option classes into a common package
 Key: FLINK-23192
 URL: https://issues.apache.org/jira/browse/FLINK-23192
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Ingo Bürk








Re: [DISCUSS] Feedback Collection Jira Bot

2021-06-30 Thread Till Rohrmann
I agree that we shouldn't discourage contributions.

For me the main idea of the bot is not to clean up the JIRA but to improve
our communication and expectation management with the community. There are
many things we could do but for a lot of things we don't have the time and
capacity. Then to say at some point that we won't do something is just
being honest. This also shows when looking at the JIRA numbers of the
merged commits. We very rarely resolve tickets which are older than x days
and if we do, then we usually create a new ticket for the problem.

The fact that we see some tickets with available pull requests go stale is
the symptom that we don't value them to be important enough or
allocate enough time for external contributions imo. Otherwise, they would
have gotten the required attention and been merged. In such a case, raising
awareness by pinging the watchers of the respective ticket is probably
better than silently ignoring the PR. Also adding labels to filter for
these PRs should help to get them the required attention. But also here, it
happens very rarely that we actually merge a PR that is older than y days.
Ideally we avoid this situation altogether by only assigning contributors
to tickets for which a committer has review capacity. However, this does
not seem to always work.

In some sense, the JIRA bot shows us more explicitly the things that fall
through the cracks (which is probably no different than before). Of
course, we should try to find the ping and de-prioritization time periods
that work best for the community.

+1 for the proposed changes (extended time periods, "Not a Priority",
default priority and fixVersion).

@Piotr, I think we have the priorities defined here [1]. Maybe it is enough
to share the link so that everyone can check whether her assumptions are
correct.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process

Cheers,
Till

On Wed, Jun 30, 2021 at 10:59 AM Piotr Nowojski 
wrote:

> > * Introduce "Not a Priority" priority and stop closing tickets.
>
> +1 for this one (I also like the name you proposed for this Konstantin)
>
> I also have no objections to other proposals that you summarised. Just a
> remark, that independently of this discussion we might want to revisit or
> reconfirm the priorities and their definition/interpretation across all
> contributors.
>
> Best,
> Piotrek
>
> On Wed, Jun 30, 2021 at 10:15 AM, Konstantin Knauf wrote:
>
> > Hi everyone,
> >
> > Thank you for the additional comments and suggestions.
> >
> > @Stephan, Kurt: I agree that we shouldn't discourage or dishearten
> > contributors, and probably 14 days until a ticket becomes
> "stale-assigned"
> > are too few. That's why I've already proposed to increase that to 30
> days.
> > Similarly the times for Major/Critical tickets can be increased. From my
> > perspective, the root causes are the following:
> >
> > * tickets are opened with the wrong priority (see
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities
> > ).
> > Here it might help to change the default priority.
> > * committers don't have the time to review tickets or don't bring
> community
> > contributions to a resolution. The Jira bot makes this fact more visible.
> > Without the Jira Bot no external contributor would get more attention,
> and
> > no external contribution would be merged faster. Ideally, it'd be the
> > opposite and committers would actively monitor tickets with labels
> > "stale-assigned" and "pull-request-available" in order to review those
> with
> > priority. That's also why I am not a fan of excluding tickets with
> > "pull-request-available" from the bot. The bot can help to make these
> > tickets visible to reviewers.
> >
> > @Jing Zhang: That's a problem. We should try to change the permissions
> > accordingly or need to find a different solution.
> >
> > @Piotr, Kurt: Instead of closing tickets, we could introduce an
> additional
> > priority like "Not a Priority" to which we move tickets. No ticket would
> be
> > closed automatically.
> >
> > Overall, the following changes could resolve most of the concerns, I
> > believe:
> >
> > * Ignore tickets with a fixVersion for all rules but the stale-unassigned
> > rule.
> >
> > * We change the time intervals as follows, accepting reality a bit more
> ;)
> >
> > * stale-assigned only after 30 days (instead of 14 days)
> > * stale-critical only after 14 days (instead of 7 days)
> > * stale-major only after 60 days (instead of 30 days)
> >
> > * Introduce "Not a Priority" priority and stop closing tickets.
> >
> > * Change default priority for new tickets of Flink project to "Minor"
> >
> > The measures, I proposed above, still try to achieve the goals mentioned
> > and agreed upon in the original discussion thread, which were the
> > following:
> >
> >
> >-
> >
> >clearer communication and expectation management with the community
> >-

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread Xintong Song
I second Tison's opinion.

Similar to how we skip docs_404_check for PRs that do not touch the
documentation, we can skip other stages for PRs that only contain
documentation changes.

In addition to making merging documentation PRs easier, we can also reduce
the workload on CI workers. Especially during the last days of a release
cycle, which is usually the most busy time for the CI workers, and is also
where most documentation efforts take place.
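The file-change-set check suggested in this thread could be sketched roughly as below. This is illustrative only — the class and method names are invented, a real implementation would live in the CI scripts, and the `docs/` prefix is an assumption about the repository layout:

```java
import java.util.List;

// Illustrative sketch: given the list of files touched by a PR, decide
// whether only the documentation build needs to run. Names invented.
public class DocsOnlyCheck {

    static boolean isDocsOnly(List<String> changedFiles) {
        // An empty change set should still run the full build, to be safe.
        if (changedFiles.isEmpty()) {
            return false;
        }
        return changedFiles.stream().allMatch(f -> f.startsWith("docs/"));
    }

    public static void main(String[] args) {
        System.out.println(isDocsOnly(List.of("docs/ops/state.md")));  // true
        System.out.println(isDocsOnly(List.of("docs/ops/state.md",
                "flink-core/src/main/java/X.java")));                  // false
    }
}
```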

Thank you~

Xintong Song



On Wed, Jun 30, 2021 at 3:56 PM Till Rohrmann  wrote:

> I think you are right Tison that docs are a special case and they only
> require flink-docs to pass. What I am wondering is how much of a problem
> this will be (assuming that we have a decent build stability). The more
> exceptions we add, the harder it will be to properly follow the guidelines.
> Maybe we can observe how many docs PRs get delayed/not merged because of
> this and then revisit this discussion if needed.
>
> Cheers,
> Till
>
> On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:
>
> > Hi,
> >
> > There are a number of PRs modifying docs only, but we still require all
> > tests passed on that.
> >
> > It is a good proposal we avoid merge PR with "unrelated" failure, but can
> > we improve the case where the contributor only works for docs?
> >
> > For example, base on the file change set, run doc tests only.
> >
> > Best,
> > tison.
> >
> >
> > > godfrey he wrote on Wed, Jun 30, 2021 at 2:17 PM:
> >
> > > +1 for the proposal. Thanks Xintong!
> > >
> > > Best,
> > > Godfrey
> > >
> > >
> > >
> > > > Rui Li wrote on Wed, Jun 30, 2021 at 11:36 AM:
> > >
> > > > Thanks Xintong. +1 to the proposal.
> > > >
> > > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
> > wrote:
> > > >
> > > > > +1 for the proposal. Since the test time is long and environment
> may
> > > > vary,
> > > > > unstable tests are really annoying for developers. The solution is
> > > > welcome.
> > > > >
> > > > > Best
> > > > > liujiangang
> > > > >
> > > > > > Jingsong Li wrote on Tue, Jun 29, 2021 at 10:31 AM:
> > > > >
> > > > > > +1 Thanks Xintong for the update!
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo <
> karma...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Thanks Xintong for drafting this doc.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> > beyond1...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > > >
> > > > > > > > > The best practice for handling test failure is very
> detailed,
> > > > it's
> > > > > a
> > > > > > > good
> > > > > > > > > guidelines document with clear action steps.
> > > > > > > > >
> > > > > > > > > +1 to Xintong's proposal.
> > > > > > > > >
> > > > > > > > > Xintong Song wrote on Mon, Jun 28, 2021 at 4:07 PM:
> > > > > > > > >
> > > > > > > > > > Thanks all for the discussion.
> > > > > > > > > >
> > > > > > > > > > Based on the opinions so far, I've drafted the new
> > guidelines
> > > > > [1],
> > > > > > > as a
> > > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > > >
> > > > > > > > > > Hopefully this draft has covered the most opinions
> > discussed
> > > > and
> > > > > > > > consensus
> > > > > > > > > > made in this discussion thread.
> > > > > > > > > >
> > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > [2]
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > > pnowoj...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > > written.
> > > > > > > > > > >
> > > > > > > > > > > Piotrek
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 25, 2021 at 4:00 PM Till Rohrmann <
> > > > > > > > > > > > trohrm...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > One quick note for clarification. I don't have
> anything
> > > > > against
> > > > > > > > builds
> > > > > > > > > > > > running on your 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Gabor Somogyi
Hi Austin,

Please see my answers embedded down below.

BR,
G



On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi all,
>
> Thanks for the updated proposal. I have a few questions about the API,
> please see below.
>
> * What stability semantics do you envision for this API?
>
As I foresee it, the API will be as stable as the Netty API. Since there
is a guarantee of no breaking changes between minor versions, we can give
the same guarantee.
If for whatever reason we need to break it, we can do so in a major
version, like every other open source project does.


> * Does Flink expose dependencies’ APIs in other places? Since this exposes
> the Netty API, will this make it difficult to upgrade Netty?
>
I don't expect breaking changes between minor versions, so in such cases
there will be no issues. If there is a breaking change in a major version,
we need to wait for a Flink major version too.


> * I share Till's concern about multiple factories – other HTTP middleware
> frameworks commonly support chaining middlewares. Since the proposed API
> does not include these features/guarantee ordering, do you see any reason
> to allow more than one factory?
>
I personally can't come up with a use case where ordering is a must. I'm
not saying this is not a valid use case, but adding a feature without a
business rationale would incur maintenance cost (though I'm open to
adding it).
As far as I've seen, Till also can't give an example of that (please see
the doc comments). If you have anything in mind, please share it and we
can add priority to the API.
There is another option too: we can be defensive and add the priority
right now. I would do this only if everybody states in the mail that it
would be the best option;
otherwise I would stick to the original plan.


>
> Best,
> Austin
>
> On Tue, Jun 29, 2021 at 8:55 AM Márton Balassi 
> wrote:
>
>> Hi all,
>>
>> I commend Konstantin and Till when it comes to standing up for the
>> community values.
>>
>> Based on your feedback we are withdrawing the original proposal and
>> attaching a more general custom netty handler API proposal [1] written by
>> G. The change necessary to the Flink repository is approximately 500 lines
>> of code. [2]
>>
>> Please let us focus on discussing the details of this API and whether it
>> covers the necessary use cases.
>>
>> [1]
>>
>> https://docs.google.com/document/d/1Idnw8YauMK1x_14iv0rVF0Hqm58J6Dg-hi-hEuL6hwM/edit#heading=h.ijcbce3c5gip
>> [2]
>>
>> https://github.com/gaborgsomogyi/flink/commit/942f23679ac21428bb87fc85557b9b443fcaf310
>>
>> Thanks,
>> Marton
>>
>> On Wed, Jun 23, 2021 at 9:36 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > Thanks, Konstantin and Till, for guiding the discussion.
>> >
>> > I was not aware of the results of the call with Konstantin and was
>> > attempting to resolve the unanswered questions before more, potentially
>> > fruitless, work was done.
>> >
>> > I am also looking forward to the coming proposal, as well as increasing
>> my
>> > understanding of this specific use case + its limitations!
>> >
>> > Best,
>> > Austin
>> >
>> > On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann 
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I do like the idea of keeping the actual change outside of Flink but
>> to
>> > > enable Flink to support such a use case (different authentication
>> > > mechanisms). I think this is a good compromise for the community that
>> > > combines long-term maintainability with support for new use-cases. I
>> am
>> > > looking forward to your proposal.
>> > >
>> > > I also want to second Konstantin here that the tone of your last
>> email,
>> > > Marton, does not reflect the values and manners of the Flink community
>> > and
>> > > is not representative of how we conduct discussions. Especially, the
>> more
>> > > senior community members should know this and act accordingly in
>> order to
>> > > be good role models for others in the community. Technical discussions
>> > > should not be decided by who wields presumably the greatest authority
>> but
>> > > by the soundness of arguments and by what is the best solution for a
>> > > problem.
>> > >
>> > > Let us now try to find the best solution for the problem at hand!
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf 
>> > > wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > First, Marton and I had a brief conversation yesterday offline and
>> > > > discussed exploring the approach of exposing the authentication
>> > > > functionality via an API. So, I am looking forward to your proposal
>> in
>> > > that
>> > > > direction. The benefit of such a solution would be that it is
>> > extensible
>> > > > for others and it does add a smaller maintenance (in particular
>> > testing)
>> > > > footprint to Apache Flink itself. If we end up going down this
>> route,
>> > > > flink-packages.org would be a great way to promote these third
>> 

Re: [DISCUSS] Feedback Collection Jira Bot

2021-06-30 Thread Piotr Nowojski
> * Introduce "Not a Priority" priority and stop closing tickets.

+1 for this one (I also like the name you proposed for this Konstantin)

I also have no objections to other proposals that you summarised. Just a
remark, that independently of this discussion we might want to revisit or
reconfirm the priorities and their definition/interpretation across all
contributors.

Best,
Piotrek

On Wed, Jun 30, 2021 at 10:15 AM, Konstantin Knauf wrote:

> Hi everyone,
>
> Thank you for the additional comments and suggestions.
>
> @Stephan, Kurt: I agree that we shouldn't discourage or dishearten
> contributors, and probably 14 days until a ticket becomes "stale-assigned"
> are too few. That's why I've already proposed to increase that to 30 days.
> Similarly the times for Major/Critical tickets can be increased. From my
> perspective, the root causes are the following:
>
> * tickets are opened with the wrong priority (see
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities
> ).
> Here it might help to change the default priority.
> * committers don't have the time to review tickets or don't bring community
> contributions to a resolution. The Jira bot makes this fact more visible.
> Without the Jira Bot no external contributor would get more attention, and
> no external contribution would be merged faster. Ideally, it'd be the
> opposite and committers would actively monitor tickets with labels
> "stale-assigned" and "pull-request-available" in order to review those with
> priority. That's also why I am not a fan of excluding tickets with
> "pull-request-available" from the bot. The bot can help to make these
> tickets visible to reviewers.
>
> @Jing Zhang: That's a problem. We should try to change the permissions
> accordingly or need to find a different solution.
>
> @Piotr, Kurt: Instead of closing tickets, we could introduce an additional
> priority like "Not a Priority" to which we move tickets. No ticket would be
> closed automatically.
>
> Overall, the following changes could resolve most of the concerns, I
> believe:
>
> * Ignore tickets with a fixVersion for all rules but the stale-unassigned
> rule.
>
> * We change the time intervals as follows, accepting reality a bit more ;)
>
> * stale-assigned only after 30 days (instead of 14 days)
> * stale-critical only after 14 days (instead of 7 days)
> * stale-major only after 60 days (instead of 30 days)
>
> * Introduce "Not a Priority" priority and stop closing tickets.
>
> * Change default priority for new tickets of Flink project to "Minor"
>
> The measures, I proposed above, still try to achieve the goals mentioned
> and agreed upon in the original discussion thread, which were the
> following:
>
>
>-
>
>clearer communication and expectation management with the community
>-
>
>   a user or contributor should be able to judge the urgency of a ticket
>   by its priority
>   -
>
>   if a ticket is assigned to someone the expectation that someone is
>   working on it should hold
>   -
>
>generally reduce noise in Jira
>-
>
>reduce overhead of committers to ask about status updates of
>contributions or bug reports
>-
>
>   “Are you still working on this?”
>   -
>
>   “Are you still interested in this?”
>   -
>
>   “Does this still happen on Flink 1.x?”
>   -
>
>   “Are you still experiencing this issue?”
>   -
>
>   “What is the status of the implementation”?
>   -
>
>while still encouraging users to add new tickets and to leave feedback
>about existing tickets
>
>
> Kurt, Stephan, if you'd like to change the bot to "just close very old
> tickets", I suggest you start a separate discussion and subsequent voting
> thread.
>
> Best,
>
> Konstantin
>
>
> On Wed, Jun 30, 2021 at 9:01 AM Kurt Young  wrote:
>
> > +1 to Stephan's opinion, with just one minor difference. For my
> experience
> > and a project
> > as big as Flink, picking up an issue created 1-2 years ago seems normal
> to
> > me. To be more
> > specific, during the blink planner merge, I created lots of clean up &
> > refactor issues, trying
> > to make the code be more clean. I haven't had a chance to resolve all
> these
> > but I think they are
> > still good improvements. Thus I would propose we don't close any stall
> > issues for at least 2 years.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen  wrote:
> >
> > > Being a bit late to the party, and don't want to ask to change
> > everything,
> > > just maybe some observation.
> > >
> > > My main observation and concern is still that this puts pressure and
> > > confusion on contributors, which are mostly blocked on committers for
> > > reviews, or are taking tickets as multi-week projects. I think it is
> not
> > a
> > > great experience for contributors, when they are already unsure why
> their
> > > work isn't getting the attention from committers that 

Re: [DISCUSS] Feedback Collection Jira Bot

2021-06-30 Thread Konstantin Knauf
Hi everyone,

Thank you for the additional comments and suggestions.

@Stephan, Kurt: I agree that we shouldn't discourage or dishearten
contributors, and probably 14 days until a ticket becomes "stale-assigned"
are too few. That's why I've already proposed to increase that to 30 days.
Similarly, the times for Major/Critical tickets can be increased. From my
perspective, the root causes are the following:

* tickets are opened with the wrong priority (see
https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process#FlinkJiraProcess-TicketsPriorities).
Here it might help to change the default priority.
* committers don't have the time to review tickets or don't bring community
contributions to a resolution. The Jira bot makes this fact more visible.
Without the Jira Bot no external contributor would get more attention, and
no external contribution would be merged faster. Ideally, it'd be the
opposite and committers would actively monitor tickets with labels
"stale-assigned" and "pull-request-available" in order to review those with
priority. That's also why I am not a fan of excluding tickets with
"pull-request-available" from the bot. The bot can help to make these
tickets visible to reviewers.

@Jing Zhang: That's a problem. We should try to change the permissions
accordingly or need to find a different solution.

@Piotr, Kurt: Instead of closing tickets, we could introduce an additional
priority like "Not a Priority" to which we move tickets. No ticket would be
closed automatically.

Overall, the following changes could resolve most of the concerns, I
believe:

* Ignore tickets with a fixVersion for all rules but the stale-unassigned
rule.

* We change the time intervals as follows, accepting reality a bit more ;)

* stale-assigned only after 30 days (instead of 14 days)
* stale-critical only after 14 days (instead of 7 days)
* stale-major only after 60 days (instead of 30 days)

* Introduce "Not a Priority" priority and stop closing tickets.

* Change default priority for new tickets of Flink project to "Minor"
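
To make the adjusted intervals concrete, here is a small illustrative
sketch (the class and method names are invented; the actual flink-jira-bot
is configured through its own rule definitions):

```java
import java.time.Duration;

// Illustrative encoding of the proposed staleness intervals.
class StaleRules {

    // stale-assigned only after 30 days (instead of 14)
    static boolean staleAssigned(Duration sinceLastUpdate) {
        return sinceLastUpdate.toDays() >= 30;
    }

    // stale-critical only after 14 days (instead of 7)
    static boolean staleCritical(Duration sinceLastUpdate) {
        return sinceLastUpdate.toDays() >= 14;
    }

    // stale-major only after 60 days (instead of 30)
    static boolean staleMajor(Duration sinceLastUpdate) {
        return sinceLastUpdate.toDays() >= 60;
    }
}
```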

The measures, I proposed above, still try to achieve the goals mentioned
and agreed upon in the original discussion thread, which were the following:


   - clearer communication and expectation management with the community
      - a user or contributor should be able to judge the urgency of a
        ticket by its priority
      - if a ticket is assigned to someone, the expectation that someone is
        working on it should hold
   - generally reduce noise in Jira
   - reduce overhead of committers to ask about status updates of
     contributions or bug reports
      - “Are you still working on this?”
      - “Are you still interested in this?”
      - “Does this still happen on Flink 1.x?”
      - “Are you still experiencing this issue?”
      - “What is the status of the implementation?”
   - while still encouraging users to add new tickets and to leave feedback
     about existing tickets


Kurt, Stephan, if you'd like to change the bot to "just close very old
tickets", I suggest you start a separate discussion and subsequent voting
thread.

Best,

Konstantin


On Wed, Jun 30, 2021 at 9:01 AM Kurt Young  wrote:

> +1 to Stephan's opinion, with just one minor difference. From my experience,
> and for a project as big as Flink, picking up an issue created 1-2 years
> ago seems normal to me. To be more specific, during the blink planner
> merge, I created lots of clean-up & refactoring issues, trying to make the
> code cleaner. I haven't had a chance to resolve all of these, but I think
> they are still good improvements. Thus I would propose we don't close any
> stale issues for at least 2 years.
>
> Best,
> Kurt
>
>
> On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen  wrote:
>
> > Being a bit late to the party, and don't want to ask to change
> everything,
> > just maybe some observation.
> >
> > My main observation and concern is still that this puts pressure and
> > confusion on contributors, which are mostly blocked on committers for
> > reviews, or are taking tickets as multi-week projects. I think it is not
> a
> > great experience for contributors, when they are already unsure why their
> > work isn't getting the attention from committers that they hoped for, to
> > then see issues unassigned or deprioritized automatically. I think we
> > really need to weigh this discouragement of contributors against the
> desire
> > for a tidy ticket system.
> > I also think by now this isn't just a matter of phrasing the bot's
> message
> > correctly. Auto unassignment and deprioritization sends a subtle message
> > that jira resolution is a more important goal than paying attention to
> > contributors (at least I think that is how it will be perceived by many).
> >
> > Back to the original motivation, to not have issues lying around forever,
> > ensuring there is closure eventually.
> > For that, even much longer intervals 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread Till Rohrmann
I think you are right Tison that docs are a special case and they only
require flink-docs to pass. What I am wondering is how much of a problem
this will be (assuming that we have a decent build stability). The more
exceptions we add, the harder it will be to properly follow the guidelines.
Maybe we can observe how many docs PRs get delayed/not merged because of
this and then revisit this discussion if needed.

Cheers,
Till

On Wed, Jun 30, 2021 at 8:30 AM tison  wrote:

> Hi,
>
> There are a number of PRs modifying docs only, but we still require all
> tests to pass on them.
>
> It is a good proposal that we avoid merging PRs with "unrelated" failures,
> but can we improve the case where the contributor only works on docs?
>
> For example, based on the file change set, run doc tests only.
>
> Best,
> tison.
>
>
godfrey he  wrote on Wed, Jun 30, 2021 at 2:17 PM:
>
> > +1 for the proposal. Thanks Xintong!
> >
> > Best,
> > Godfrey
> >
> >
> >
> > Rui Li  wrote on Wed, Jun 30, 2021 at 11:36 AM:
> >
> > > Thanks Xintong. +1 to the proposal.
> > >
> > > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚 
> wrote:
> > >
> > > > +1 for the proposal. Since the test time is long and environment may
> > > vary,
> > > > unstable tests are really annoying for developers. The solution is
> > > welcome.
> > > >
> > > > Best
> > > > liujiangang
> > > >
> > > > Jingsong Li  wrote on Tue, Jun 29, 2021 at 10:31 AM:
> > > >
> > > > > +1 Thanks Xintong for the update!
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann <
> trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1, thanks for updating the guidelines Xintong!
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Thanks Xintong for drafting this doc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG <
> beyond1...@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > > >
> > > > > > > > The best practice for handling test failure is very detailed,
> > > it's
> > > > a
> > > > > > good
> > > > > > > > guidelines document with clear action steps.
> > > > > > > >
> > > > > > > > +1 to Xintong's proposal.
> > > > > > > >
> > > > > > > > > Xintong Song  wrote on Mon, Jun 28, 2021 at 4:07 PM:
> > > > > > > >
> > > > > > > > > Thanks all for the discussion.
> > > > > > > > >
> > > > > > > > > Based on the opinions so far, I've drafted the new
> guidelines
> > > > [1],
> > > > > > as a
> > > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > > >
> > > > > > > > > Hopefully this draft has covered the most opinions
> discussed
> > > and
> > > > > > > consensus
> > > > > > > > > made in this discussion thread.
> > > > > > > > >
> > > > > > > > > Looking forward to your feedback.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > > pnowoj...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > > written.
> > > > > > > > > >
> > > > > > > > > > Piotrek
> > > > > > > > > >
> > > > > > > > > > Fri, 25 Jun 2021 at 16:00, Till Rohrmann <
> > trohrm...@apache.org
> > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > One quick note for clarification. I don't have anything
> > > > against
> > > > > > > builds
> > > > > > > > > > > running on your personal Azure account and this is not
> > > what I
> > > > > > > > > understood
> > > > > > > > > > > under "local environment". For me "local environment"
> > means
> > > > > that
> > > > > > > > > someone
> > > > > > > > > > > runs the test locally on his machine and then says that
> > the
> > > > > > > > > > > tests have passed locally.
> > > > > > > > > > >
> > > > > > > > > > > I do agree that there might be a conflict of interests
> > if a
> > > > PR
> > > > > > > author
> > > > > > > > > > > disables tests. Here I would argue that we don't have
> > > > malignant
> > > > > > > > > > committers
> > > > > > > > > > > which means that every committer will probably first
> > check
> > > > the
> > > > > > > > > respective
> > > > > > > > > > > ticket for how often the test failed. Then I guess the
> > next
> > > > > step
> > > > > > > would
> > > > > > > > > be
> 

[jira] [Created] (FLINK-23191) Azure: Upload CI logs to S3 as well

2021-06-30 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-23191:
--

 Summary: Azure: Upload CI logs to S3 as well
 Key: FLINK-23191
 URL: https://issues.apache.org/jira/browse/FLINK-23191
 Project: Flink
  Issue Type: Improvement
  Components: Build System / Azure Pipelines
Reporter: Robert Metzger


We are currently uploading the CI logs to Azure as artifacts. The maximum 
retention (which we've configured) is 60 days. Afterwards, the logs are gone.

For rarely occurring test failures, the logs might be lost at the time we start 
looking into them.
Therefore, we should store the CI logs somewhere permanently, such as S3 
(similarly to how we stored them when we were using Travis).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Feedback Collection Jira Bot

2021-06-30 Thread Kurt Young
+1 to Stephan's opinion, with just one minor difference. From my experience,
and for a project as big as Flink, picking up an issue created 1-2 years
ago seems normal to me. To be more specific, during the blink planner
merge, I created lots of clean-up & refactoring issues, trying to make the
code cleaner. I haven't had a chance to resolve all of these, but I think
they are still good improvements. Thus I would propose we don't close any
stale issues for at least 2 years.

Best,
Kurt


On Wed, Jun 30, 2021 at 7:22 AM Stephan Ewen  wrote:

> Being a bit late to the party, and don't want to ask to change everything,
> just maybe some observation.
>
> My main observation and concern is still that this puts pressure and
> confusion on contributors, which are mostly blocked on committers for
> reviews, or are taking tickets as multi-week projects. I think it is not a
> great experience for contributors, when they are already unsure why their
> work isn't getting the attention from committers that they hoped for, to
> then see issues unassigned or deprioritized automatically. I think we
> really need to weigh this discouragement of contributors against the desire
> for a tidy ticket system.
> I also think by now this isn't just a matter of phrasing the bot's message
> correctly. Auto unassignment and deprioritization sends a subtle message
> that jira resolution is a more important goal than paying attention to
> contributors (at least I think that is how it will be perceived by many).
>
> Back to the original motivation, to not have issues lying around forever,
> ensuring there is closure eventually.
> For that, even much longer intervals would be fine. Like pinging every
> three months, closing after three pings - would resolve most tickets in a
> year, which is not too bad in the scope of a project like Flink. Many
> features/wishes easily move to two releases in the future, which is almost
> a year. We would get rid of long dead tickets and interfere little with
> current tickets. Contributors can probably understand ticket closing after
> a year of inactivity.
>
> I am curious if a very simple bot that really just looks at stale issues
> (no comment/change in three months), pings the
> issue/reporter/assignee/watchers and closes it after three pings would do
> the job.
> We would get out of the un-assigning business (which can send very tricky
> signals) and would rely on reporters/assignees/watchers to unassign if they
> see that the contributor abandoned the issue. With a cadence of three
> months for pinging, this isn't much work for the ones that get pinged.
>
> Issues where we rely on faster handling are probably the ones where
> committers have a stake in getting those into an upcoming release, so these
> tend to be watched anyways.
>
> On Wed, Jun 23, 2021 at 2:39 PM JING ZHANG  wrote:
>
> > Hi Konstantin, Chesnay,
> >
> > > I would like it to not unassign people if a PR is open. These are
> > > usually blocked by the reviewer, not the assignee, and having the
> > > assignees now additionally having to update JIRA periodically is a bit
> > > like rubbing salt into the wound.
> >
> > I agree with Chesnay about not un-assign an issue if a PR is open.
> > Besides, Could assignees remove the "stale-assigned" tag  by themself? It
> > seems assignees have no permission to delete the tag if the issue is not
> > created by themselves.
> >
> > Best regards,
> > JING ZHANG
> >
> > Konstantin Knauf  wrote on Wed, Jun 23, 2021 at 4:17 PM:
> >
> > > > I agree there are such tickets, but I don't see how this is
> addressing
> > my
> > > concerns. There are also tickets that just shouldn't be closed as I
> > > described above. Why do you think that duplicating tickets and losing
> > > discussions/knowledge is a good solution?
> > >
> > > I don't understand why we are necessarily losing discussion/knowledge.
> > The
> > > tickets are still there, just in "Closed" state, which are included in
> > > default Jira search. We could of course just add a label, but closing
> > seems
> > > clearer to me given that likely this ticket will not get committer
> > attention
> > > in the foreseeable future.
> > >
> > > > I would like to avoid having to constantly fight against the bot.
> It's
> > > already responsible for the majority of my daily emails, with quite
> > little
> > > benefit for me personally. I initially thought that after some period
> of
> > > time it will settle down, but now I'm afraid it won't happen.
> > >
> > > Can you elaborate which rules you are running into mostly? I'd rather
> > like
> > > to understand how we work right now and where this conflicts with the
> > Jira
> > > bot vs slowly disabling the jira bot via labels.
> > >
> > > On Wed, Jun 23, 2021 at 10:00 AM Piotr Nowojski 
> > > wrote:
> > >
> > > > Hi Konstantin,
> > > >
> > > > > In my opinion it is important that we close tickets eventually.
> There
> > > are
> > > > a
> > > > > lot of tickets (bugs, improvements, tech debt) that over time
> became
> > > > > 

Re: [DISCUSS] FLIP-171: Async Sink

2021-06-30 Thread Piotr Nowojski
Thanks for addressing this issue :)

Best, Piotrek

Tue, 29 Jun 2021 at 17:58, Hausmann, Steffen  wrote:

> Hey Piotr,
>
> I've just adapted the FLIP and changed the signature for the
> `submitRequestEntries` method:
>
> protected abstract void submitRequestEntries(List
> requestEntries, ResultFuture requestResult);
>
> In addition, we are likely to use an AtomicLong to track the number of
> outstanding requests, as you have proposed in 2b). I've already indicated
> this in the FLIP, but it's not fully fleshed out. But as you have said,
> that seems to be an implementation detail and the important part is the
> change of the `submitRequestEntries` signature.
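
For readers following along, the adapted signature could look roughly like
this (a sketch only — `ResultFuture` here is a hypothetical stand-in for
the callback interface discussed in this thread, not the final Flink API):

```java
import java.util.List;

// Hypothetical stand-in for the callback-style ResultFuture discussed in
// this thread; the framework would provide the thread-safe implementation.
interface ResultFuture<T> {
    void complete(List<T> results);
}

// Sketch of the adapted writer: the concrete sink reports completion via
// the callback instead of returning a CompletableFuture, so the writer
// keeps its threading model internal.
abstract class AsyncSinkWriter<InputT, RequestEntryT> {

    protected abstract void submitRequestEntries(
            List<RequestEntryT> requestEntries,
            ResultFuture<RequestEntryT> requestResult);
}
```

A concrete sink would implement `submitRequestEntries` by handing the batch
to its async client and calling `requestResult.complete(...)` (or
re-queueing failed entries) from the client's callback.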
>
> Thanks for your feedback!
>
> Cheers, Steffen
>
>
> On 25.06.21, 17:05, "Hausmann, Steffen" 
> wrote:
>
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender and
> know the content is safe.
>
>
>
> Hi Piotr,
>
> I’m happy to take your guidance on this. I need to think through your
> proposals and I’ll follow-up on Monday with some more context so that we
> can close the discussion on these details. But for now, I’ll close the vote.
>
> Thanks, Steffen
>
> From: Piotr Nowojski 
> Date: Friday, 25. June 2021 at 14:48
> To: Till Rohrmann 
> Cc: Steffen Hausmann , "dev@flink.apache.org" <
> dev@flink.apache.org>, Arvid Heise 
> Subject: RE: [EXTERNAL] [DISCUSS] FLIP-171: Async Sink
>
>
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender and
> know the content is safe.
>
>
> Hey,
>
> I've just synced with Arvid about a couple of more remarks from my
> side and he shared mine concerns.
>
> 1. I would very strongly recommend ditching `CompletableFuture `
> from the  `protected abstract CompletableFuture
> submitRequestEntries(List requestEntries);`  in favor of
> something like
> `org.apache.flink.streaming.api.functions.async.ResultFuture` interface.
> `CompletableFuture` would partially make the threading model of the
> `AsyncSinkWriter` part of the public API and it would tie our hands.
> Regardless of how `CompletableFuture` is used, it imposes performance
> overhead because there's synchronisation/volatile inside of it. On the other
> hand something like:
>
> protected abstract void submitRequestEntries(List
> requestEntries, ResultFuture requestResult);
>
> Would allow us to implement the threading model as we wish.
> `ResultFuture` could be backed via `CompletableFuture` underneath, but
> it could also be something more efficient.  I will explain what I have in
> mind in a second.
>
> 2. It looks to me that the proposed `AsyncSinkWriter` internals are not
> very efficient and maybe the threading model hasn't been thought through?
> Especially private fields:
>
> private final BlockingDeque bufferedRequestEntries;
> private BlockingDeque> inFlightRequests;
>
> are a bit strange to me. Why do we need two separate thread safe
> collections? Why do we need a `BlockingDeque` of `CompletableFuture`s?
> If we are already using a fully synchronised collection, there should be no
> need for another layer of thread safe `CompletableFuture`.
>
> As I understand, the threading model of the `AsyncSinkWriter` is very
> similar to that of the `AsyncWaitOperator`, with very similar requirements
> for inducing backpressure. How I would see it implemented is for example:
>
> a) Having a single lock, that would encompass the whole
> `AsyncSinkWriter#flush()` method. `flush()` would be called from the task
> thread (mailbox). To induce backpressure, `#flush()` would just call
> `lock.wait()`. `ResultFuture#complete(...)` called from an async thread,
> would also synchronize on the same lock, and mark some of the inflight
> requests as completed and call `lock.notify()`.
>
> b) More efficient solution. On the hot path we would have for example
> only `AtomicLong numberOfInFlightRequests`. Task thread would be bumping
> it, `ResultFuture#complete()` would be decreasing it. If the task thread
> when bumping `numberOfInFlightRequests` exceeds a threshold, he goes to
> sleep/wait on a lock or some `CompletableFuture`. If
> `ResultFuture#complete()` when decreasing the count goes below the
> threshold, it would wake up the task thread.  Compared to the option a),
> on the hot path, option b) would have only AtomicLong.increment overhead
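
A rough sketch of option b), assuming a single task (mailbox) thread as the
producer — the names and threshold handling are illustrative, not Flink code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Bounds the number of in-flight requests with a single AtomicLong on the
// hot path; the lock is only touched when the threshold is crossed.
class InFlightLimiter {

    private final AtomicLong numberOfInFlightRequests = new AtomicLong();
    private final long threshold;
    private final Object wakeLock = new Object();

    InFlightLimiter(long threshold) {
        this.threshold = threshold;
    }

    // Task thread: bump the counter before submitting; block only while
    // the threshold is exceeded (backpressure).
    void beforeSubmit() throws InterruptedException {
        if (numberOfInFlightRequests.incrementAndGet() > threshold) {
            synchronized (wakeLock) {
                while (numberOfInFlightRequests.get() > threshold) {
                    wakeLock.wait();
                }
            }
        }
    }

    // Async callback thread: decrement and wake the task thread once the
    // count drops back to (or below) the threshold.
    void onComplete() {
        if (numberOfInFlightRequests.decrementAndGet() <= threshold) {
            synchronized (wakeLock) {
                wakeLock.notifyAll();
            }
        }
    }

    long inFlight() {
        return numberOfInFlightRequests.get();
    }
}
```

On the hot path (no backpressure) the task thread pays only the
`incrementAndGet`, which is exactly the property option b) is after.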
>
> c) We could use mailbox, the same way as AsyncWaitOperator is doing.
> In this case `ResultFuture#complete()` would be enquing mailbox action,
> which is thread safe on it's own.
>
> Either of those options would be more efficient and simpler (from the
> threading model perspective) than having two `BlockingQueues` and
> `CompletableFuture`. Also as you can see, neither of those solutions
> requires the overhead of ` CompletableFuture
> submitRequestEntries(List 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread tison
Hi,

There are a number of PRs modifying docs only, but we still require all
tests to pass on them.

It is a good proposal that we avoid merging PRs with "unrelated" failures,
but can we improve the case where the contributor only works on docs?

For example, based on the file change set, run doc tests only.
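
One way to sketch this idea — purely illustrative, with the `docs/` prefix
as an assumption rather than Flink's actual CI layout:

```java
import java.util.List;

// Decides from a PR's changed-file list whether a docs-only CI fast path
// could be taken instead of the full test suite.
class CiPathFilter {

    static boolean docsOnly(List<String> changedFiles) {
        // An empty change set should still run the full suite.
        return !changedFiles.isEmpty()
                && changedFiles.stream().allMatch(f -> f.startsWith("docs/"));
    }
}
```

CI would then run only the docs build when `docsOnly` returns true and the
full test matrix otherwise.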

Best,
tison.


godfrey he  wrote on Wed, Jun 30, 2021 at 2:17 PM:

> +1 for the proposal. Thanks Xintong!
>
> Best,
> Godfrey
>
>
>
> Rui Li  wrote on Wed, Jun 30, 2021 at 11:36 AM:
>
> > Thanks Xintong. +1 to the proposal.
> >
> > On Tue, Jun 29, 2021 at 11:05 AM 刘建刚  wrote:
> >
> > > +1 for the proposal. Since the test time is long and environment may
> > vary,
> > > unstable tests are really annoying for developers. The solution is
> > welcome.
> > >
> > > Best
> > > liujiangang
> > >
> > > Jingsong Li  wrote on Tue, Jun 29, 2021 at 10:31 AM:
> > >
> > > > +1 Thanks Xintong for the update!
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > +1, thanks for updating the guidelines Xintong!
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks Xintong for drafting this doc.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  >
> > > > wrote:
> > > > > > >
> > > > > > > Thanks Xintong for giving detailed documentation.
> > > > > > >
> > > > > > > The best practice for handling test failure is very detailed,
> > it's
> > > a
> > > > > good
> > > > > > > guidelines document with clear action steps.
> > > > > > >
> > > > > > > +1 to Xintong's proposal.
> > > > > > >
> > > > > > > Xintong Song  wrote on Mon, Jun 28, 2021 at 4:07 PM:
> > > > > > >
> > > > > > > > Thanks all for the discussion.
> > > > > > > >
> > > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > > [1],
> > > > > as a
> > > > > > > > potential replacement of the original wiki page [2].
> > > > > > > >
> > > > > > > > Hopefully this draft has covered the most opinions discussed
> > and
> > > > > > consensus
> > > > > > > > made in this discussion thread.
> > > > > > > >
> > > > > > > > Looking forward to your feedback.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > > >
> > > > > > > > [2]
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > > pnowoj...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the clarification Till. +1 for what you have
> > > written.
> > > > > > > > >
> > > > > > > > > Piotrek
> > > > > > > > >
> > > > > > > > > Fri, 25 Jun 2021 at 16:00, Till Rohrmann <
> trohrm...@apache.org
> > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > One quick note for clarification. I don't have anything
> > > against
> > > > > > builds
> > > > > > > > > > running on your personal Azure account and this is not
> > what I
> > > > > > > > understood
> > > > > > > > > > under "local environment". For me "local environment"
> means
> > > > that
> > > > > > > > someone
> > > > > > > > > > runs the test locally on his machine and then says that
> the
> > > > > > > > > > tests have passed locally.
> > > > > > > > > >
> > > > > > > > > > I do agree that there might be a conflict of interests
> if a
> > > PR
> > > > > > author
> > > > > > > > > > disables tests. Here I would argue that we don't have
> > > malignant
> > > > > > > > > committers
> > > > > > > > > > which means that every committer will probably first
> check
> > > the
> > > > > > > > respective
> > > > > > > > > > ticket for how often the test failed. Then I guess the
> next
> > > > step
> > > > > > would
> > > > > > > > be
> > > > > > > > > > to discuss on the ticket whether to disable it or not.
> And
> > > > > finally,
> > > > > > > > after
> > > > > > > > > > reaching a consensus, it will be disabled. If we see
> > someone
> > > > > > abusing
> > > > > > > > this
> > > > > > > > > > policy, then we can still think about how to guard
> against
> > > it.
> > > > > But,
> > > > > > > > > > honestly, I have very rarely seen such a case. I am also
> ok
> > > to
> > > > > > pull in
> > > > > > > > > the
> > > > > > > > > > release manager to make the final call if this resolves
> > > > concerns.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > > pnowoj...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-30 Thread godfrey he
+1 for the proposal. Thanks Xintong!

Best,
Godfrey



Rui Li  wrote on Wed, Jun 30, 2021 at 11:36 AM:

> Thanks Xintong. +1 to the proposal.
>
> On Tue, Jun 29, 2021 at 11:05 AM 刘建刚  wrote:
>
> > +1 for the proposal. Since the test time is long and environment may
> vary,
> > unstable tests are really annoying for developers. The solution is
> welcome.
> >
> > Best
> > liujiangang
> >
> > Jingsong Li  wrote on Tue, Jun 29, 2021 at 10:31 AM:
> >
> > > +1 Thanks Xintong for the update!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann 
> > > wrote:
> > >
> > > > +1, thanks for updating the guidelines Xintong!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo 
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks Xintong for drafting this doc.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG 
> > > wrote:
> > > > > >
> > > > > > Thanks Xintong for giving detailed documentation.
> > > > > >
> > > > > > The best practice for handling test failure is very detailed,
> it's
> > a
> > > > good
> > > > > > guidelines document with clear action steps.
> > > > > >
> > > > > > +1 to Xintong's proposal.
> > > > > >
> > > > > > Xintong Song  wrote on Mon, Jun 28, 2021 at 4:07 PM:
> > > > > >
> > > > > > > Thanks all for the discussion.
> > > > > > >
> > > > > > > Based on the opinions so far, I've drafted the new guidelines
> > [1],
> > > > as a
> > > > > > > potential replacement of the original wiki page [2].
> > > > > > >
> > > > > > > Hopefully this draft has covered the most opinions discussed
> and
> > > > > consensus
> > > > > > > made in this discussion thread.
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > > > > > >
> > > > > > > [2]
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski <
> > > > pnowoj...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the clarification Till. +1 for what you have
> > written.
> > > > > > > >
> > > > > > > > Piotrek
> > > > > > > >
> > > > > > > > Fri, 25 Jun 2021 at 16:00, Till Rohrmann  >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > One quick note for clarification. I don't have anything
> > against
> > > > > builds
> > > > > > > > > running on your personal Azure account and this is not
> what I
> > > > > > > understood
> > > > > > > > > under "local environment". For me "local environment" means
> > > that
> > > > > > > someone
> > > > > > > > > runs the test locally on his machine and then says that the
> > > > > > > > > tests have passed locally.
> > > > > > > > >
> > > > > > > > > I do agree that there might be a conflict of interests if a
> > PR
> > > > > author
> > > > > > > > > disables tests. Here I would argue that we don't have
> > malignant
> > > > > > > > committers
> > > > > > > > > which means that every committer will probably first check
> > the
> > > > > > > respective
> > > > > > > > > ticket for how often the test failed. Then I guess the next
> > > step
> > > > > would
> > > > > > > be
> > > > > > > > > to discuss on the ticket whether to disable it or not. And
> > > > finally,
> > > > > > > after
> > > > > > > > > reaching a consensus, it will be disabled. If we see
> someone
> > > > > abusing
> > > > > > > this
> > > > > > > > > policy, then we can still think about how to guard against
> > it.
> > > > But,
> > > > > > > > > honestly, I have very rarely seen such a case. I am also ok
> > to
> > > > > pull in
> > > > > > > > the
> > > > > > > > > release manager to make the final call if this resolves
> > > concerns.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <
> > > > > pnowoj...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 for the general idea, however I have concerns about a
> > > couple
> > > > > of
> > > > > > > > > details.
> > > > > > > > > >
> > > > > > > > > > > I would first try to not introduce the exception for
> > local
> > > > > builds.
> > > > > > > > > > > It makes it quite hard for others to verify the build
> and
> > > to
> > > > > make
> > > > > > > > sure
> > > > > > > > > > that the right things were executed.
> > > > > > > > > >
> > > > > > > > > > I would counter Till's proposal to ignore local green
> > builds.
> > > > If
> > > > > > > > > committer
> > > > > > > > > > is merging and closing a PR, with official azure failure,
> > but
> > > > > there
> > > > > > > > was a
> > > > > > > > > > green build