[jira] [Commented] (FLINK-19469) HBase connector 2.2 failed to download dependencies "org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT"

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207862#comment-17207862
 ] 

Robert Metzger commented on FLINK-19469:


It looks a lot like it. Maybe we should consider banning SNAPSHOT dependencies?
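For reference, a SNAPSHOT ban could be enforced at build time with the Maven Enforcer Plugin's requireReleaseDeps rule — a sketch only; the plugin version and the excludes below are illustrative (our own 1.12-SNAPSHOT modules would need to stay allowed):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.0.0-M3</version>
  <executions>
    <execution>
      <id>ban-snapshot-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <requireReleaseDeps>
            <message>No external SNAPSHOT dependencies allowed!</message>
            <!-- the project's own SNAPSHOT modules must remain allowed -->
            <excludes>
              <exclude>org.apache.flink:*</exclude>
            </excludes>
          </requireReleaseDeps>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This would have failed the build at dependency resolution time with a clear message instead of a 503 from a dead snapshot repository.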

> HBase connector 2.2 failed to download dependencies 
> "org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT" 
> 
>
> Key: FLINK-19469
> URL: https://issues.apache.org/jira/browse/FLINK-19469
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / HBase
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=7093&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=03dca39c-73e8-5aaf-601d-328ae5c35f20
> {code}
> 2020-09-29T20:59:24.8085970Z [ERROR] Failed to execute goal on project 
> flink-connector-hbase-2.2_2.11: Could not resolve dependencies for project 
> org.apache.flink:flink-connector-hbase-2.2_2.11:jar:1.12-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hbase:hbase-server:jar:tests:2.2.3 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact 
> descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not 
> transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from/to 
> jvnet-nexus-snapshots 
> (https://maven.java.net/content/repositories/snapshots): Failed to transfer 
> file: 
> https://maven.java.net/content/repositories/snapshots/org/glassfish/javax.el/3.0.1-b06-SNAPSHOT/javax.el-3.0.1-b06-SNAPSHOT.pom.
>  Return code is: 503 , ReasonPhrase:Service Unavailable: Back-end server is 
> at capacity. -> [Help 1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] pnowojski commented on a change in pull request #13539: [FLINK-19027][network] Assign exclusive buffers to LocalRecoveredInputChannel.

2020-10-05 Thread GitBox


pnowojski commented on a change in pull request #13539:
URL: https://github.com/apache/flink/pull/13539#discussion_r499379001



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGate.java
##
@@ -247,7 +248,18 @@ public void setup() throws IOException {
	}

	@Override
-	public CompletableFuture readRecoveredState(ExecutorService executor, ChannelStateReader reader) {
+	public CompletableFuture readRecoveredState(ExecutorService executor, ChannelStateReader reader) throws IOException {
+		synchronized (requestLock) {
+			if (closeFuture.isDone()) {
+				return FutureUtils.completedVoidFuture();
+			}
+			for (InputChannel inputChannel : inputChannels.values()) {
+				if (inputChannel instanceof RemoteRecoveredInputChannel) {
+					((RemoteRecoveredInputChannel) inputChannel).assignExclusiveSegments();
+				}
+			}
+		}
+

Review comment:
   Why is this change relevant to the fix? Could you add some explanation 
to the commit message?

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java
##
@@ -217,15 +221,11 @@ public void testConcurrentReadStateAndProcessAndClose() throws Exception {
			}
		};

-		submitTasksAndWaitForResults(executor, new Callable[] {closeTask, readRecoveredStateTask, processStateTask});
-	} finally {
-		executor.shutdown();
+		executor.invokeAll(Arrays.asList(closeTask, readRecoveredStateTask, processStateTask));
+
		// wait until the internal channel state recover task finishes
-		executor.awaitTermination(60, TimeUnit.SECONDS);
		assertEquals(totalBuffers, environment.getNetworkBufferPool().getNumberOfAvailableMemorySegments());
		assertTrue(inputGate.getCloseFuture().isDone());
-
-		environment.close();

Review comment:
   Did you remove `awaitTermination` and `close` calls?

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java
##
@@ -59,26 +59,27 @@
 import org.apache.flink.runtime.shuffle.ShuffleDescriptor;

Review comment:
   Change first commit message to:
   > [FLINK-19027][test][network] Ensure SingleInputGateTest does not swallow 
exceptions during cleanup.
   
   ?
   

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalRecoveredInputChannel.java
##
@@ -42,8 +42,17 @@
			TaskEventPublisher taskEventPublisher,
			int initialBackOff,
			int maxBackoff,
+			int networkBuffersPerChannel,
			InputChannelMetrics metrics) {
-		super(inputGate, channelIndex, partitionId, initialBackOff, maxBackoff, metrics.getNumBytesInLocalCounter(), metrics.getNumBuffersInLocalCounter());
+		super(

Review comment:
   I'm not sure if I understand this bug and the fix. Why does allocating 
exclusive buffers for `LocalRecoveredInputChannel` fix the problem? Isn't it 
just reducing the window for the live lock to happen? What if downstream tasks 
are scheduled with a significant delay (the exclusive buffer assignment happens 
after upstream tasks have already acquired lots of buffers)?
   
   In other words, isn't this a semi-fix for this bug: 
https://issues.apache.org/jira/browse/FLINK-13203





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19489) SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent gets stuck

2020-10-05 Thread Kezhu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207865#comment-17207865
 ] 

Kezhu Wang commented on FLINK-19489:


After diving in, I think this could be a bug in {{CompletableFuture.get}} in 
openjdk 9 and above. I suspect that {{CompletableFuture.get}} can swallow the 
{{InterruptedException}} if the awaited future completes immediately after 
{{Thread.interrupt}} on the parking thread.

[{{CompletableFuture.get}}|https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/CompletableFuture.java#l1984]
 and its complementary method 
[{{CompletableFuture.waitingGet}}|https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/CompletableFuture.java#l1805]
 have fewer than 40 lines of code; I copy them here for analysis.
{code:java}
volatile Object result;   // Either the result or boxed AltResult

public T get() throws InterruptedException, ExecutionException {
Object r;
if ((r = result) == null)
r = waitingGet(true);
return (T) reportGet(r);
}

/**
 * Returns raw result after waiting, or null if interruptible and
 * interrupted.
 */
private Object waitingGet(boolean interruptible) {
Signaller q = null;
boolean queued = false;
Object r;
while ((r = result) == null) {
if (q == null) {
q = new Signaller(interruptible, 0L, 0L);
if (Thread.currentThread() instanceof ForkJoinWorkerThread)
ForkJoinPool.helpAsyncBlocker(defaultExecutor(), q);
}
else if (!queued)
queued = tryPushStack(q);
else {
try {
ForkJoinPool.managedBlock(q);
} catch (InterruptedException ie) { // currently cannot happen
q.interrupted = true;
}
            if (q.interrupted && interruptible)  // tag(INTERRUPTED): we are interrupted and about to break the loop
                break;
}
}
if (q != null && queued) {
q.thread = null;
if (!interruptible && q.interrupted)
Thread.currentThread().interrupt();
if (r == null)
cleanStack();
}
    if (r != null || (r = result) != null) // tag(ASSIGNMENT): we can reach here because of interruption
        postComplete();
return r;
}
{code}
I annotated {{CompletableFuture.waitingGet}} with the tags {{INTERRUPTED}} and 
{{ASSIGNMENT}}. If {{CompletableFuture.complete}} occurs between these two 
tags, {{CompletableFuture.waitingGet}} will return {{result}} due to the 
{{ASSIGNMENT}} statement, and thus the interruption caught at {{INTERRUPTED}} 
is lost.

{{CompletableFuture.complete}} is rather simple; it is merely a compare-and-swap 
followed by completion posting.
{code:java}
public boolean complete(T value) {
boolean triggered = completeValue(value);
postComplete();
return triggered;
}

final boolean completeValue(T t) {
return RESULT.compareAndSet(this, null, (t == null) ? NIL : t);
}

private static final VarHandle RESULT;

static {
    RESULT = l.findVarHandle(CompletableFuture.class, "result", Object.class);
}
{code}
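To make the race window concrete, here is a minimal, self-contained sketch in the spirit of the demo repository linked at the end of this message (the class and method names below are mine, not taken from that repository). Each attempt parks a waiter in {{get}}, interrupts it, and then completes the future immediately; on an affected JDK the swallowed count may occasionally be non-zero, while on a correct JDK it should stay at 0:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptSwallowDemo {

    /**
     * Races Thread.interrupt() against CompletableFuture.complete() for the
     * given number of attempts. Returns how often the waiter observed neither
     * an InterruptedException nor a set interrupt flag, i.e. how often the
     * interrupt was swallowed (expected to be 0 on a correct JDK).
     */
    static int raceInterruptAgainstComplete(int attempts) throws InterruptedException {
        AtomicInteger swallowed = new AtomicInteger();
        for (int i = 0; i < attempts; i++) {
            CompletableFuture<String> future = new CompletableFuture<>();
            Thread waiter = new Thread(() -> {
                try {
                    future.get(); // parks in waitingGet
                    // get() returned normally: the pending interrupt must
                    // still be visible on the flag, otherwise it was lost
                    if (!Thread.currentThread().isInterrupted()) {
                        swallowed.incrementAndGet();
                    }
                } catch (Exception e) {
                    // InterruptedException here is the expected, correct outcome
                }
            });
            waiter.start();
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.onSpinWait(); // wait until the waiter is parked
            }
            waiter.interrupt();      // drives the tag(INTERRUPTED) path
            future.complete("done"); // races with the break out of the wait loop
            waiter.join();
        }
        return swallowed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int lost = raceInterruptAgainstComplete(10_000);
        System.out.println("interrupts swallowed: " + lost + " / 10000");
    }
}
```

Because the bug depends on hitting a narrow window between tag(INTERRUPTED) and tag(ASSIGNMENT), many attempts are needed and a run of zero does not prove absence.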
In contrast to {{CompletableFuture.get}}, 
[{{ReentrantLock.lock}}|https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/locks/ReentrantLock.java#l252]
 and its complementary method 
[{{AbstractQueuedSynchronizer.acquire}}|https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedSynchronizer.java#l1226]
 self-interrupt after the blocking wait.

[~dian.fu] [~aljoscha] [~sewen] [~becket_qin] Does the above sound sensible? 
Glad to hear feedback.


I have also created a [github 
repository|https://github.com/kezhuw/openjdk-completablefuture-interruptedexception]
 that demonstrates this problem with a ready-to-run maven project. I manually 
tested it with adoptopenjdk 8/9/10/11/12/13/14/15; it seems that this problem 
exists in openjdk 9 and above, but not in openjdk 8.

> SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent gets stuck
> ---
>
> Key: FLINK-19489
> URL: https://issues.apache.org/jira/browse/FLINK-19489
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=7158&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=b4cd3436-dbe8-556d-3bca-42f92c3cbf2f

[jira] [Updated] (FLINK-13203) [proper fix] Deadlock occurs when requiring exclusive buffer for RemoteInputChannel

2020-10-05 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13203:
---
Description: 
The issue occurs when requesting exclusive buffers with a timeout. Since the 
number of maximum buffers and the number of required buffers are currently not 
the same for local buffer pools, there may be cases where the local buffer 
pools of the upstream tasks occupy all the buffers while the downstream tasks 
fail to acquire exclusive buffers to make progress. For 1.9, in 
https://issues.apache.org/jira/browse/FLINK-12852 the deadlock was avoided by 
adding a timeout that fails over the current execution when it fires and 
advises users in the exception message to increase the number of buffers.

In the discussion under https://issues.apache.org/jira/browse/FLINK-12852 
numerous proper solutions were discussed, and as of now there is no consensus 
on how to fix it:

1. Only allocate the minimum per producer, which is one buffer per channel. 
This would keep the requirement similar to what we have at the moment, but it 
is much less than we recommend for the credit-based network data exchange 
(2 * channels + floating).

2a. Coordinate the deployment sink-to-source such that receivers always have 
their buffers first. This would be complex to implement and coordinate, and 
would break many assumptions about tasks being independent (coordination-wise) 
on the TaskManagers. Giving that assumption up would be a pretty big step and 
cause lots of complexity in the future.
{quote}
It will also increase deployment delays. Low deployment delays should be a 
design goal in my opinion, as it will enable other features more easily, like 
low-disruption upgrades, etc.
{quote}

2b. Assign extra buffers only once all of the tasks are RUNNING. This is a 
simplified version of 2a, without tracking the tasks sink-to-source.

3. Make buffers always revocable, by spilling.
This is tricky to implement very efficiently, especially because there is 
logic that slices buffers for early sends for the low-latency streaming path, 
and the spilling request will come from an asynchronous call. That will 
probably stay like that even with the mailbox, because the main thread will 
frequently be blocked on buffer allocation when this request comes.

4. We allocate the recommended number for good throughput (2*numChannels + 
floating) per consumer and per producer.
No dynamic rebalancing any more. This would increase the number of required 
network buffers in certain high-parallelism scenarios quite a bit with the 
default config. Users can down-configure this by setting the per-channel 
buffers lower. But it would break user setups and require them to adjust the 
config when upgrading.

5. We make the network resource per slot and ask the scheduler to attach 
information about how many producers and how many consumers will be in the 
slot, worst case. We use that to pre-compute how many excess buffers the 
producers may take.
This will also break with some assumptions and lead us to the point that we 
have to pre-compute network buffers in the same way as managed memory. Seeing 
how much pain it is with the managed memory, this seems not so great.
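To get a rough sense of the sizing gap between option 1 and the credit-based recommendation discussed above, a small illustrative calculation (the method names are mine, not Flink configuration keys):

```java
public class BufferSizingSketch {

    // option 1: the bare minimum, one buffer per channel
    static int minimumBuffers(int numChannels) {
        return numChannels;
    }

    // credit-based recommendation: 2 buffers per channel plus floating buffers
    static int recommendedBuffers(int numChannels, int floatingBuffers) {
        return 2 * numChannels + floatingBuffers;
    }

    public static void main(String[] args) {
        // e.g. a gate with 100 channels and 8 floating buffers
        System.out.println("minimum:     " + minimumBuffers(100));        // 100
        System.out.println("recommended: " + recommendedBuffers(100, 8)); // 208
    }
}
```

The roughly 2x difference per gate is what makes option 4 expensive in high-parallelism setups with the default config.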


[jira] [Created] (FLINK-19501) Missing state in enum type

2020-10-05 Thread goutham (Jira)
goutham created FLINK-19501:
---

 Summary: Missing state in enum type
 Key: FLINK-19501
 URL: https://issues.apache.org/jira/browse/FLINK-19501
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.11.2
Reporter: goutham
 Fix For: 1.12.0


The INITIALIZING state is missing from one of the enums in the API snapshot document:

 

"enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
"CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING" ]





[GitHub] [flink] aljoscha commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


aljoscha commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703447514


   @sjwiesman Should we even roll our own class in the future or use one of the 
"new" builtin classes like `Duration`?







[jira] [Commented] (FLINK-19501) Missing state in enum type

2020-10-05 Thread goutham (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207866#comment-17207866
 ] 

goutham commented on FLINK-19501:
-

Created this issue to fix this. Can someone assign it to me?

> Missing state in enum type
> --
>
> Key: FLINK-19501
> URL: https://issues.apache.org/jira/browse/FLINK-19501
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.11.2
>Reporter: goutham
>Priority: Minor
> Fix For: 1.12.0
>
>
> INITIALIZING state is missing for one of enums api snapshot document 
>  
> "enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
> "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
> "RECONCILING" ]





[jira] [Comment Edited] (FLINK-19501) Missing state in enum type

2020-10-05 Thread goutham (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207866#comment-17207866
 ] 

goutham edited comment on FLINK-19501 at 10/5/20, 7:15 AM:
---

Created this issue to fix the missing enum state in the API. Can someone assign it to me?


was (Author: gkrish24):
created this issue to fix this. can someone assign this to me?

> Missing state in enum type
> --
>
> Key: FLINK-19501
> URL: https://issues.apache.org/jira/browse/FLINK-19501
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.11.2
>Reporter: goutham
>Priority: Minor
> Fix For: 1.12.0
>
>
> INITIALIZING state is missing for one of enums api snapshot document 
>  
> "enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
> "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
> "RECONCILING" ]





[GitHub] [flink] zentol commented on pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


zentol commented on pull request #13464:
URL: https://github.com/apache/flink/pull/13464#issuecomment-703453298


   @tillrohrmann I've addressed your comments.







[GitHub] [flink] rmetzger commented on pull request #13439: [FLINK-19295][yarn][tests] Exclude more meaningless akka shutdown errors

2020-10-05 Thread GitBox


rmetzger commented on pull request #13439:
URL: https://github.com/apache/flink/pull/13439#issuecomment-703454812


   Thanks a lot for your review. Merging.







[GitHub] [flink] rmetzger merged pull request #13439: [FLINK-19295][yarn][tests] Exclude more meaningless akka shutdown errors

2020-10-05 Thread GitBox


rmetzger merged pull request #13439:
URL: https://github.com/apache/flink/pull/13439


   







[jira] [Commented] (FLINK-19295) YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207869#comment-17207869
 ] 

Robert Metzger commented on FLINK-19295:


Resolved on master in 
https://github.com/apache/flink/commit/6c7b195d57c3bad5bc1f2251de75ac744dbbe4a7
Preparing backport to 1.11 ...

> YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with 
> prohibited string
> --
>
> Key: FLINK-19295
> URL: https://issues.apache.org/jira/browse/FLINK-19295
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Runtime / Coordination, Tests
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6661&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354]
> {code}
> 2020-09-19T22:08:13.5364974Z [ERROR]   
> YARNSessionFIFOITCase.checkForProhibitedLogContents:83->YarnTestBase.ensureNoProhibitedStringInLogFiles:476
>  Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1600553154281_0001/container_1600553154281_0001_01_02/taskmanager.log
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> {code}





[jira] [Commented] (FLINK-19237) LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with "NoResourceAvailableException: Could not allocate the required slot within slot request timeout

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207870#comment-17207870
 ] 

Robert Metzger commented on FLINK-19237:


It seems that this test instability is reproducible on CI: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8423&view=logs&j=6e58d712-c5cc-52fb-0895-6ff7bd56c46b&t=f30a8e80-b2cf-535c-9952-7f521a4ae374
 (for me it is quite difficult to reproduce locally; I needed 27238 (~50 
minutes) locally).

> LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with 
> "NoResourceAvailableException: Could not allocate the required slot within 
> slot request timeout"
> 
>
> Key: FLINK-19237
> URL: https://issues.apache.org/jira/browse/FLINK-19237
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Matthias
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6499&view=logs&j=6bfdaf55-0c08-5e3f-a2d2-2a0285fd41cf&t=fd9796c3-9ce8-5619-781c-42f873e126a6]
> {code}
> 2020-09-14T21:11:02.8200203Z [ERROR] 
> testReelectionOfJobMaster(org.apache.flink.runtime.leaderelection.LeaderChangeClusterComponentsTest)
>   Time elapsed: 300.14 s  <<< FAILURE!
> 2020-09-14T21:11:02.8201761Z java.lang.AssertionError: Job failed.
> 2020-09-14T21:11:02.8202749Z  at 
> org.apache.flink.runtime.jobmaster.utils.JobResultUtils.throwAssertionErrorOnFailedResult(JobResultUtils.java:54)
> 2020-09-14T21:11:02.8203794Z  at 
> org.apache.flink.runtime.jobmaster.utils.JobResultUtils.assertSuccess(JobResultUtils.java:30)
> 2020-09-14T21:11:02.8205177Z  at 
> org.apache.flink.runtime.leaderelection.LeaderChangeClusterComponentsTest.testReelectionOfJobMaster(LeaderChangeClusterComponentsTest.java:152)
> 2020-09-14T21:11:02.8206191Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-09-14T21:11:02.8206985Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-09-14T21:11:02.8207930Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-09-14T21:11:02.8208927Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-09-14T21:11:02.8209753Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-09-14T21:11:02.8210710Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-09-14T21:11:02.8211608Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-09-14T21:11:02.8214473Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-09-14T21:11:02.8215398Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-14T21:11:02.8216199Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-09-14T21:11:02.8216947Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-09-14T21:11:02.8217695Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-09-14T21:11:02.8218635Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-09-14T21:11:02.8219499Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-09-14T21:11:02.8220313Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-09-14T21:11:02.8221060Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-09-14T21:11:02.8222171Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-09-14T21:11:02.8222937Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-09-14T21:11:02.8223688Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-09-14T21:11:02.8225191Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-14T21:11:02.8226086Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-09-14T21:11:02.8226761Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-09-14T21:11:02.8227453Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-09-14T21:11:02.8228392Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-09-14T21:11:02.8229256Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-09-14T21

[GitHub] [flink] aljoscha commented on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


aljoscha commented on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-703460832


   Thanks for the review! I changed the boolean to an enum but I'm not sure 
about the naming, so very open to any suggestion.
   
   About formatting: I use the Google Java Style as the `google-java-format` 
tool applies but with tabs instead of spaces. That one consistently uses 2 
levels of indentation for both arguments and parameters and continuation while 
using 1 level of indentation for blocks. I like how it visually makes arguments 
more distinct from code blocks and the internal consistency. Plus, it's 
compatible with our Checkstyle rules. I would prefer if we had a consistent 
code style that is enforceable and can automatically be applied but while we 
don't have that I'll use a style I like in new code that is compatible with our 
rules. Sorry for the longer block of text. 😅 I can still undo the whitespace 
changes here.







[jira] [Commented] (FLINK-19494) Adjust "StreamExecutionEnvironment.generateSequence()" to new API Sources

2020-10-05 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207876#comment-17207876
 ] 

Aljoscha Krettek commented on FLINK-19494:
--

Just confirming here. 👌

> Adjust "StreamExecutionEnvironment.generateSequence()" to new API Sources
> -
>
> Key: FLINK-19494
> URL: https://issues.apache.org/jira/browse/FLINK-19494
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> The utility method {{StreamExecutionEnvironment.generateSequence()}} should 
> instantiate the new {{NumberSequenceSource}} rather than the existing 
> {{StatefulSequenceSource}}.
> We should also deprecate the {{StatefulSequenceSource}} as part of this 
> change, because
>   - it is based on the legacy source API
>   - it is not scalable, because it materializes the sequence





[jira] [Commented] (FLINK-19501) Missing state in enum type

2020-10-05 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207877#comment-17207877
 ] 

Aljoscha Krettek commented on FLINK-19501:
--

Which enum is this about?

> Missing state in enum type
> --
>
> Key: FLINK-19501
> URL: https://issues.apache.org/jira/browse/FLINK-19501
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.11.2
>Reporter: goutham
>Priority: Minor
> Fix For: 1.12.0
>
>
> INITIALIZING state is missing for one of enums api snapshot document 
>  
> "enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
> "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
> "RECONCILING" ]





[jira] [Commented] (FLINK-19493) In CliFrontend, make flow of Configuration more obvious

2020-10-05 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207878#comment-17207878
 ] 

Aljoscha Krettek commented on FLINK-19493:
--

Yes, I'm planning to add tests for the existing behaviour and new ones if 
necessary.

> In CliFrontend, make flow of Configuration more obvious
> ---
>
> Key: FLINK-19493
> URL: https://issues.apache.org/jira/browse/FLINK-19493
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
>
> It's very important to ensure that the {{Configuration}} the {{CliFrontend}} 
> loads ends up in the {{*ContextEnvironment}} and that its settings are 
> reflected there. Currently, there are no tests for this behaviour and it is 
> hard to convince yourself that the code is actually doing the right thing. We 
> should simplify the flow of the {{Configuration}} from loading to the 
> environment and add tests that verify this behaviour.
> Currently, the flow is roughly this:
>  - the {{main()}} method loads the {{Configuration}} (from 
> {{flink-conf.yaml}})
>  - the {{Configuration}} is passed to the {{CustomCommandLines}} in 
> {{loadCustomCommandLines()}}
>  - {{main()}} passes both the {{Configuration}} and the 
> {{CustomCommandLines}} to the constructor of {{CliFrontend}}
>  - when we need a {{Configuration}} for execution 
> {{getEffectiveConfiguration()}} is called. This doesn't take the 
> {{Configuration}} of the {{CliFrontend}} as a basis but instead calls 
> {{CustomCommandLine.applyCommandLineOptionsToConfiguration}}
> It's up to the {{CustomCommandLine.applyCommandLineOptionsToConfiguration()}} 
> implemenation to either forward the {{Configuration}} that was given to it by 
> the {{CliFrontend.main()}} method or return some other {{Configuration}}. 
> Only if the correct {{Configuration}} is returned can we ensure that user 
> settings make it all the way through.
> I'm proposing to change 
> {{CustomCommandLine.applyCommandLineOptionsToConfiguration()}} to instead 
> apply its settings to a {{Configuration}} that is passed in as a parameter; 
> then {{getEffectiveConfiguration()}} can pass in the {{Configuration}} it got 
> from the {{main()}} method as a basis, and the flow becomes easy to verify.
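The proposed change can be sketched with plain JDK types (a `Map` standing in for Flink's `Configuration`; the method names here are illustrative, not the actual Flink API):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigFlowSketch {
    // Old style (illustrative): the command line builds a Configuration
    // from scratch, so settings loaded in main() may be silently dropped.
    static Map<String, String> applyOptionsOldStyle(Map<String, String> cliOptions) {
        Map<String, String> fresh = new HashMap<>();
        fresh.putAll(cliOptions); // flink-conf.yaml settings are lost here
        return fresh;
    }

    // Proposed style (illustrative): options are applied on top of the
    // Configuration loaded in main(), so nothing is lost along the way.
    static void applyOptionsNewStyle(Map<String, String> base, Map<String, String> cliOptions) {
        base.putAll(cliOptions);
    }

    public static void main(String[] args) {
        Map<String, String> loaded = new HashMap<>();
        loaded.put("rest.port", "8081");       // from flink-conf.yaml
        Map<String, String> cli = new HashMap<>();
        cli.put("parallelism.default", "4");   // from the command line

        Map<String, String> old = applyOptionsOldStyle(cli);
        System.out.println(old.containsKey("rest.port"));    // false: setting dropped

        applyOptionsNewStyle(loaded, cli);
        System.out.println(loaded.containsKey("rest.port")); // true: setting preserved
        System.out.println(loaded.get("parallelism.default"));
    }
}
```

With the new style, a test only has to assert that the base map's entries survive, which is what makes the flow easy to verify.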



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13464:
URL: https://github.com/apache/flink/pull/13464#issuecomment-697462269


   
   ## CI report:
   
   * c39df1c9ae95ded8f61525e6bd39edb9ec44c955 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=6949)
 
   * 7bca073125251f8e92f1a3e3a9a62bad398b2fd0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-702582807


   
   ## CI report:
   
   * c0c3a012813e6bf76ee374073218ab647f1aa1be UNKNOWN
   * 2513a954016debd48694a5097a5c2a48943ca532 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7161)
 
   * 66eea9dec0e82a98bd20f1716caa0bbb3be9146e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19501) Missing state in enum type

2020-10-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-19501.

Fix Version/s: (was: 1.12.0)
   Resolution: Duplicate

This was already fixed in 79181b8748827f581e1400309af890a450e312cb.

> Missing state in enum type
> --
>
> Key: FLINK-19501
> URL: https://issues.apache.org/jira/browse/FLINK-19501
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.11.2
>Reporter: goutham
>Priority: Minor
>
> The INITIALIZING state is missing from one of the enums in the API snapshot document:
>  
> "enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
> "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
> "RECONCILING" ]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19502) Flink docker images: allow specifying arbitrary configuration values through environment

2020-10-05 Thread Julius Michaelis (Jira)
Julius Michaelis created FLINK-19502:


 Summary: Flink docker images: allow specifying arbitrary 
configuration values through environment
 Key: FLINK-19502
 URL: https://issues.apache.org/jira/browse/FLINK-19502
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Reporter: Julius Michaelis
 Attachments: set_options_from_env.patch

It would be nice if arbitrary values in flink-conf.yaml could be overwritten 
from the environment, especially since this can be used to override config 
comfortably from kubernetes' kustomizations.

(The attached patch to https://github.com/apache/flink-docker achieves that. 
Should I go ahead and create a PR?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19501) Missing state in enum type

2020-10-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-19501:
-
Component/s: (was: API / Core)
 Runtime / REST
 Documentation

> Missing state in enum type
> --
>
> Key: FLINK-19501
> URL: https://issues.apache.org/jira/browse/FLINK-19501
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Runtime / REST
>Affects Versions: 1.11.2
>Reporter: goutham
>Priority: Minor
>
> The INITIALIZING state is missing from one of the enums in the API snapshot document:
>  
> "enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", "FAILED", 
> "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
> "RECONCILING" ]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-19154) Application mode deletes HA data in case of suspended ZooKeeper connection

2020-10-05 Thread Kostas Kloudas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas reassigned FLINK-19154:
--

Assignee: Kostas Kloudas

> Application mode deletes HA data in case of suspended ZooKeeper connection
> --
>
> Key: FLINK-19154
> URL: https://issues.apache.org/jira/browse/FLINK-19154
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.12.0, 1.11.1
> Environment: Run a stand-alone cluster that runs a single job (if you 
> are familiar with the way Ververica Platform runs Flink jobs, we use a very 
> similar approach). It runs Flink 1.11.1 straight from the official docker 
> image.
>Reporter: Husky Zeng
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> A user reported that Flink's application mode deletes HA data in case of a 
> suspended ZooKeeper connection [1]. 
> The problem seems to be that the {{ApplicationDispatcherBootstrap}} class 
> produces an exception (that the requested job can no longer be found because of 
> a lost ZooKeeper connection) which will be interpreted as a job failure. Due 
> to this interpretation, the cluster will be shut down with a terminal state 
> of FAILED which will cause the HA data to be cleaned up. The exact problem 
> occurs in the {{JobStatusPollingUtils.getJobResult}} which is called by 
> {{ApplicationDispatcherBootstrap.getJobResult()}}.
> The above described behaviour can be found in this log [2].
> [1] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoint-metadata-deleted-by-Flink-after-ZK-connection-issues-td37937.html
> [2] https://pastebin.com/raw/uH9KDU2L



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19154) Application mode deletes HA data in case of suspended ZooKeeper connection

2020-10-05 Thread Kostas Kloudas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207896#comment-17207896
 ] 

Kostas Kloudas commented on FLINK-19154:


[~rmetzger] I assigned it to me. 

> Application mode deletes HA data in case of suspended ZooKeeper connection
> --
>
> Key: FLINK-19154
> URL: https://issues.apache.org/jira/browse/FLINK-19154
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.12.0, 1.11.1
> Environment: Run a stand-alone cluster that runs a single job (if you 
> are familiar with the way Ververica Platform runs Flink jobs, we use a very 
> similar approach). It runs Flink 1.11.1 straight from the official docker 
> image.
>Reporter: Husky Zeng
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.12.0, 1.11.3
>
>
> A user reported that Flink's application mode deletes HA data in case of a 
> suspended ZooKeeper connection [1]. 
> The problem seems to be that the {{ApplicationDispatcherBootstrap}} class 
> produces an exception (that the requested job can no longer be found because of 
> a lost ZooKeeper connection) which will be interpreted as a job failure. Due 
> to this interpretation, the cluster will be shut down with a terminal state 
> of FAILED which will cause the HA data to be cleaned up. The exact problem 
> occurs in the {{JobStatusPollingUtils.getJobResult}} which is called by 
> {{ApplicationDispatcherBootstrap.getJobResult()}}.
> The above described behaviour can be found in this log [2].
> [1] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoint-metadata-deleted-by-Flink-after-ZK-connection-issues-td37937.html
> [2] https://pastebin.com/raw/uH9KDU2L



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dawidwys commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


dawidwys commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703471242


   I think there was a discussion on that topic before (maybe even a vote) and 
the consensus was that we want to switch over to the `java.time` classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] kl0u commented on a change in pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


kl0u commented on a change in pull request #13531:
URL: https://github.com/apache/flink/pull/13531#discussion_r499415647



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/KeyedStream.java
##
@@ -615,7 +615,12 @@ public IntervalJoined(
 * {@link 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
 *
 * @param size The size of the window.
+*
+* @deprecated Please use {@link #windowAll(WindowAssigner)} with 
either {@link
+*TumblingEventTimeWindows} or {@link 
TumblingProcessingTimeWindows}. For more information,

Review comment:
   I think that was copied from the `timeWindowAll` :P 
   The deprecation message should mention `window()`, not `windowAll()`.

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/KeyedStream.java
##
@@ -633,7 +638,12 @@ public IntervalJoined(
 * {@link 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)}
 *
 * @param size The size of the window.
+*
+* @deprecated Please use {@link #windowAll(WindowAssigner)} with 
either {@link
+*SlidingEventTimeWindows} or {@link 
SlidingProcessingTimeWindows}. For more information,
+*  see the deprecation notice on {@link TimeCharacteristic}
 */
+   @Deprecated

Review comment:
   Same as above.

##
File path: 
flink-streaming-scala/src/main/scala/org/apache/flink/streaming/api/scala/KeyedStream.scala
##
@@ -231,11 +231,36 @@ class KeyedStream[T, K](javaStream: KeyedJavaStream[T, 
K]) extends DataStream[T]
* [[StreamExecutionEnvironment.setStreamTimeCharacteristic()]]
*
* @param size The size of the window.
+   *
+   * @deprecated Please use [[windowAll()]] with either 
[[TumblingEventTimeWindows]] or
+   * [[TumblingProcessingTimeWindows]]. For more information, see 
the deprecation
+   * notice on 
[[org.apache.flink.streaming.api.TimeCharacteristic]].
*/
+  @deprecated

Review comment:
   Same as in the java classes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13464:
URL: https://github.com/apache/flink/pull/13464#issuecomment-697462269


   
   ## CI report:
   
   * c39df1c9ae95ded8f61525e6bd39edb9ec44c955 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=6949)
 
   * 7bca073125251f8e92f1a3e3a9a62bad398b2fd0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7198)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-702582807


   
   ## CI report:
   
   * c0c3a012813e6bf76ee374073218ab647f1aa1be UNKNOWN
   * 2513a954016debd48694a5097a5c2a48943ca532 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7161)
 
   * 66eea9dec0e82a98bd20f1716caa0bbb3be9146e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7196)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger opened a new pull request #13540: [FLINK-19344] Fix DispatcherResourceCleanupTest race condition

2020-10-05 Thread GitBox


rmetzger opened a new pull request #13540:
URL: https://github.com/apache/flink/pull/13540


   
   ## What is the purpose of the change
   
   This removes a test instability where 
`DispatcherResourceCleanupTest.testJobSubmissionUnderSameJobId()` would 
sometimes report a `DuplicateJobSubmissionException`. 
   The test is initialized with a dispatcher, recovering the testing job graph. 
Once the TestingJobManagerRunner for this job graph has been created, the 
result future is completed, and a new job gets submitted.
   The problem is that the `DispatcherJob.isDuplicateJob()` method might still 
find the job in the `runningJobs` list, because the cleanup from that list 
happens asynchronously.
   
   The test instability is resolved by waiting until the 
TestingJobManagerRunner has been closed.
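The general pattern behind the fix — only proceeding once an asynchronous cleanup has actually completed — can be sketched with plain JDK futures (all names here are illustrative stand-ins, not the real dispatcher or test classes):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateSubmissionSketch {
    // Stand-in for the dispatcher's runningJobs bookkeeping (illustrative).
    static final Map<String, CompletableFuture<Void>> runningJobs = new ConcurrentHashMap<>();

    static boolean isDuplicateJob(String jobId) {
        return runningJobs.containsKey(jobId);
    }

    // Cleanup happens asynchronously: the job result completes first,
    // the removal from runningJobs only afterwards.
    static CompletableFuture<Void> finishJob(String jobId) {
        return CompletableFuture.runAsync(() -> runningJobs.remove(jobId));
    }

    public static void main(String[] args) {
        runningJobs.put("job-1", new CompletableFuture<>());

        CompletableFuture<Void> closeFuture = finishJob("job-1");
        // Resubmitting here, without waiting, may still see the job as a
        // duplicate -- the race the test was hitting. Waiting on the close
        // future, as the fix does, makes the resubmission deterministic:
        closeFuture.join();
        System.out.println(isDuplicateJob("job-1")); // false
    }
}
```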
   
   
   
   ## Verifying this change
   
   I have executed this patch 4000 times on CI, without a failure: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8421&view=results
   
   ## Does this pull request potentially affect one of the following parts:
   
   The change only affects tests.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19344) DispatcherResourceCleanupTest#testJobSubmissionUnderSameJobId is unstable on Azure Pipeline

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19344:
---
Labels: pull-request-available test-stability  (was: test-stability)

> DispatcherResourceCleanupTest#testJobSubmissionUnderSameJobId is unstable on 
> Azure Pipeline
> ---
>
> Key: FLINK-19344
> URL: https://issues.apache.org/jira/browse/FLINK-19344
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.12.0
>Reporter: Yingjie Cao
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0
>
>
> Here is the log and stack: 
> https://dev.azure.com/kevin-flink/3f520f11-5170-4153-99d0-2ade0d99b911/_apis/build/builds/88/logs/102



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19502) Flink docker images: allow specifying arbitrary configuration values through environment

2020-10-05 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207904#comment-17207904
 ] 

Chesnay Schepler commented on FLINK-19502:
--

This is already possible? 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/docker.html#configure-options
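The linked docs describe the `FLINK_PROPERTIES` environment variable, which the official image's entrypoint appends to `flink-conf.yaml`. The mechanism can be sketched roughly like this (a simplified stand-in for the real `docker-entrypoint.sh`; paths and details differ):

```shell
#!/bin/sh
# Simplified sketch of how an entrypoint can fold FLINK_PROPERTIES
# into the configuration file (illustrative only).
CONF_FILE="./flink-conf.yaml"

# Base configuration shipped with the image.
printf 'jobmanager.rpc.address: localhost\n' > "$CONF_FILE"

# Multiple settings, newline-separated, as one might pass them from a
# kubernetes kustomization's env section:
FLINK_PROPERTIES="taskmanager.numberOfTaskSlots: 4
parallelism.default: 2"

# Append the overrides so they take effect on top of the base config.
if [ -n "$FLINK_PROPERTIES" ]; then
    echo "$FLINK_PROPERTIES" >> "$CONF_FILE"
fi

grep 'taskmanager.numberOfTaskSlots' "$CONF_FILE"
```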

> Flink docker images: allow specifying arbitrary configuration values through 
> environment
> 
>
> Key: FLINK-19502
> URL: https://issues.apache.org/jira/browse/FLINK-19502
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Julius Michaelis
>Priority: Minor
> Attachments: set_options_from_env.patch
>
>
> It would be nice if arbitrary values in flink-conf.yaml could be overwritten 
> from the environment, especially since this can be used to override config 
> comfortably from kubernetes' kustomizations.
> (The attached patch to https://github.com/apache/flink-docker achieves that. 
> Should I go ahead and create a PR?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #13540: [FLINK-19344] Fix DispatcherResourceCleanupTest race condition

2020-10-05 Thread GitBox


flinkbot commented on pull request #13540:
URL: https://github.com/apache/flink/pull/13540#issuecomment-703478380


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 6650579aa244c53d76237be84b861224c451b4ba (Mon Oct 05 
08:16:53 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-14068) Use Java's Duration instead of Flink's Time

2020-10-05 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-14068:
-
Description: 
As discussed on the mailing list 
[here|https://lists.apache.org/x/thread.html/90ad2f1d7856cfe5bdc8f7dd678c626be96eeaeeb736e98f31660039@%3Cdev.flink.apache.org%3E],
 the community reached a consensus that we will use Java's Duration for 
representing "time interval" instead of Flink's Time.

Specifically, Flink has two {{Time}} classes, which are

{{org.apache.flink.api.common.time.Time}}
{{org.apache.flink.streaming.api.windowing.time.Time}}

the latter has already been deprecated and superseded by the former. Now we 
want to also deprecate the former and drop it in 2.0.0 (we don't drop it just 
now because it is part of {{@Public}} interfaces).

  was:
As discussion in mailing list 
[here|https://lists.apache.org/x/thread.html/90ad2f1d7856cfe5bdc8f7dd678c626be96eeaeeb736e98f31660039@%3Cdev.flink.apache.org%3E]
 the community reaches a consensus that we will use Java's Duration for 
representing "time interval" instead of use Flink's Time for it.

Specifically, Flink has two {{Time}} classes, which are

{{org.apache.flink.api.common.time.Time}}
{{org.apache.flink.streaming.api.windowing.time.Time}}

the latter has been already deprecated and superseded by the former. Now we 
want to also deprecated the format and drop it in 2.0.0(we don't drop it just 
now because it is part of {{@Public}} interfaces).


> Use Java's Duration instead of Flink's Time
> ---
>
> Key: FLINK-14068
> URL: https://issues.apache.org/jira/browse/FLINK-14068
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Runtime / Configuration, Runtime / 
> Coordination
>Reporter: Zili Chen
>Priority: Major
> Fix For: 2.0.0
>
>
> As discussed on the mailing list 
> [here|https://lists.apache.org/x/thread.html/90ad2f1d7856cfe5bdc8f7dd678c626be96eeaeeb736e98f31660039@%3Cdev.flink.apache.org%3E],
>  the community reached a consensus that we will use Java's Duration for 
> representing "time interval" instead of Flink's Time.
> Specifically, Flink has two {{Time}} classes, which are
> {{org.apache.flink.api.common.time.Time}}
> {{org.apache.flink.streaming.api.windowing.time.Time}}
> the latter has already been deprecated and superseded by the former. Now we 
> want to also deprecate the former and drop it in 2.0.0 (we don't drop it just 
> now because it is part of {{@Public}} interfaces).
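In user code the migration mostly amounts to swapping the construction call; a minimal comparison using only `java.time` (the Flink `Time` call is shown as a comment, since it is the class being deprecated):

```java
import java.time.Duration;

public class DurationExample {
    public static void main(String[] args) {
        // Before (deprecated): Time.minutes(5)
        // After: java.time.Duration carries the same information without
        // a Flink-specific wrapper class.
        Duration timeout = Duration.ofMinutes(5);

        System.out.println(timeout.toMillis());   // 300000
        System.out.println(timeout.getSeconds()); // 300
    }
}
```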



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dawidwys commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


dawidwys commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703478881


   Do we have a plan for removing the deprecated methods from docs? There are 
quite some.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] aljoscha commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


aljoscha commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703480530


   @sjwiesman : https://issues.apache.org/jira/browse/FLINK-14068



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #11502: [FLINK-16620] - Add attempt information in taskNameWithSubtask

2020-10-05 Thread GitBox


rmetzger commented on pull request #11502:
URL: https://github.com/apache/flink/pull/11502#issuecomment-703480338


@Jiayi-Liao: #13439 has been merged to master, feel free to rebase.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] AHeise commented on a change in pull request #13539: [FLINK-19027][network] Assign exclusive buffers to LocalRecoveredInputChannel.

2020-10-05 Thread GitBox


AHeise commented on a change in pull request #13539:
URL: https://github.com/apache/flink/pull/13539#discussion_r499424128



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGate.java
##
@@ -247,7 +248,18 @@ public void setup() throws IOException {
}
 
@Override
-   public CompletableFuture<?> readRecoveredState(ExecutorService 
executor, ChannelStateReader reader) {
+   public CompletableFuture<?> readRecoveredState(ExecutorService 
executor, ChannelStateReader reader) throws IOException {
+   synchronized (requestLock) {
+   if (closeFuture.isDone()) {
+   return FutureUtils.completedVoidFuture();
+   }
+   for (InputChannel inputChannel : 
inputChannels.values()) {
+   if (inputChannel instanceof 
RemoteRecoveredInputChannel) {
+   ((RemoteRecoveredInputChannel) 
inputChannel).assignExclusiveSegments();
+   }
+   }
+   }
+

Review comment:
   I'll add. In short, the number of required buffers is now higher than a 
few tests (and possibly production setups) assume. Without the lazy 
initialization, you cannot simulate backpressure in a few scenarios as easily.

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java
##
@@ -217,15 +221,11 @@ public void testConcurrentReadStateAndProcessAndClose() 
throws Exception {
}
};
 
-   submitTasksAndWaitForResults(executor, new Callable[] 
{closeTask, readRecoveredStateTask, processStateTask});
-   } finally {
-   executor.shutdown();
+   executor.invokeAll(Arrays.asList(closeTask, 
readRecoveredStateTask, processStateTask));
+
// wait until the internal channel state recover task 
finishes
-   executor.awaitTermination(60, TimeUnit.SECONDS);
assertEquals(totalBuffers, 
environment.getNetworkBufferPool().getNumberOfAvailableMemorySegments());
assertTrue(inputGate.getCloseFuture().isDone());
-
-   environment.close();

Review comment:
   `close` is called by the `Closer`.
   
   `shutdown` + `awaitTermination` is simply the wrong method. `invokeAll` is 
doing what was intended. Could be an extra commit. However, it should then 
probably be done on all 10 places that use `submitTasksAndWaitForResults`.
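The `invokeAll` pattern referred to here blocks until every submitted task has completed, which is what `submitTasksAndWaitForResults` approximated with `shutdown` + `awaitTermination`; a minimal JDK-only sketch (the task names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(3);

        Callable<String> closeTask = () -> "closed";
        Callable<String> readTask = () -> "read";
        Callable<String> processTask = () -> "processed";

        // invokeAll blocks until all tasks are done (or have failed), so no
        // separate shutdown()/awaitTermination() dance is needed to wait.
        List<Future<String>> results =
                executor.invokeAll(Arrays.asList(closeTask, readTask, processTask));

        // Futures are returned in the same order as the submitted tasks.
        for (Future<String> f : results) {
            System.out.println(f.get()); // prints closed, read, processed
        }
        executor.shutdown();
    }
}
```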

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalRecoveredInputChannel.java
##
@@ -42,8 +42,17 @@
TaskEventPublisher taskEventPublisher,
int initialBackOff,
int maxBackoff,
+   int networkBuffersPerChannel,
InputChannelMetrics metrics) {
-   super(inputGate, channelIndex, partitionId, initialBackOff, 
maxBackoff, metrics.getNumBytesInLocalCounter(), 
metrics.getNumBuffersInLocalCounter());
+   super(

Review comment:
   Hm you are right, it doesn't solve it completely after having read the 
ticket. However, without a solution for FLINK-13203, there will also not be a 
real solution here.
   On the other hand, it's inherently wrong to treat local and remote channels 
differently during recovery (they even share the same implementation). So this 
commit is still fixing the issue in a best effort manner and certainly helps to 
improve build stability, which is an improvement of its own.

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java
##
@@ -59,26 +59,27 @@
 import org.apache.flink.runtime.shuffle.ShuffleDescriptor;

Review comment:
   I didn't even know that double-tags are a thing. :p





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13540: [FLINK-19344] Fix DispatcherResourceCleanupTest race condition

2020-10-05 Thread GitBox


flinkbot commented on pull request #13540:
URL: https://github.com/apache/flink/pull/13540#issuecomment-703484500


   
   ## CI report:
   
   * 6650579aa244c53d76237be84b861224c451b4ba UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] aljoscha commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


aljoscha commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703484840


   Addressed comments, PTAL.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] kl0u commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


kl0u commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703486706


   @aljoscha from my side there is only the comment that @dawidwys made about 
the docs. After this is resolved and Azure gives green, feel free to merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-18736) Source chaining with N-Ary Stream Operator in Flink

2020-10-05 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-18736.


> Source chaining with N-Ary Stream Operator in Flink
> ---
>
> Key: FLINK-18736
> URL: https://issues.apache.org/jira/browse/FLINK-18736
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Task
>Affects Versions: 1.11.1
>Reporter: Piotr Nowojski
>Priority: Critical
> Fix For: 1.12.0
>
>
> A follow-up to [FLIP-92| 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+Add+N-Ary+Stream+Operator+in+Flink].
>  In order to improve performance, multiple input operator introduced in 
> FLINK-15688 should be chain-able with sources 
> ([FLIP-27|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (FLINK-18736) Source chaining with N-Ary Stream Operator in Flink

2020-10-05 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-18736.
--
Resolution: Fixed

Resolved with the resolution of all subtasks.

> Source chaining with N-Ary Stream Operator in Flink
> ---
>
> Key: FLINK-18736
> URL: https://issues.apache.org/jira/browse/FLINK-18736
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Task
>Affects Versions: 1.11.1
>Reporter: Piotr Nowojski
>Priority: Critical
> Fix For: 1.12.0
>
>
> A follow-up to 
> [FLIP-92|https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+Add+N-Ary+Stream+Operator+in+Flink].
>  In order to improve performance, the multiple-input operator introduced in 
> FLINK-15688 should be chainable with sources 
> ([FLIP-27|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface]).





[GitHub] [flink] flinkbot edited a comment on pull request #13540: [FLINK-19344] Fix DispatcherResourceCleanupTest race condition

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13540:
URL: https://github.com/apache/flink/pull/13540#issuecomment-703484500


   
   ## CI report:
   
   * 6650579aa244c53d76237be84b861224c451b4ba Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7199)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-702591517


   
   ## CI report:
   
   * 2662522fc5199438ca7bd7e8a5d1825d2aa80f14 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7162)
 
   * 12b57911b704cce325f3c6e8c31e5744e284f3db UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-702591517


   
   ## CI report:
   
   * 2662522fc5199438ca7bd7e8a5d1825d2aa80f14 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7162)
 
   * 12b57911b704cce325f3c6e8c31e5744e284f3db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7200)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-702582807


   
   ## CI report:
   
   * c0c3a012813e6bf76ee374073218ab647f1aa1be UNKNOWN
   * 2513a954016debd48694a5097a5c2a48943ca532 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7161)
 
   * 66eea9dec0e82a98bd20f1716caa0bbb3be9146e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7196)
 
   * 88ec25aa3dc2db1f5d35a4351d5a26a5e353c7cc UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Created] (FLINK-19503) Add DSTL interface and its in-memory implementation for testing

2020-10-05 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-19503:
-

 Summary: Add DSTL interface and its in-memory implementation for 
testing
 Key: FLINK-19503
 URL: https://issues.apache.org/jira/browse/FLINK-19503
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing, Runtime / Task
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.12.0


Durable Short-term Log is a proposed component to store state changes and later 
output.

The goal of this issue is to allow:
 # Integration with changes to the state backends
 # Implementation of the first production DSTL version

The code should include the interfaces, test implementations, means to access 
them from state backends, and tests.
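To make the intended shape concrete, here is a minimal sketch of such an interface together with an in-memory implementation suitable only for tests. All names (`StateChangeLog`, `append`, `replay`) are assumptions for illustration, not the actual Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical interface of a durable short-term log (names are assumptions). */
interface StateChangeLog {
    /** Appends a state change; completes with the change's sequence number. */
    CompletableFuture<Long> append(byte[] change);

    /** Reads back all changes starting from the given sequence number. */
    List<byte[]> replay(long fromSequenceNumber);
}

/** In-memory implementation, suitable only for testing. */
class InMemoryStateChangeLog implements StateChangeLog {
    private final List<byte[]> changes = new ArrayList<>();

    @Override
    public synchronized CompletableFuture<Long> append(byte[] change) {
        changes.add(change);
        return CompletableFuture.completedFuture((long) changes.size() - 1);
    }

    @Override
    public synchronized List<byte[]> replay(long fromSequenceNumber) {
        return new ArrayList<>(changes.subList((int) fromSequenceNumber, changes.size()));
    }
}

public class DstlSketch {
    public static void main(String[] args) throws Exception {
        StateChangeLog log = new InMemoryStateChangeLog();
        long seq0 = log.append(new byte[]{1}).get();
        long seq1 = log.append(new byte[]{2}).get();
        if (seq0 != 0 || seq1 != 1) throw new AssertionError("unexpected sequence numbers");
        if (log.replay(1).size() != 1) throw new AssertionError("replay should return one change");
        System.out.println("ok");
    }
}
```

A production version would replace the in-memory list with durable storage while keeping the same interface for the state backends.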

 





[GitHub] [flink] aljoscha commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


aljoscha commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703523727


   I pushed a commit that removes all mentions of `timeWindow*()` from the 
documentation. In some places I replaced it with a processing-time assigner 
because the example wouldn't work with event time. All other cases now use an 
explicit event-time assigner.
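The migration being discussed can be sketched as follows. This is an illustrative fragment assuming the Flink 1.12 DataStream API (`stream` is a hypothetical keyed input); it needs a Flink project on the classpath to compile.

```java
// Before (deprecated by FLINK-19318): the time semantics are implicit and
// depend on the environment's time characteristic.
stream.keyBy(value -> value.f0)
      .timeWindow(Time.seconds(5))
      .sum(1);

// After: the window assigner names the time semantics explicitly.
stream.keyBy(value -> value.f0)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))  // or TumblingProcessingTimeWindows
      .sum(1);
```

Note that the event-time variant only produces results if timestamps and watermarks are assigned upstream; otherwise the processing-time assigner is the drop-in choice.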







[GitHub] [flink] zentol commented on a change in pull request #13536: [FLINK-19497] [metrics] implement mutator methods for FlinkCounterWrapper

2020-10-05 Thread GitBox


zentol commented on a change in pull request #13536:
URL: https://github.com/apache/flink/pull/13536#discussion_r499474961



##
File path: 
flink-metrics/flink-metrics-dropwizard/src/test/java/org/apache/flink/dropwizard/metrics/FlinkCounterWrapperTest.java
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.dropwizard.metrics;
+
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.util.TestCounter;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+/**
+ * Tests for the FlinkMeterWrapper.

Review comment:
   ```suggestion
* Tests for the {@link FlinkCounterWrapper}.
   ```

##
File path: 
flink-metrics/flink-metrics-dropwizard/src/test/java/org/apache/flink/dropwizard/metrics/FlinkCounterWrapperTest.java
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.dropwizard.metrics;
+
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.util.TestCounter;
+import org.junit.Test;

Review comment:
   the import order is incorrect:
   ```
   [ERROR] 
src/test/java/org/apache/flink/dropwizard/metrics/FlinkCounterWrapperTest.java:[23]
 (imports) ImportOrder: 'org.junit.Test' should be separated from previous 
imports.
   ```
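   Assuming the Flink checkstyle configuration groups `org.apache.flink` imports separately from third-party imports, one ordering that satisfies the check would be:
   ```java
   import org.apache.flink.metrics.Counter;
   import org.apache.flink.metrics.util.TestCounter;

   import org.junit.Test;

   import static org.junit.Assert.assertEquals;
   ```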









[jira] [Updated] (FLINK-19497) Implement mutator methods for FlinkCounterWrapper

2020-10-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-19497:
-
Issue Type: Improvement  (was: Bug)

> Implement mutator methods for FlinkCounterWrapper
> -
>
> Key: FLINK-19497
> URL: https://issues.apache.org/jira/browse/FLINK-19497
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.10.2, 1.11.2
>Reporter: Richard Moorhead
>Assignee: Richard Moorhead
>Priority: Minor
>  Labels: pull-request-available
>
> Looking at the dropwizard wrapper classes in flink-metrics-dropwizard, it 
> appears that all of them have mutator methods defined with the exception of 
> FlinkCounterWrapper. We have a use case wherein we mutate counters from a 
> dropwizard context but wish the underlying metrics in the flink registry to 
> be updated.
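The delegation described above can be sketched with minimal stand-in types. These are assumptions for illustration, not the real `org.apache.flink.metrics.Counter` or `com.codahale.metrics.Counter` classes; the point is only that mutations through the wrapper reach the wrapped Flink counter.

```java
/** Stand-in for a Flink counter (assumption, not the real Flink API). */
class FlinkCounter {
    private long count;
    public void inc(long n) { count += n; }
    public void dec(long n) { count -= n; }
    public long getCount() { return count; }
}

/** Dropwizard-style wrapper whose mutators delegate to the wrapped Flink counter. */
class FlinkCounterWrapperSketch {
    private final FlinkCounter counter;

    FlinkCounterWrapperSketch(FlinkCounter counter) { this.counter = counter; }

    public void inc() { counter.inc(1); }
    public void inc(long n) { counter.inc(n); }
    public void dec() { counter.dec(1); }
    public void dec(long n) { counter.dec(n); }
    public long getCount() { return counter.getCount(); }
}

public class CounterWrapperDemo {
    public static void main(String[] args) {
        FlinkCounter flinkCounter = new FlinkCounter();
        FlinkCounterWrapperSketch wrapper = new FlinkCounterWrapperSketch(flinkCounter);
        wrapper.inc(5);
        wrapper.dec();
        // Mutations through the wrapper are visible in the underlying Flink counter.
        if (flinkCounter.getCount() != 4) throw new AssertionError();
        System.out.println("ok");
    }
}
```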





[jira] [Assigned] (FLINK-19497) Implement mutator methods for FlinkCounterWrapper

2020-10-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-19497:


Assignee: Richard Moorhead

> Implement mutator methods for FlinkCounterWrapper
> -
>
> Key: FLINK-19497
> URL: https://issues.apache.org/jira/browse/FLINK-19497
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.10.2, 1.11.2
>Reporter: Richard Moorhead
>Assignee: Richard Moorhead
>Priority: Minor
>  Labels: pull-request-available
>
> Looking at the dropwizard wrapper classes in flink-metrics-dropwizard, it 
> appears that all of them have mutator methods defined with the exception of 
> FlinkCounterWrapper. We have a use case wherein we mutate counters from a 
> dropwizard context but wish the underlying metrics in the flink registry to 
> be updated.





[jira] [Updated] (FLINK-19504) Implement persistence for DSTL via DFS

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19504:
--
Labels:   (was: DST)

> Implement persistence for DSTL via DFS
> --
>
> Key: FLINK-19504
> URL: https://issues.apache.org/jira/browse/FLINK-19504
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.12.0
>
>
> Durable Short-term Log is a proposed component to store state changes and 
> later output.
> The goal of this issue is to add a persistent version of FLINK-19503 (backed 
> by DFS).
> No scheduling and RPC.





[jira] [Created] (FLINK-19504) Implement persistence for DSTL via DFS

2020-10-05 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-19504:
-

 Summary: Implement persistence for DSTL via DFS
 Key: FLINK-19504
 URL: https://issues.apache.org/jira/browse/FLINK-19504
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing, Runtime / Task
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.12.0


Durable Short-term Log is a proposed component to store state changes and later 
output.

The goal of this issue is to add a persistent version of FLINK-19503 (backed by 
DFS).

No scheduling and RPC.





[jira] [Updated] (FLINK-19504) Implement persistence for DSTL via DFS

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19504:
--
Labels: DST  (was: )

> Implement persistence for DSTL via DFS
> --
>
> Key: FLINK-19504
> URL: https://issues.apache.org/jira/browse/FLINK-19504
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: DST
> Fix For: 1.12.0
>
>
> Durable Short-term Log is a proposed component to store state changes and 
> later output.
> The goal of this issue is to add a persistent version of FLINK-19503 (backed 
> by DFS).
> No scheduling and RPC.





[jira] [Updated] (FLINK-19504) Implement persistence for DSTL via DFS

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19504:
--
Description: 
Durable Short-term Log is a proposed component to store state changes and later 
output.

The goal of this issue is to add a persistent version of FLINK-19503 (backed by 
DFS).

No scheduling and RPC.

 

If absent, add request hedging capability.

 

  was:
Durable Short-term Log is a proposed component to store state changes and later 
output.

The goal of this issue is to add a persistent version of FLINK-19503 (backed by 
DFS).

No scheduling and RPC.


> Implement persistence for DSTL via DFS
> --
>
> Key: FLINK-19504
> URL: https://issues.apache.org/jira/browse/FLINK-19504
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.12.0
>
>
> Durable Short-term Log is a proposed component to store state changes and 
> later output.
> The goal of this issue is to add a persistent version of FLINK-19503 (backed 
> by DFS).
> No scheduling and RPC.
>  
> If absent, add request hedging capability.
>  





[jira] [Created] (FLINK-19505) Implement distributed DSTL (DFS-based)

2020-10-05 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-19505:
-

 Summary: Implement distributed DSTL (DFS-based)
 Key: FLINK-19505
 URL: https://issues.apache.org/jira/browse/FLINK-19505
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing, Runtime / Task
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.12.0


Sub-tasks:
 # Implement scheduling and deployment (no rescaling)
 # Implement RPC between the tasks and DSTL (including load balancing)
 # Implement re-scaling
 # Implement filtering of records on replay to support up-scaling





[jira] [Updated] (FLINK-19505) Implement distributed DSTL (DFS-based)

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19505:
--
Labels: DSTL  (was: )

> Implement distributed DSTL (DFS-based)
> --
>
> Key: FLINK-19505
> URL: https://issues.apache.org/jira/browse/FLINK-19505
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: DSTL
> Fix For: 1.12.0
>
>
> Sub-tasks:
>  # Implement scheduling and deployment (no rescaling)
>  # Implement RPC between the tasks and DSTL (including load balancing)
>  # Implement re-scaling
>  # Implement filtering of records on replay to support up-scaling





[jira] [Updated] (FLINK-19504) Implement persistence for DSTL via DFS

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19504:
--
Labels: DSTL  (was: )

> Implement persistence for DSTL via DFS
> --
>
> Key: FLINK-19504
> URL: https://issues.apache.org/jira/browse/FLINK-19504
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: DSTL
> Fix For: 1.12.0
>
>
> Durable Short-term Log is a proposed component to store state changes and 
> later output.
> The goal of this issue is to add a persistent version of FLINK-19503 (backed 
> by DFS).
> No scheduling and RPC.
>  
> If absent, add request hedging capability.
>  





[jira] [Updated] (FLINK-19503) Add DSTL interface and its in-memory implementation for testing

2020-10-05 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-19503:
--
Labels: DSTL  (was: )

> Add DSTL interface and its in-memory implementation for testing
> ---
>
> Key: FLINK-19503
> URL: https://issues.apache.org/jira/browse/FLINK-19503
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing, Runtime / Task
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: DSTL
> Fix For: 1.12.0
>
>
> Durable Short-term Log is a proposed component to store state changes and 
> later output.
> The goal of this issue is to allow:
>  # Integration with changes to the state backends
>  # Implementation of the first production DSTL version
> The code should include the interfaces, test implementations, means to access 
> them from state backends, and tests.
>  





[GitHub] [flink] flinkbot edited a comment on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-702591517


   
   ## CI report:
   
   * 2662522fc5199438ca7bd7e8a5d1825d2aa80f14 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7162)
 
   * 12b57911b704cce325f3c6e8c31e5744e284f3db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7200)
 
   * 09c97e5bdf9e3ea5719d772749b9bb542ec89514 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-19295) YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207981#comment-17207981
 ] 

Robert Metzger commented on FLINK-19295:


Resolved on 1.11 in 
https://github.com/apache/flink/commit/c9f3cc6b6c25963eec648bd06d70a6e9a0e004e5

> YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with 
> prohibited string
> --
>
> Key: FLINK-19295
> URL: https://issues.apache.org/jira/browse/FLINK-19295
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Runtime / Coordination, Tests
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6661&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354]
> {code}
> 2020-09-19T22:08:13.5364974Z [ERROR]   
> YARNSessionFIFOITCase.checkForProhibitedLogContents:83->YarnTestBase.ensureNoProhibitedStringInLogFiles:476
>  Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1600553154281_0001/container_1600553154281_0001_01_02/taskmanager.log
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> {code}





[jira] [Closed] (FLINK-19295) YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-19295.
--
Fix Version/s: 1.11.3
   1.12.0
   Resolution: Fixed

> YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with 
> prohibited string
> --
>
> Key: FLINK-19295
> URL: https://issues.apache.org/jira/browse/FLINK-19295
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Runtime / Coordination, Tests
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6661&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354]
> {code}
> 2020-09-19T22:08:13.5364974Z [ERROR]   
> YARNSessionFIFOITCase.checkForProhibitedLogContents:83->YarnTestBase.ensureNoProhibitedStringInLogFiles:476
>  Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1600553154281_0001/container_1600553154281_0001_01_02/taskmanager.log
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> {code}





[GitHub] [flink] flinkbot edited a comment on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-702582807


   
   ## CI report:
   
   * c0c3a012813e6bf76ee374073218ab647f1aa1be UNKNOWN
   * 2513a954016debd48694a5097a5c2a48943ca532 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7161)
 
   * 66eea9dec0e82a98bd20f1716caa0bbb3be9146e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7196)
 
   * 88ec25aa3dc2db1f5d35a4351d5a26a5e353c7cc Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7201)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-702591517


   
   ## CI report:
   
   * 2662522fc5199438ca7bd7e8a5d1825d2aa80f14 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7162)
 
   * 12b57911b704cce325f3c6e8c31e5744e284f3db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7200)
 
   * 09c97e5bdf9e3ea5719d772749b9bb542ec89514 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7202)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Assigned] (FLINK-19458) ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange: ZooKeeper unexpectedly modified

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-19458:
--

Assignee: Robert Metzger

> ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange: 
> ZooKeeper unexpectedly modified
> 
>
> Key: FLINK-19458
> URL: https://issues.apache.org/jira/browse/FLINK-19458
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8422&view=logs&j=70ad9b63-500e-5dc9-5a3c-b60356162d7e&t=944c7023-8984-5aa2-b5f8-54922bd90d3a
> {code}
> 2020-09-29T13:34:18.1803081Z [ERROR] 
> testJobExecutionOnClusterWithLeaderChange(org.apache.flink.test.runtime.leaderelection.ZooKeeperLeaderElectionITCase)
>   Time elapsed: 23.524 s  <<< ERROR!
> 2020-09-29T13:34:18.1803707Z java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
> 2020-09-29T13:34:18.1804343Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-09-29T13:34:18.1804738Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-09-29T13:34:18.1805274Z  at 
> org.apache.flink.test.runtime.leaderelection.ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange(ZooKeeperLeaderElectionITCase.java:117)
> 2020-09-29T13:34:18.1805772Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-09-29T13:34:18.1806136Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-09-29T13:34:18.1806555Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-09-29T13:34:18.1806936Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-09-29T13:34:18.1807313Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-09-29T13:34:18.1807731Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-09-29T13:34:18.1808341Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-09-29T13:34:18.1808973Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-09-29T13:34:18.1809376Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-09-29T13:34:18.1809851Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-09-29T13:34:18.1810201Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-09-29T13:34:18.1810632Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-09-29T13:34:18.1811035Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-09-29T13:34:18.1811700Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-09-29T13:34:18.1812082Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-09-29T13:34:18.1812447Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-09-29T13:34:18.1812824Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-09-29T13:34:18.1813190Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-09-29T13:34:18.1813565Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-09-29T13:34:18.1813964Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-29T13:34:18.1814364Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-09-29T13:34:18.1814752Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-09-29T13:34:18.1815298Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-09-29T13:34:18.1816096Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-09-29T13:34:18.1816552Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-09-29T13:34:18.1816984Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-09-29T13:34:18.1817421Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-09-29T13:34:18.1817894Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-09-29T13:34:18.1818318Z  at 
> org.apach

[GitHub] [flink] kl0u commented on a change in pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


kl0u commented on a change in pull request #13531:
URL: https://github.com/apache/flink/pull/13531#discussion_r499506498



##
File path: docs/dev/stream/experimental.md
##
@@ -60,7 +60,7 @@ Code example:
 StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
 DataStreamSource<Integer> source = ...
 DataStreamUtils.reinterpretAsKeyedStream(source, (in) -> in, 
TypeInformation.of(Integer.class))
-.timeWindow(Time.seconds(1))
+.window(TumblingEventTimeWindows.of(Time.seconds(1)))

Review comment:
   Same as above in both.

##
File path: docs/dev/parallel.md
##
@@ -55,7 +55,7 @@ DataStream<String> text = [...]
 DataStream<Tuple2<String, Integer>> wordCounts = text
 .flatMap(new LineSplitter())
 .keyBy(value -> value.f0)
-.timeWindow(Time.seconds(5))
+.window(TumblingEventTimeWindows.of(Time.seconds(5)))
 .sum(1).setParallelism(5);

Review comment:
   For all of these changes, same as above about event vs processing time.

##
File path: docs/dev/stream/operators/windows.md
##
@@ -684,7 +684,7 @@ DataStream<Tuple2<String, Long>> input = ...;
 
 input
   .keyBy(t -> t.f0)
-  .timeWindow(Time.minutes(5))
+  .window(TumblingEventTimeWindows.of(Time.minutes(5)))

Review comment:
   Same as above.

##
File path: README.md
##
@@ -41,7 +41,7 @@ val text = env.socketTextStream(host, port, '\n')
 val windowCounts = text.flatMap { w => w.split("\\s") }
   .map { w => WordWithCount(w, 1) }
   .keyBy("word")
-  .timeWindow(Time.seconds(5))
+  .window(TumblingEventTimeWindow.of(Time.seconds(5)))

Review comment:
   I think here it will not work without a `WatermarkStrategy`, right? If this is correct, we should maybe use a processing-time window instead.
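
   Since the thread is about which replacement to pick, a side-by-side sketch may help (assuming the Flink 1.12 DataStream API; `stream`, `MyEvent`, and `extractTimestamp` are placeholder names, and this is a sketch that needs Flink on the classpath, not runnable standalone):

```java
// Event-time tumbling windows only fire when watermarks advance,
// so the stream needs a WatermarkStrategy first:
stream
    .assignTimestampsAndWatermarks(
        WatermarkStrategy
            .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((event, ts) -> extractTimestamp(event)))
    .keyBy(event -> event.key)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .sum("count");

// Processing-time windows need no watermarks, which is why examples
// without an explicit event-time source are safer with:
stream
    .keyBy(event -> event.key)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .sum("count");
```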





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] XComp commented on a change in pull request #13458: FLINK-18851 [runtime] Add checkpoint type to checkpoint history entries in Web UI

2020-10-05 Thread GitBox


XComp commented on a change in pull request #13458:
URL: https://github.com/apache/flink/pull/13458#discussion_r499512971



##
File path: flink-runtime-web/src/test/resources/rest_api_v1.snapshot
##
@@ -557,7 +557,7 @@
   },
   "status" : {
 "type" : "string",
-"enum" : [ "CREATED", "RUNNING", "FAILING", "FAILED", 
"CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", "RECONCILING" ]
+"enum" : [ "INITIALIZING", "CREATED", "RUNNING", "FAILING", 
"FAILED", "CANCELLING", "CANCELED", "FINISHED", "RESTARTING", "SUSPENDED", 
"RECONCILING" ]

Review comment:
   This was already fixed in PR #13403 and merged into `master`. Rebasing 
the branch should fix this diff.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #13351: [FLINK-18990][task] Read channel state sequentially

2020-10-05 Thread GitBox


pnowojski commented on a change in pull request #13351:
URL: https://github.com/apache/flink/pull/13351#discussion_r499521881



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskMultipleInputSelectiveReadingTest.java
##
@@ -197,22 +198,21 @@ public void testInputStarvation() throws Exception {
testHarness.processElement(new StreamRecord<>("3"), 1);
testHarness.processElement(new StreamRecord<>("4"), 1);
 
-   testHarness.processSingleStep();
expectedOutput.add(new StreamRecord<>("[2]: 1"));
-   testHarness.processSingleStep();
expectedOutput.add(new StreamRecord<>("[2]: 2"));
-   assertThat(testHarness.getOutput(), 
contains(expectedOutput.toArray()));
+   testHarness.processAll();
+   assertEquals(expectedOutput, new 
ArrayList<>(testHarness.getOutput()).subList(0, expectedOutput.size()));

Review comment:
   This test is no longer doing what it was intended to do.
   
   Now you are processing all elements from the input gate `1` before 
`testHarness.processElement(new StreamRecord<>("1"), 2);` (L207/206) is being 
enqueued to input gate `2`.
   
   I would guess that 
   ```
   // to avoid starvation, if the input selection is ALL and 
availableInputsMask is not ALL,
   // always try to check and set the availability of another input
   if (inputSelectionHandler.shouldSetAvailableForAnotherInput()) {
fullCheckAndSetAvailable();
   }
   ```
   check from `StreamMultipleInputProcessor#selectNextReadingInputIndex` is 
currently not tested.
   
   The intention behind this test is:
   1. to have a long (just as well could be infinite) backlog of records to 
process on one of the inputs
   2. introduce the availability change on the second input, and make sure it's 
checked/respected (instead of hot looping on the first input)
   3. also throw in a third not selected input just to spice things a little bit
   
   Why did you have to change this test?
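
   The starvation scenario described above can be modeled without any Flink classes (names below are mine, not Flink's): even with a long backlog on input 0, a record available on input 1 must be read early, not after the whole backlog is drained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Toy model of the anti-starvation check: after reading from one input,
 * always probe the availability of the other input first instead of
 * hot-looping on the same one.
 */
public class InputSelectionSketch {

    /** Drains both inputs, always checking the input we did NOT read last. */
    public static List<String> process() {
        List<Queue<String>> inputs = new ArrayList<>();
        inputs.add(new ArrayDeque<>());
        inputs.add(new ArrayDeque<>());
        for (int i = 0; i < 100; i++) {
            inputs.get(0).add("a" + i); // long backlog on input 0
        }
        inputs.get(1).add("b0"); // single record on input 1

        List<String> processed = new ArrayList<>();
        int last = 1;
        while (!inputs.get(0).isEmpty() || !inputs.get(1).isEmpty()) {
            int other = 1 - last; // prefer the other input if it is available
            int pick = !inputs.get(other).isEmpty() ? other : last;
            processed.add(inputs.get(pick).poll());
            last = pick;
        }
        return processed;
    }

    public static void main(String[] args) {
        // "b0" is processed second, not after the 100-element backlog
        System.out.println(process().indexOf("b0")); // prints 1
    }
}
```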





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-702582807


   
   ## CI report:
   
   * c0c3a012813e6bf76ee374073218ab647f1aa1be UNKNOWN
   * 66eea9dec0e82a98bd20f1716caa0bbb3be9146e Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7196)
 
   * 88ec25aa3dc2db1f5d35a4351d5a26a5e353c7cc Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7201)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19458) ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange: ZooKeeper unexpectedly modified

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208029#comment-17208029
 ] 

Robert Metzger commented on FLINK-19458:


I'll have a look at this failure ...

The test failure seems to occur only on the Azure-provided VMs.




> ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange: 
> ZooKeeper unexpectedly modified
> 
>
> Key: FLINK-19458
> URL: https://issues.apache.org/jira/browse/FLINK-19458
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8422&view=logs&j=70ad9b63-500e-5dc9-5a3c-b60356162d7e&t=944c7023-8984-5aa2-b5f8-54922bd90d3a
> {code}
> 2020-09-29T13:34:18.1803081Z [ERROR] 
> testJobExecutionOnClusterWithLeaderChange(org.apache.flink.test.runtime.leaderelection.ZooKeeperLeaderElectionITCase)
>   Time elapsed: 23.524 s  <<< ERROR!
> 2020-09-29T13:34:18.1803707Z java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
> 2020-09-29T13:34:18.1804343Z  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-09-29T13:34:18.1804738Z  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-09-29T13:34:18.1805274Z  at 
> org.apache.flink.test.runtime.leaderelection.ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange(ZooKeeperLeaderElectionITCase.java:117)
> 2020-09-29T13:34:18.1805772Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-09-29T13:34:18.1806136Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-09-29T13:34:18.1806555Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-09-29T13:34:18.1806936Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-09-29T13:34:18.1807313Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-09-29T13:34:18.1807731Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-09-29T13:34:18.1808341Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-09-29T13:34:18.1808973Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-09-29T13:34:18.1809376Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-09-29T13:34:18.1809851Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-09-29T13:34:18.1810201Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-09-29T13:34:18.1810632Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-09-29T13:34:18.1811035Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-09-29T13:34:18.1811700Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-09-29T13:34:18.1812082Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-09-29T13:34:18.1812447Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-09-29T13:34:18.1812824Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-09-29T13:34:18.1813190Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-09-29T13:34:18.1813565Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-09-29T13:34:18.1813964Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-29T13:34:18.1814364Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-09-29T13:34:18.1814752Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-09-29T13:34:18.1815298Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-09-29T13:34:18.1816096Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-09-29T13:34:18.1816552Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-09-29T13:34:18.1816984Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-09-29T13:34:18.1817421Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-09-29T13:34:18.1817894Z  at 
> org.apac

[GitHub] [flink] tillrohrmann commented on pull request #13530: [FLINK-19123] Use shared MiniCluster for executeAsync() in TestStreamEnvironment

2020-10-05 Thread GitBox


tillrohrmann commented on pull request #13530:
URL: https://github.com/apache/flink/pull/13530#issuecomment-703583316


   Thanks for the explanation of the formatting changes @aljoscha. I agree that 
an automated way is superior to an inconsistent manual approach. I guess 
adopting this would be easiest if someone shared a consistent set of settings 
for `google-java-format` (potentially also for other IDEs than IntelliJ). But 
this PR should not be blocked on this. I am fine either way with the formatting changes, since it is not a big problem.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann commented on a change in pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


tillrohrmann commented on a change in pull request #13464:
URL: https://github.com/apache/flink/pull/13464#discussion_r499544656



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultResourceTrackerTest.java
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.resourcemanager.slotmanager;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.slots.ResourceRequirement;
+import org.apache.flink.util.TestLogger;
+
+import org.hamcrest.collection.IsMapContaining;
+import org.junit.Test;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.Matchers.contains;
+import static org.hamcrest.Matchers.empty;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for the {@link DefaultResourceTracker}.
+ *
+ * Note: The majority of the tracking logic is covered by the {@link JobScopedResourceTrackerTest}.
+ */
+public class DefaultResourceTrackerTest extends TestLogger {
+
+   @Test
+   public void testInitialBehavior() {
+   DefaultResourceTracker tracker = new DefaultResourceTracker();
+
+   assertThat(tracker.isEmpty(), is(true));
+   tracker.notifyLostResource(JobID.generate(), 
ResourceProfile.ANY);
+   tracker.clear();

Review comment:
   I am a bit confused because we don't assert anything after these lines.
   
   If it tests that no exceptions are thrown, should we add a separate test 
case with a name along the lines of `clearDoesNotThrowException`?
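
   The pattern being suggested can be sketched in plain Java (a generic illustration with a trivial stand-in class, not Flink's actual `DefaultResourceTracker` test):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Generic pattern for a test whose only assertion is "no exception is thrown",
 * made explicit instead of implicit.
 */
public class NoThrowTestSketch {

    /** Fails loudly if the action throws, documenting the test's intent. */
    static void assertDoesNotThrow(Runnable action) {
        try {
            action.run();
        } catch (RuntimeException e) {
            throw new AssertionError("expected no exception, got: " + e, e);
        }
    }

    /** Stand-in for a tracker with notifyLostResource(...) / clear(). */
    static final class StandInTracker {
        private final Map<String, Integer> state = new HashMap<>();
        void notifyLostResource(String job) { state.merge(job, -1, Integer::sum); }
        void clear() { state.clear(); }
    }

    public static void main(String[] args) {
        StandInTracker tracker = new StandInTracker();
        // the equivalent of a clearDoesNotThrowException test case:
        assertDoesNotThrow(() -> tracker.notifyLostResource("job-1"));
        assertDoesNotThrow(tracker::clear);
        System.out.println("ok");
    }
}
```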





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann commented on a change in pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


tillrohrmann commented on a change in pull request #13464:
URL: https://github.com/apache/flink/pull/13464#discussion_r499545877



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultResourceTracker.java
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.resourcemanager.slotmanager;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.slots.ResourceRequirement;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * Default {@link ResourceTracker} implementation.
+ */
+public class DefaultResourceTracker implements ResourceTracker {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(DefaultResourceTracker.class);
+
+   private final Map<JobID, JobScopedResourceTracker> trackers = new LinkedHashMap<>();

Review comment:
   Hmm, then we have to state this in the JavaDocs of `ResourceTracker`, and every implementation has to support this contract. Otherwise we would be implementing against an internal implementation detail that might change in the future.
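
   The concern hinges on a guarantee that `LinkedHashMap` gives but the `Map` interface does not; a minimal stdlib demonstration (class and key names are illustrative only):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * LinkedHashMap guarantees insertion-order iteration; plain HashMap does not.
 * Callers may only rely on the order if the declared interface promises it.
 */
public class IterationOrderSketch {

    /** Inserts three keys and returns the iteration order of the given map. */
    public static String keysInOrder(Map<String, Integer> map) {
        map.put("job-3", 3);
        map.put("job-1", 1);
        map.put("job-2", 2);
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        // insertion order is preserved only by the LinkedHashMap
        System.out.println(keysInOrder(new LinkedHashMap<>())); // job-3,job-1,job-2
        System.out.println(keysInOrder(new HashMap<>()));       // order unspecified
    }
}
```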





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] XComp commented on pull request #13458: FLINK-18851 [runtime] Add checkpoint type to checkpoint history entries in Web UI

2020-10-05 Thread GitBox


XComp commented on pull request #13458:
URL: https://github.com/apache/flink/pull/13458#issuecomment-703586638


   > > Thanks @gm7y8 for your changes. There are only two minor formatting 
things left which need to be addressed. Additionally, please update the commit 
message to comply with [Flink's commit 
format](https://flink.apache.org/contributing/contribute-documentation.html#submit-your-contribution).
 It should look like `[FLINK-18851][runtime-web] ...`.
   > > I'm gonna ask @vthinkxie to review the `web-dashboard` changes in the 
meantime. For me, they look good.
   > 
   > The frontend code looks good to me
   
   Reconsidering the change: @vthinkxie what about exposing this information to 
the Checkpoint history page as well?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-19237) LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with "NoResourceAvailableException: Could not allocate the required slot within slot request timeout"

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-19237:
--

Assignee: Robert Metzger  (was: Matthias)

> LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with 
> "NoResourceAvailableException: Could not allocate the required slot within 
> slot request timeout"
> 
>
> Key: FLINK-19237
> URL: https://issues.apache.org/jira/browse/FLINK-19237
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6499&view=logs&j=6bfdaf55-0c08-5e3f-a2d2-2a0285fd41cf&t=fd9796c3-9ce8-5619-781c-42f873e126a6]
> {code}
> 2020-09-14T21:11:02.8200203Z [ERROR] 
> testReelectionOfJobMaster(org.apache.flink.runtime.leaderelection.LeaderChangeClusterComponentsTest)
>   Time elapsed: 300.14 s  <<< FAILURE!
> 2020-09-14T21:11:02.8201761Z java.lang.AssertionError: Job failed.
> 2020-09-14T21:11:02.8202749Z  at 
> org.apache.flink.runtime.jobmaster.utils.JobResultUtils.throwAssertionErrorOnFailedResult(JobResultUtils.java:54)
> 2020-09-14T21:11:02.8203794Z  at 
> org.apache.flink.runtime.jobmaster.utils.JobResultUtils.assertSuccess(JobResultUtils.java:30)
> 2020-09-14T21:11:02.8205177Z  at 
> org.apache.flink.runtime.leaderelection.LeaderChangeClusterComponentsTest.testReelectionOfJobMaster(LeaderChangeClusterComponentsTest.java:152)
> 2020-09-14T21:11:02.8206191Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-09-14T21:11:02.8206985Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-09-14T21:11:02.8207930Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-09-14T21:11:02.8208927Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-09-14T21:11:02.8209753Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-09-14T21:11:02.8210710Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-09-14T21:11:02.8211608Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-09-14T21:11:02.8214473Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-09-14T21:11:02.8215398Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-14T21:11:02.8216199Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2020-09-14T21:11:02.8216947Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-09-14T21:11:02.8217695Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-09-14T21:11:02.8218635Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-09-14T21:11:02.8219499Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-09-14T21:11:02.8220313Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-09-14T21:11:02.8221060Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-09-14T21:11:02.8222171Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-09-14T21:11:02.8222937Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-09-14T21:11:02.8223688Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-09-14T21:11:02.8225191Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-09-14T21:11:02.8226086Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-09-14T21:11:02.8226761Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-09-14T21:11:02.8227453Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-09-14T21:11:02.8228392Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-09-14T21:11:02.8229256Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-09-14T21:11:02.8235798Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-09-14T21:11:02.8237650Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-09-14T21:11:02.8239039Z  at 
> org.apache.maven.surefire.

[jira] [Comment Edited] (FLINK-18851) Add checkpoint type to checkpoint history entries in Web UI

2020-10-05 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200915#comment-17200915
 ] 

Matthias edited comment on FLINK-18851 at 10/5/20, 12:13 PM:
-

{quote}
Arvid Heise or Till Rohrmann I don't seem to have write access to the repo, can 
you add me to the contributor list?
{quote}

Hi [~gkrish24], thanks for contributing to Apache Flink. I reviewed your PR. 
The PR is going to be merged into {{master}} by one of the committers after all 
concerns are addressed.

Best, Matthias



was (Author: mapohl):
{quote}
Arvid Heise or Till Rohrmann I don't seem to have write access to the repo, can 
you add me to the contributor list?
{quote}

Hi [~gkrish24], thanks for contributing to Apache Flink. I reviewed your PR. 
The PR is going to be merged into `master` by one of the committers after all 
concerns are addressed.

Best, Matthias


> Add checkpoint type to checkpoint history entries in Web UI
> ---
>
> Key: FLINK-18851
> URL: https://issues.apache.org/jira/browse/FLINK-18851
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.12.0
>Reporter: Arvid Heise
>Assignee: goutham
>Priority: Minor
>  Labels: pull-request-available, starter
> Attachments: Checkpoint details.png
>
>
> It would be helpful to users to better understand checkpointing times, if the 
> type of the checkpoint is displayed in the checkpoint history.
> Possible types are savepoint, aligned checkpoint, unaligned checkpoint.
> A possible place can be seen in the screenshot
> !Checkpoint details.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18851) Add checkpoint type to checkpoint history entries in Web UI

2020-10-05 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208048#comment-17208048
 ] 

Matthias commented on FLINK-18851:
--

Hi [~gkrish24], sorry for the late reply. I was out of office last week. AFAIK, you get extended rights to maintain Jira tickets once you're made a Flink committer. Until then, just comment on a ticket you'd like to take over and someone from the community will assign the task to you. This way, the Flink community is able to manage open issues better.

Feel free to check out the 
[overview|https://flink.apache.org/contributing/how-to-contribute.html] on how 
to contribute to Apache Flink in case you haven't done this, yet.

> Add checkpoint type to checkpoint history entries in Web UI
> ---
>
> Key: FLINK-18851
> URL: https://issues.apache.org/jira/browse/FLINK-18851
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST, Runtime / Web Frontend
>Affects Versions: 1.12.0
>Reporter: Arvid Heise
>Assignee: goutham
>Priority: Minor
>  Labels: pull-request-available, starter
> Attachments: Checkpoint details.png
>
>
> It would be helpful to users to better understand checkpointing times, if the 
> type of the checkpoint is displayed in the checkpoint history.
> Possible types are savepoint, aligned checkpoint, unaligned checkpoint.
> A possible place can be seen in the screenshot
> !Checkpoint details.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13464: [FLINK-19307][coordination] Add ResourceTracker

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13464:
URL: https://github.com/apache/flink/pull/13464#issuecomment-697462269


   
   ## CI report:
   
   * 7bca073125251f8e92f1a3e3a9a62bad398b2fd0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7198)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] aljoscha commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


aljoscha commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703598381


   Yes, you're right, I missed the `README.md`. Other than that, I tried to use 
`TumblingProcessingTimeWindows` when it's clear in the example that no 
event-time source is used. If the example is vague on the source, like the 
examples in `windows.md`, `experimental.md`, and `parallel.md` then I used 
`TumblingEventTimeWindows`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dawidwys opened a new pull request #13541: [FLINK-19422] Upgrade Kafka and schema registry versions in the avro registry e2e test

2020-10-05 Thread GitBox


dawidwys opened a new pull request #13541:
URL: https://github.com/apache/flink/pull/13541


   ## What is the purpose of the change
   
   Upgrade Kafka & Schema registry versions to potentially decrease the 
probability of hitting problems in said systems.
   
   
   ## Brief change log
   
 - Upgrade Kafka to 2.6.0 (it required using the 2.12 binary as Kafka 
dropped support for 2.11 in 2.4)
 - Updated kafka-common to work with newer versions of Zookeeper
 - Upgrade Schema Registry version to 5.0.0
   
   
   ## Verifying this change
   
   The e2e test should still pass.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**) - 
upgrades components versions used in e2e.
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19422) Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002"

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19422:
---
Labels: pull-request-available test-stability  (was: test-stability)

> Avro Confluent Schema Registry nightly end-to-end test failed with "Register 
> operation timed out; error code: 50002"
> 
>
> Key: FLINK-19422
> URL: https://issues.apache.org/jira/browse/FLINK-19422
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6955&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> 2020-09-25T14:09:18.2560779Z Caused by: java.io.IOException: Could not 
> register schema in registry
> 2020-09-25T14:09:18.2561395Z  at 
> org.apache.flink.formats.avro.registry.confluent.ConfluentSchemaRegistryCoder.writeSchema(ConfluentSchemaRegistryCoder.java:91)
>  ~[?:?]
> 2020-09-25T14:09:18.2562127Z  at 
> org.apache.flink.formats.avro.RegistryAvroSerializationSchema.serialize(RegistryAvroSerializationSchema.java:82)
>  ~[?:?]
> 2020-09-25T14:09:18.2562883Z  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaSerializationSchemaWrapper.serialize(KafkaSerializationSchemaWrapper.java:71)
>  ~[?:?]
> 2020-09-25T14:09:18.2563622Z  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:866)
>  ~[?:?]
> 2020-09-25T14:09:18.2564255Z  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:99)
>  ~[?:?]
> 2020-09-25T14:09:18.2565375Z  at 
> org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:235)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2566540Z  at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2567692Z  at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2568852Z  at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2570022Z  at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2571315Z  at 
> org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:76)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2572586Z  at 
> org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:32)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2573736Z  at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2574824Z  at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2576019Z  at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2577309Z  at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-09-25T14:09:18.2578256Z  at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordsWithTimestamps(AbstractFetcher.java:352)
>  ~[?:?]
> 2020-09-25T14:09:18.2579003Z  at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:185)
>  ~[?:?]
> 2020-09-25T14:09:18.2579687Z  at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
>  ~[?:?]
> 2020-09-25T14:09:18.2580733Z  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)
>  ~[?:?]
> 2020-09-25T14:09:18.2582441Z  at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.

[jira] [Commented] (FLINK-19422) Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002"

2020-10-05 Thread Dawid Wysakowicz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208060#comment-17208060
 ] 

Dawid Wysakowicz commented on FLINK-19422:
--

I think this is a problem in the Confluent Schema Registry itself. I found some 
log entries from the registry initialization process which, when searched for, 
lead to similar reports, though without many helpful answers. I also noticed 
that we are using very old versions of Kafka (and, transitively, ZooKeeper) and 
of the registry. I'd suggest upgrading those versions and seeing whether the 
issue still occurs. I opened a PR that upgrades them. 
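As an aside on the failure mode itself: error code 50002 from the registry is a transient store timeout, so callers sometimes paper over it with a retry. The sketch below is purely illustrative — it is not Flink's actual code, and `register_fn` is a hypothetical zero-argument callable standing in for the registration request (the real fix pursued here was upgrading the Kafka/registry versions):

```python
import time

def register_with_retry(register_fn, max_attempts=5, base_delay=1.0):
    """Retry a schema-registration call that can fail transiently
    (e.g. 'Register operation timed out; error code: 50002').
    register_fn is a hypothetical callable that raises IOError on
    failure and returns a schema id on success."""
    for attempt in range(max_attempts):
        try:
            return register_fn()
        except IOError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # back off exponentially before the next attempt
            time.sleep(base_delay * (2 ** attempt))

# usage sketch with a fake registry that fails twice, then succeeds
calls = {"n": 0}
def flaky_register():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("Register operation timed out; error code: 50002")
    return 42

print(register_with_retry(flaky_register, base_delay=0.01))  # prints 42
```

Retries of course only mask a registry that is genuinely too slow or too old, which is why upgrading the versions is the more robust route.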

> Avro Confluent Schema Registry nightly end-to-end test failed with "Register 
> operation timed out; error code: 50002"
> 
>
> Key: FLINK-19422
> URL: https://issues.apache.org/jira/browse/FLINK-19422
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6955&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

[jira] [Assigned] (FLINK-19422) Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002"

2020-10-05 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-19422:


Assignee: Dawid Wysakowicz

> Avro Confluent Schema Registry nightly end-to-end test failed with "Register 
> operation timed out; error code: 50002"
> 
>
> Key: FLINK-19422
> URL: https://issues.apache.org/jira/browse/FLINK-19422
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
>Affects Versions: 1.12.0
>Reporter: Dian Fu
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6955&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

[GitHub] [flink] flinkbot commented on pull request #13541: [FLINK-19422] Upgrade Kafka and schema registry versions in the avro registry e2e test

2020-10-05 Thread GitBox


flinkbot commented on pull request #13541:
URL: https://github.com/apache/flink/pull/13541#issuecomment-703602975


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 098133ce65783b3a119851d3011c7b88dce76318 (Mon Oct 05 
12:36:01 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #13457: [FLINK-8357] Use rolling logs as default

2020-10-05 Thread GitBox


rmetzger commented on pull request #13457:
URL: https://github.com/apache/flink/pull/13457#issuecomment-703604630


   That sounds like a good plan. The more complexity we can remove from the 
bash scripts, the better.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #12823: FLINK-18013: Refactor Hadoop utils to a single module

2020-10-05 Thread GitBox


rmetzger commented on pull request #12823:
URL: https://github.com/apache/flink/pull/12823#issuecomment-703605400


   Closing PR due to inactivity. Please reopen if you wish to continue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger closed pull request #12823: FLINK-18013: Refactor Hadoop utils to a single module

2020-10-05 Thread GitBox


rmetzger closed pull request #12823:
URL: https://github.com/apache/flink/pull/12823


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13540: [FLINK-19344] Fix DispatcherResourceCleanupTest race condition

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13540:
URL: https://github.com/apache/flink/pull/13540#issuecomment-703484500


   
   ## CI report:
   
   * 6650579aa244c53d76237be84b861224c451b4ba Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7199)
 
   
   
   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19506) UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnNonParallelLocalChannel: "Exceeded checkpoint tolerable failure threshold"

2020-10-05 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-19506:
--

 Summary: 
UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnNonParallelLocalChannel:
 "Exceeded checkpoint tolerable failure threshold"
 Key: FLINK-19506
 URL: https://issues.apache.org/jira/browse/FLINK-19506
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.12.0
Reporter: Robert Metzger


https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8433&view=logs&j=584fa981-f71a-5840-1c49-f800c954fe4b&t=532bf1f8-8c75-59c3-eaad-8c773769bc3a

{code}
2020-10-05T12:30:11.4979736Z [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 97.533 s <<< FAILURE! - in 
org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
2020-10-05T12:30:11.4980464Z [ERROR] 
shouldPerformUnalignedCheckpointOnNonParallelLocalChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)
  Time elapsed: 32.406 s  <<< ERROR!
2020-10-05T12:30:11.4980971Z 
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2020-10-05T12:30:11.4988360Zat 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
2020-10-05T12:30:11.4989659Zat 
org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:196)
2020-10-05T12:30:11.4990584Zat 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2020-10-05T12:30:11.4991620Zat 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2020-10-05T12:30:11.499Zat 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2020-10-05T12:30:11.4992654Zat 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2020-10-05T12:30:11.4993153Zat 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
2020-10-05T12:30:11.4993661Zat 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2020-10-05T12:30:11.4994133Zat 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2020-10-05T12:30:11.4994590Zat 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2020-10-05T12:30:11.4995201Zat 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2020-10-05T12:30:11.4995781Zat 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:926)
2020-10-05T12:30:11.4996228Zat 
akka.dispatch.OnComplete.internal(Future.scala:264)
2020-10-05T12:30:11.4996985Zat 
akka.dispatch.OnComplete.internal(Future.scala:261)
2020-10-05T12:30:11.4997419Zat 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
2020-10-05T12:30:11.4997855Zat 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
2020-10-05T12:30:11.4998312Zat 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
2020-10-05T12:30:11.4998951Zat 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
2020-10-05T12:30:11.4999477Zat 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
2020-10-05T12:30:11.548Zat 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
2020-10-05T12:30:11.5000504Zat 
akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
2020-10-05T12:30:11.5000984Zat 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
2020-10-05T12:30:11.5001567Zat 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
2020-10-05T12:30:11.5002091Zat 
scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
2020-10-05T12:30:11.5002534Zat 
scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
2020-10-05T12:30:11.5002976Zat 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
2020-10-05T12:30:11.5003460Zat 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
2020-10-05T12:30:11.5004015Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5004603Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5005173Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5005677Zat 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
2020-10-05T12:30:11.5006170Zat 
akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
2020-10-05T12:30:11.5006644Zat 
akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
2020-10-05T12:30:
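
For context, the "tolerable failure threshold" in the error above is a checkpointing knob: the job fails once more consecutive checkpoints than the configured number have failed. A hedged sketch of how the threshold is raised via configuration — the option name is assumed from the Flink 1.12-era docs and may differ between versions (the equivalent programmatic call would be `CheckpointConfig#setTolerableCheckpointFailureNumber`):

{code}
# flink-conf.yaml — illustrative fragment; option name assumed, default is 0
execution.checkpointing.tolerable-failed-checkpoints: 3
{code}

Raising the threshold only hides flaky checkpoints, so for a test failure like this one the underlying cause still needs investigating.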

[jira] [Updated] (FLINK-19506) UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnNonParallelLocalChannel: "Exceeded checkpoint tolerable failure threshold"

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19506:
---
Description: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8433&view=logs&j=584fa981-f71a-5840-1c49-f800c954fe4b&t=532bf1f8-8c75-59c3-eaad-8c773769bc3a

{code}
2020-10-05T12:30:11.4979736Z [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 97.533 s <<< FAILURE! - in 
org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
2020-10-05T12:30:11.4980464Z [ERROR] 
shouldPerformUnalignedCheckpointOnNonParallelLocalChannel(org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)
  Time elapsed: 32.406 s  <<< ERROR!
2020-10-05T12:30:11.4980971Z 
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2020-10-05T12:30:11.4988360Zat 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
2020-10-05T12:30:11.4989659Zat 
org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.lambda$getJobExecutionResult$2(PerJobMiniClusterFactory.java:196)
2020-10-05T12:30:11.4990584Zat 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2020-10-05T12:30:11.4991620Zat 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2020-10-05T12:30:11.499Zat 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2020-10-05T12:30:11.4992654Zat 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2020-10-05T12:30:11.4993153Zat 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
2020-10-05T12:30:11.4993661Zat 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
2020-10-05T12:30:11.4994133Zat 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
2020-10-05T12:30:11.4994590Zat 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2020-10-05T12:30:11.4995201Zat 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2020-10-05T12:30:11.4995781Zat 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:926)
2020-10-05T12:30:11.4996228Zat 
akka.dispatch.OnComplete.internal(Future.scala:264)
2020-10-05T12:30:11.4996985Zat 
akka.dispatch.OnComplete.internal(Future.scala:261)
2020-10-05T12:30:11.4997419Zat 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
2020-10-05T12:30:11.4997855Zat 
akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
2020-10-05T12:30:11.4998312Zat 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
2020-10-05T12:30:11.4998951Zat 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
2020-10-05T12:30:11.4999477Zat 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
2020-10-05T12:30:11.548Zat 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
2020-10-05T12:30:11.5000504Zat 
akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
2020-10-05T12:30:11.5000984Zat 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
2020-10-05T12:30:11.5001567Zat 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
2020-10-05T12:30:11.5002091Zat 
scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
2020-10-05T12:30:11.5002534Zat 
scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
2020-10-05T12:30:11.5002976Zat 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
2020-10-05T12:30:11.5003460Zat 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
2020-10-05T12:30:11.5004015Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5004603Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5005173Zat 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
2020-10-05T12:30:11.5005677Zat 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
2020-10-05T12:30:11.5006170Zat 
akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
2020-10-05T12:30:11.5006644Zat 
akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
2020-10-05T12:30:11.5007153Zat 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
2020-10-05T12:30:11.5007690Zat 
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
2020-10-05T12:30:11.5008170Zat 
akka.dispatch.forkjoin.ForkJ


[GitHub] [flink] rmetzger commented on pull request #13412: [FLINK-19069] execute finalizeOnMaster with io executor to avoid hea…

2020-10-05 Thread GitBox


rmetzger commented on pull request #13412:
URL: https://github.com/apache/flink/pull/13412#issuecomment-703609314


   I'm closing this PR for now. Please re-open or comment if you have a 
different opinion.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger closed pull request #13412: [FLINK-19069] execute finalizeOnMaster with io executor to avoid hea…

2020-10-05 Thread GitBox


rmetzger closed pull request #13412:
URL: https://github.com/apache/flink/pull/13412


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19069) finalizeOnMaster takes too much time and client timeouts

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-19069.
--
Resolution: Won't Fix

> finalizeOnMaster takes too much time and client timeouts
> 
>
> Key: FLINK-19069
> URL: https://issues.apache.org/jira/browse/FLINK-19069
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Critical
>  Labels: pull-request-available
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may 
> block the JM for a very long time until the client eventually times out. 
> For example, writing data to HDFS and committing the files on the JM can 
> take more than ten minutes for tens of thousands of files.
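The pattern described in the issue, and the fix direction the (since closed) PR #13412 pursued, can be sketched as follows. This is a hypothetical illustration of offloading a blocking `finalizeOnMaster` call to an I/O executor, not the actual Flink change; the class and method names are placeholders:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FinalizeOffloadSketch {
    // Dedicated I/O executor; the pool size is illustrative only.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Instead of running the (possibly minutes-long) finalization directly in
    // the JobMaster's main thread, hand it to the I/O executor and return a
    // future, keeping the main thread free to answer client RPCs.
    public CompletableFuture<Void> finalizeJobAsync(Runnable finalizeOnMaster) {
        return CompletableFuture.runAsync(finalizeOnMaster, ioExecutor);
    }

    public void shutdown() {
        ioExecutor.shutdown();
    }

    public static void main(String[] args) {
        FinalizeOffloadSketch sketch = new FinalizeOffloadSketch();
        CompletableFuture<Void> done =
                sketch.finalizeJobAsync(() -> System.out.println("committing files"));
        done.join(); // a real caller would attach a timeout or completion callback
        sketch.shutdown();
    }
}
```

The caller still observes completion through the returned future, so slow finalization no longer stalls unrelated RPCs on the main thread.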



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19069) finalizeOnMaster takes too much time and client timeouts

2020-10-05 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19069:
---
Fix Version/s: (was: 1.11.3)
   (was: 1.10.3)
   (was: 1.12.0)

> finalizeOnMaster takes too much time and client timeouts
> 
>
> Key: FLINK-19069
> URL: https://issues.apache.org/jira/browse/FLINK-19069
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Critical
>  Labels: pull-request-available
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may 
> block the JM for a very long time until the client eventually times out. 
> For example, writing data to HDFS and committing the files on the JM can 
> take more than ten minutes for tens of thousands of files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19069) finalizeOnMaster takes too much time and client timeouts

2020-10-05 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208065#comment-17208065
 ] 

Robert Metzger commented on FLINK-19069:


Based on the discussion in the PR, we decided not to address this issue in 
Flink 1.12.
We'll soon introduce a new Sink API that will not rely on running user code on 
the master. Closing ticket.

> finalizeOnMaster takes too much time and client timeouts
> 
>
> Key: FLINK-19069
> URL: https://issues.apache.org/jira/browse/FLINK-19069
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.10.3, 1.11.3
>
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may 
> block the JM for a very long time until the client eventually times out. 
> For example, writing data to HDFS and committing the files on the JM can 
> take more than ten minutes for tens of thousands of files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] kl0u commented on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


kl0u commented on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-703613996


   Ok but this way the examples do not work "out-of-the-box", right?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13531: [FLINK-19318] Deprecate timeWindow() operations in DataStream API

2020-10-05 Thread GitBox


flinkbot edited a comment on pull request #13531:
URL: https://github.com/apache/flink/pull/13531#issuecomment-702591517


   
   ## CI report:
   
   * 2662522fc5199438ca7bd7e8a5d1825d2aa80f14 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7162)
 
   * 12b57911b704cce325f3c6e8c31e5744e284f3db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7200)
 
   * 09c97e5bdf9e3ea5719d772749b9bb542ec89514 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7202)
 
   * 4609bf890fe953a697b15f26280729abc36ea913 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13541: [FLINK-19422] Upgrade Kafka and schema registry versions in the avro registry e2e test

2020-10-05 Thread GitBox


flinkbot commented on pull request #13541:
URL: https://github.com/apache/flink/pull/13541#issuecomment-703617870


   
   ## CI report:
   
   * 098133ce65783b3a119851d3011c7b88dce76318 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann commented on a change in pull request #13508: [FLINK-19308][coordination] Add SlotTracker

2020-10-05 Thread GitBox


tillrohrmann commented on a change in pull request #13508:
URL: https://github.com/apache/flink/pull/13508#discussion_r499586350



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/dispatcher/DispatcherResourceCleanupTest.java
##
@@ -343,6 +344,7 @@ public void testRunningJobsRegistryCleanup() throws 
Exception {
 * before a new job with the same {@link JobID} is started.
 */
@Test
+   @Ignore

Review comment:
   Do we know why this test is unstable?

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotTracker.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.resourcemanager.slotmanager;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotID;
+import org.apache.flink.runtime.resourcemanager.registration.TaskExecutorConnection;
+import org.apache.flink.runtime.taskexecutor.SlotStatus;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.function.BiConsumer;
+import java.util.function.Consumer;
+
+/**
+ * Default SlotTracker implementation.
+ */
+class DefaultSlotTracker implements SlotTracker {
+   private static final Logger LOG = LoggerFactory.getLogger(DefaultSlotTracker.class);
+
+   /**
+* Map for all registered slots.
+*/
+   private final Map slots = new HashMap<>();
+
+   /**
+* Index of all currently free slots.
+*/
+   private final Map freeSlots = new LinkedHashMap<>();
+
+   private final SlotStatusUpdateListener slotStatusUpdateListener;
+
+   private final SlotStatusStateReconciler slotStatusStateReconciler = new SlotStatusStateReconciler(this::transitionSlotToFree, this::transitionSlotToPending, this::transitionSlotToAllocated);
+
+   public DefaultSlotTracker(SlotStatusUpdateListener slotStatusUpdateListener) {

Review comment:
   Is it because now we need to introduce a factory to introduce a custom 
tracker into the `SlotManager`?
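The `DefaultSlotTracker` above keeps every registered slot in one map and additionally indexes free slots in a `LinkedHashMap`. A miniature sketch of that two-map layout follows; the string-based slot ids and states are simplified placeholders, not Flink's actual `SlotID`/slot classes:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class SlotIndexSketch {
    // All registered slots, keyed by slot id; the value is a simplified state.
    private final Map<String, String> slots = new HashMap<>();
    // Free slots only. LinkedHashMap preserves insertion order, so the
    // longest-free slot is always the first entry in iteration.
    private final Map<String, String> freeSlots = new LinkedHashMap<>();

    void register(String slotId) {
        slots.put(slotId, "FREE");
        freeSlots.put(slotId, "FREE");
    }

    // Returns the oldest free slot and marks it allocated, or null if none.
    String allocateOldestFree() {
        Iterator<String> it = freeSlots.keySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        String id = it.next();
        it.remove();
        slots.put(id, "ALLOCATED");
        return id;
    }
}
```

Keeping the free-slot index separate means allocation never scans the full slot map, at the cost of updating two maps on every state transition.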

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotTracker.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.resourcemanager.slotmanager;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotID;
+import 
org.apache.flink.runtime.resourcemanager.registration.TaskExecutorConnection;
+import org.apache.flink.runtime.taskexecutor.SlotStatus;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.function.BiCon
