[GitHub] [flink-benchmarks] pnowojski merged pull request #12: [hotfix] Fix Readme instructions that are easy to break

2021-03-28 Thread GitBox


pnowojski merged pull request #12:
URL: https://github.com/apache/flink-benchmarks/pull/12


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-22007) PartitionReleaseInBatchJobBenchmarkExecutor seems to be failing

2021-03-28 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-22007:
--

 Summary: PartitionReleaseInBatchJobBenchmarkExecutor seems to be 
failing
 Key: FLINK-22007
 URL: https://issues.apache.org/jira/browse/FLINK-22007
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks, Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Piotr Nowojski
 Fix For: 1.13.0


Travis CI is failing:
https://travis-ci.com/github/apache/flink-benchmarks/builds/221290042

There is also a problem with the Jenkins builds for the same benchmark:
http://codespeed.dak8s.net:8080/job/flink-scheduler-benchmarks/232

For the future, it would also be interesting to understand why the Jenkins build 
is green despite the failing benchmark, and to fix that (ideally, if some 
benchmarks fail, partial results should still be uploaded but the Jenkins build 
should be marked as failed). Otherwise issues like this can remain unnoticed for 
quite some time.

CC [~Thesharing] [~zhuzh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-benchmarks] pnowojski commented on pull request #12: [hotfix] Fix Readme instructions that are easy to break

2021-03-28 Thread GitBox


pnowojski commented on pull request #12:
URL: https://github.com/apache/flink-benchmarks/pull/12#issuecomment-809122082


   Build failure looks unrelated: 
https://issues.apache.org/jira/browse/FLINK-22007


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] twalthr commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package

2021-03-28 Thread GitBox


twalthr commented on pull request #15406:
URL: https://github.com/apache/flink/pull/15406#issuecomment-809120847


   @lirui-apache could you give the community more context about this change in 
the JIRA issue? I think most people in the Flink community would immediately 
reject copying more code from Hive to Flink unless there is a good reason for 
it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] gaoyunhaii commented on a change in pull request #15259: [FLINK-20757][network] Optimize data broadcast for sort-merge blocking shuffle

2021-03-28 Thread GitBox


gaoyunhaii commented on a change in pull request #15259:
URL: https://github.com/apache/flink/pull/15259#discussion_r602998872



##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartition.java
##
@@ -94,11 +94,17 @@
  */
 private final SortMergeResultPartitionReadScheduler readScheduler;
 
-/** Number of guaranteed network buffers can be used by {@link #currentSortBuffer}. */
+/**
+ * Number of guaranteed network buffers can be used by {@link #unicastSortBuffer} and {@link
+ * #broadcastSortBuffer}.
+ */
 private int numBuffersForSort;
 
-/** Current {@link SortBuffer} to append records to. */
-private SortBuffer currentSortBuffer;
+/** {@link SortBuffer} for records sent by {@link #broadcastRecord(ByteBuffer)}. */
+private SortBuffer broadcastSortBuffer;
+
+/** {@link SortBuffer} for records sent by {@link #emitRecord(ByteBuffer, int)}. */
+private SortBuffer unicastSortBuffer;

Review comment:
   It seems that in the following logic the two sort buffers are handled 
identically? If so, we may not need two separate buffers and could still use 
only one `currentBuffer`.

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartition.java
##
@@ -389,8 +418,9 @@ private void releaseWriteBuffers() {
 public void close() {
 releaseWriteBuffers();
 // the close method will be always called by the task thread, so there is need to make
-// the currentSortBuffer filed volatile and visible to the cancel thread intermediately
-releaseCurrentSortBuffer();
+// the sort buffer filed volatile and visible to the cancel thread intermediately

Review comment:
   filed -> field

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PartitionedFileWriter.java
##
@@ -86,6 +86,13 @@
 /** Current subpartition to write buffers to. */
 private int currentSubpartition = -1;
 
+/**
+ * Broadcast region is an optimization for the broadcast partition which writes the same data to
+ * all subpartitions. For a broadcast region, data is only written once and the indexes of all
+ * subpartitions point to the same offset in the data file.
+ */
+private boolean isBroadCastRegion;

Review comment:
   Might be unified as `isBroadcastRegion`.
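
   A minimal sketch of the broadcast-region indexing described in the Javadoc 
above; the class and method names are illustrative, not the actual 
`PartitionedFileWriter` internals, although the array names mirror the fields 
visible in the diff:

```java
// Illustrative only: the payload of a broadcast region is written once, and every
// subpartition's index entry points at that same offset in the data file.
class BroadcastRegionIndexSketch {

    private final long[] subpartitionOffsets;
    private final int[] subpartitionBuffers;

    BroadcastRegionIndexSketch(int numSubpartitions) {
        this.subpartitionOffsets = new long[numSubpartitions];
        this.subpartitionBuffers = new int[numSubpartitions];
    }

    /** Records buffers that were physically written once at {@code fileOffset}. */
    void onBroadcastBuffersWritten(long fileOffset, int numBuffers) {
        for (int i = 0; i < subpartitionOffsets.length; i++) {
            subpartitionOffsets[i] = fileOffset; // all subpartitions share the same data
            subpartitionBuffers[i] += numBuffers;
        }
    }
}
```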

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PartitionedFileWriter.java
##
@@ -205,32 +215,70 @@ public void writeBuffers(List bufferWithChannels) throws IOEx
 return;
 }
 
-long expectedBytes = 0;
+long expectedBytes;
 ByteBuffer[] bufferWithHeaders = new ByteBuffer[2 * bufferWithChannels.size()];
 
+if (isBroadCastRegion) {
+expectedBytes = writeBroadcastBuffers(bufferWithChannels, bufferWithHeaders);
+} else {
+expectedBytes = writeUnicastBuffers(bufferWithChannels, bufferWithHeaders);
+}
+
+BufferReaderWriterUtil.writeBuffers(dataFileChannel, expectedBytes, bufferWithHeaders);
+}
+
+private long writeUnicastBuffers(
+List bufferWithChannels, ByteBuffer[] bufferWithHeaders) {
+long expectedBytes = 0;
 for (int i = 0; i < bufferWithChannels.size(); i++) {
-BufferWithChannel bufferWithChannel = bufferWithChannels.get(i);
-Buffer buffer = bufferWithChannel.getBuffer();
-int subpartitionIndex = bufferWithChannel.getChannelIndex();
-if (subpartitionIndex != currentSubpartition) {
+int subpartition = bufferWithChannels.get(i).getChannelIndex();
+if (subpartition != currentSubpartition) {
 checkState(
-subpartitionBuffers[subpartitionIndex] == 0,
+subpartitionBuffers[subpartition] == 0,
 "Must write data of the same channel together.");
-subpartitionOffsets[subpartitionIndex] = totalBytesWritten;
-currentSubpartition = subpartitionIndex;
+subpartitionOffsets[subpartition] = totalBytesWritten;
+currentSubpartition = subpartition;
 }
 
-ByteBuffer header = BufferReaderWriterUtil.allocatedHeaderBuffer();
-BufferReaderWriterUtil.setByteChannelBufferHeader(buffer, header);
-bufferWithHeaders[2 * i] = header;
-bufferWithHeaders[2 * i + 1] = buffer.getNioBufferReadable();
+Buffer buffer = bufferWithChannels.get(i).getBuffer();
+int numBytes = setBufferWithHeader(buffer, bufferWithHeaders, 2 * i);
+expectedBytes += numBytes;
+totalBytesWritten += numBytes;
Review comment:
   Might we move the `totalBytesWritten` update outside the two methods?

##
Fil

[GitHub] [flink] XComp commented on pull request #15020: [FLINK-21445][kubernetes] Application mode does not set the configuration when building PackagedProgram

2021-03-28 Thread GitBox


XComp commented on pull request #15020:
URL: https://github.com/apache/flink/pull/15020#issuecomment-809117295


   Ok, thanks @SteNicholas . I'll give it another pass when you're done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package

2021-03-28 Thread GitBox


flinkbot commented on pull request #15406:
URL: https://github.com/apache/flink/pull/15406#issuecomment-809116403


   
   ## CI report:
   
   * 8f3bef8d676748698cf0f7e6a9e9f595604ae412 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15405:
URL: https://github.com/apache/flink/pull/15405#issuecomment-809101943


   
   ## CI report:
   
   * 90f1936af457ab1419790cd750ee1ab45ec04ecb Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15652)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-16641) Announce sender's backlog to solve the deadlock issue without exclusive buffers

2021-03-28 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-16641:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Announce sender's backlog to solve the deadlock issue without exclusive 
> buffers
> ---
>
> Key: FLINK-16641
> URL: https://issues.apache.org/jira/browse/FLINK-16641
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Zhijiang
>Assignee: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> This is the second ingredient, besides FLINK-16404, to solve the deadlock 
> problem without exclusive buffers.
> The scenario is as follows:
>  * Data in a subpartition with a positive backlog can always be sent, because 
> the exclusive credits are eventually fed back.
>  * Without exclusive buffers, the receiver does not request floating buffers 
> for a backlog of 0. But when new backlog is added to such a subpartition, the 
> sender currently has no way to notify the receiver side without positive 
> credits.
>  * This results in the receiver and sender waiting for each other, i.e. a 
> deadlock: the sender waits for credit to notify the backlog, and the receiver 
> waits for the backlog to request floating buffers.
> To solve this, the sender sometimes needs a separate message, besides the 
> existing `BufferResponse`, to announce the backlog. The receiver can then use 
> this information to request floating buffers and feed credits back (see the 
> sketch below).
> The side effect is increased network transport delay and a throughput 
> regression. We can measure how much it affects the existing micro-benchmarks. 
> This cost is probably worth bearing to get the benefit of fast checkpoints 
> without exclusive buffers. We can give proper explanations in the respective 
> configuration options to let users make the final decision in practice.
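
A minimal sketch of the announcement idea above, using hypothetical names 
(`SubpartitionSender`, `onBufferAdded`, `announceBacklog`) rather than the 
actual netty-stack classes; it only illustrates when a dedicated announcement 
would be sent:

{code:java}
// Hypothetical sketch of the sender-side idea, not the actual Flink classes:
// when new backlog appears while the receiver holds no credit, send a dedicated
// announcement instead of waiting for a BufferResponse that cannot be sent.
class SubpartitionSender {

    private int backlog;          // buffers queued for this subpartition
    private int receiverCredits;  // credits currently granted by the receiver

    /** Called when a new buffer is added to the subpartition. */
    synchronized void onBufferAdded() {
        backlog++;
        if (receiverCredits > 0) {
            // Normal path: the backlog piggybacks on the next BufferResponse.
            sendBufferResponseWithBacklog(backlog);
        } else {
            // Deadlock-prone path without exclusive buffers: the receiver holds no
            // credit and would never learn about the backlog, so announce it.
            announceBacklog(backlog);
        }
    }

    private void sendBufferResponseWithBacklog(int backlog) {
        // data + backlog (existing message type)
    }

    private void announceBacklog(int backlog) {
        // separate control message proposed by this ticket
    }
}
{code}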



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15366: [FLINK-12828][sql-client] Support -f option with a sql script as input

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15366:
URL: https://github.com/apache/flink/pull/15366#issuecomment-806388017


   
   ## CI report:
   
   * 347d38bc59e3ffa1d3854da26163c2170c2f6986 UNKNOWN
   * de78d0128017c7ad13d93cd5ba30f94ec516c5f1 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15618)
 
   * 1f8003071d9fee631437752a36a82f74e02caa04 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-16012) Reduce the default number of exclusive buffers from 2 to 1 on receiver side

2021-03-28 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-16012:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Reduce the default number of exclusive buffers from 2 to 1 on receiver side
> ---
>
> Key: FLINK-16012
> URL: https://issues.apache.org/jira/browse/FLINK-16012
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Zhijiang
>Assignee: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to reduce the in-flight buffers for checkpoints in the case of back 
> pressure, we can reduce the number of exclusive buffers per remote input 
> channel from the default of 2 to 1 as a first step. As a result, the total 
> number of required buffers is also reduced. We can further verify the 
> performance effect via various benchmarks (see the configuration sketch below).
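
For reference, a minimal sketch of tuning these buffer counts explicitly, 
assuming the current configuration keys 
(`taskmanager.network.memory.buffers-per-channel` and 
`taskmanager.network.memory.floating-buffers-per-gate`) keep their present 
names; the values are illustrative only:

{code:java}
import org.apache.flink.configuration.Configuration;

public class NetworkBufferConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Exclusive buffers per remote input channel (currently defaults to 2;
        // this ticket proposes 1 as the new default).
        conf.setInteger("taskmanager.network.memory.buffers-per-channel", 1);
        // Floating buffers shared per input gate, compensating for the smaller
        // exclusive pool.
        conf.setInteger("taskmanager.network.memory.floating-buffers-per-gate", 8);
        System.out.println(conf);
    }
}
{code}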



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18762) Make network buffers per incoming/outgoing channel can be configured separately

2021-03-28 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-18762:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Make network buffers per incoming/outgoing channel can be configured 
> separately
> ---
>
> Key: FLINK-18762
> URL: https://issues.apache.org/jira/browse/FLINK-18762
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Yingjie Cao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> In FLINK-16012, we want to decrease the default number of exclusive buffers 
> on the receiver side from 2 to 1 to accelerate checkpoints in cases of 
> backpressure. However, the number of buffers per outgoing and per incoming 
> channel is currently configured by a single configuration key. It would be 
> better, and more flexible, to make the network buffers per incoming/outgoing 
> channel configurable separately. At the same time, we can keep the default 
> behavior compatible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-16428) Fine-grained network buffer management for backpressure

2021-03-28 Thread Yingjie Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao updated FLINK-16428:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Fine-grained network buffer management for backpressure
> ---
>
> Key: FLINK-16428
> URL: https://issues.apache.org/jira/browse/FLINK-16428
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Zhijiang
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is an umbrella ticket for tracking the progress of this improvement.
> It is the second ingredient to solve the “checkpoints under backpressure” 
> problem (together with unaligned checkpoints). It consists of two steps:
>  * See if we can use less network memory in general for streaming jobs (with a 
> potentially different distribution of floating buffers on the input side)
>  * Under backpressure, reduce network memory to have less in-flight data 
> (needs specification of the algorithm and experiments)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16428) Fine-grained network buffer management for backpressure

2021-03-28 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310448#comment-17310448
 ] 

Yingjie Cao commented on FLINK-16428:
-

Though the PR is ready, I guess we do not have enough time to merge it into 
1.13, so I am moving the fix version to 1.14.

> Fine-grained network buffer management for backpressure
> ---
>
> Key: FLINK-16428
> URL: https://issues.apache.org/jira/browse/FLINK-16428
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Zhijiang
>Priority: Critical
> Fix For: 1.13.0
>
>
> This is an umbrella ticket for tracking the progress of this improvement.
> It is the second ingredient to solve the “checkpoints under backpressure” 
> problem (together with unaligned checkpoints). It consists of two steps:
>  * See if we can use less network memory in general for streaming jobs (with a 
> potentially different distribution of floating buffers on the input side)
>  * Under backpressure, reduce network memory to have less in-flight data 
> (needs specification of the algorithm and experiments)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15016:
URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058


   
   ## CI report:
   
   * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646)
 
   * b4862f8a8e8dab12ec9c0d71d8a31152cb212ebc UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13025: [FLINK-18762][network] Make network buffers per incoming/outgoing channel can be configured separately

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #13025:
URL: https://github.com/apache/flink/pull/13025#issuecomment-666200437


   
   ## CI report:
   
   * 17b353709923bd16fbdc07d0936c1302f8ddd537 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15624)
 
   * ae949a0492d2e2df7af183be3fa32a77eaada34c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15651)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-22006:
--
Labels: k8s-ha  (was: )

> Could not run more than 20 jobs in a native K8s session with K8s HA enabled
> ---
>
> Key: FLINK-22006
> URL: https://issues.apache.org/jira/browse/FLINK-22006
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yang Wang
>Priority: Major
>  Labels: k8s-ha
> Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster with K8s HA enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job is always stuck in the initializing state, and the previous one 
> is created and waiting to be assigned. It seems that some internal resource has 
> been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session when K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-22006:
--
Description: 
Currently, if we start a native K8s session cluster when K8s HA is enabled, we 
cannot run more than 20 streaming jobs.

 

The latest job is always stuck in the initializing state, and the previous one 
is created and waiting to be assigned. It seems that some internal resource has 
been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.

!image-2021-03-24-18-08-42-116.png!

  was:
Currently, if we start a native K8s session cluster with K8s HA enabled, we 
cannot run more than 20 streaming jobs.

 

The latest job is always stuck in the initializing state, and the previous one 
is created and waiting to be assigned. It seems that some internal resource has 
been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.

!image-2021-03-24-18-08-42-116.png!


> Could not run more than 20 jobs in a native K8s session when K8s HA enabled
> ---
>
> Key: FLINK-22006
> URL: https://issues.apache.org/jira/browse/FLINK-22006
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yang Wang
>Priority: Major
>  Labels: k8s-ha
> Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster when K8s HA is enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job is always stuck in the initializing state, and the previous one 
> is created and waiting to be assigned. It seems that some internal resource has 
> been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session when K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-22006:
--
Summary: Could not run more than 20 jobs in a native K8s session when K8s 
HA enabled  (was: Could not run more than 20 jobs in a native K8s session with 
K8s HA enabled)

> Could not run more than 20 jobs in a native K8s session when K8s HA enabled
> ---
>
> Key: FLINK-22006
> URL: https://issues.apache.org/jira/browse/FLINK-22006
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yang Wang
>Priority: Major
>  Labels: k8s-ha
> Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster with K8s HA enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job is always stuck in the initializing state, and the previous one 
> is created and waiting to be assigned. It seems that some internal resource has 
> been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] Jiayi-Liao commented on a change in pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


Jiayi-Liao commented on a change in pull request #15016:
URL: https://github.com/apache/flink/pull/15016#discussion_r603040907



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/ttl/TtlStateTestBase.java
##
@@ -471,6 +473,26 @@ public void testRestoreTtlAndRegisterNonTtlStateCompatFailure() throws Exception
 sbetc.createState(ctx().createStateDescriptor(), "");
 }
 
+@Test
+public void testIncrementalCleanupWholeState() throws Exception {
+assumeTrue(incrementalCleanupSupported());
+initTest(getConfBuilder(TTL).cleanupIncrementally(5, true).build());
+timeProvider.time = 0;
+// create enough keys to trigger incremental rehash
+updateKeys(0, INC_CLEANUP_ALL_KEYS, ctx().updateEmpty);
+// expire all state
+timeProvider.time = 120;
+// trigger state clean up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+}
+// check all state cleaned up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+assertTrue("Original state should be cleared", 
ctx().isOriginalNull());

Review comment:
   @Myasuka  Oh.. I see what you mean here. Very good point. :) 
   
   I've improved the code based on your comments. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-22006:
--
Component/s: Runtime / Coordination

> Could not run more than 20 jobs in a native K8s session with K8s HA enabled
> ---
>
> Key: FLINK-22006
> URL: https://issues.apache.org/jira/browse/FLINK-22006
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yang Wang
>Priority: Major
>  Labels: k8s-ha
> Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster with K8s HA enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job is always stuck in the initializing state, and the previous one 
> is created and waiting to be assigned. It seems that some internal resource has 
> been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)
Yang Wang created FLINK-22006:
-

 Summary: Could not run more than 20 jobs in a native K8s session 
with K8s HA enabled
 Key: FLINK-22006
 URL: https://issues.apache.org/jira/browse/FLINK-22006
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.12.2, 1.13.0
Reporter: Yang Wang
 Attachments: image-2021-03-24-18-08-42-116.png

Currently, if we start a native K8s session cluster with K8s HA enabled, we 
cannot run more than 20 streaming jobs.

 

The latest job is always stuck in the initializing state, and the previous one 
is created and waiting to be assigned. It seems that some internal resource has 
been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.

!image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-22006:
--
Attachment: image-2021-03-24-18-08-42-116.png

> Could not run more than 20 jobs in a native K8s session with K8s HA enabled
> ---
>
> Key: FLINK-22006
> URL: https://issues.apache.org/jira/browse/FLINK-22006
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yang Wang
>Priority: Major
> Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster with K8s HA enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job is always stuck in the initializing state, and the previous one 
> is created and waiting to be assigned. It seems that some internal resource has 
> been exhausted, e.g. the okhttp thread pool, TCP connections, or something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package

2021-03-28 Thread GitBox


flinkbot commented on pull request #15406:
URL: https://github.com/apache/flink/pull/15406#issuecomment-809105749


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 8f3bef8d676748698cf0f7e6a9e9f595604ae412 (Mon Mar 29 
06:26:18 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-21998).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-21942) KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak

2021-03-28 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang reassigned FLINK-21942:
-

Assignee: Yang Wang

> KubernetesLeaderRetrievalDriver not closed after terminated which lead to 
> connection leak
> -
>
> Key: FLINK-21942
> URL: https://issues.apache.org/jira/browse/FLINK-21942
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yi Tang
>Assignee: Yang Wang
>Priority: Major
>  Labels: k8s-ha
> Fix For: 1.13.0
>
> Attachments: image-2021-03-24-18-08-30-196.png, 
> image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> It looks like the KubernetesLeaderRetrievalDriver is not closed even after the 
> KubernetesLeaderElectionDriver is closed and the job reaches a globally 
> terminated state.
> This leaves many ConfigMap watches still active, each holding connections to 
> K8s.
> When the connections exceed the maximum number of concurrent requests, new 
> ConfigMap watches cannot be started, and eventually all newly submitted jobs 
> time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695, could you 
> confirm this issue?
> But when many jobs are running in the same session cluster, the ConfigMap 
> watches are required to stay active. Maybe we should merge all ConfigMap 
> watches into one?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-21998) Copy more code from hive and move them to a dedicated package

2021-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-21998:
---
Labels: pull-request-available  (was: )

> Copy more code from hive and move them to a dedicated package
> -
>
> Key: FLINK-21998
> URL: https://issues.apache.org/jira/browse/FLINK-21998
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lirui-apache opened a new pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package

2021-03-28 Thread GitBox


lirui-apache opened a new pull request #15406:
URL: https://github.com/apache/flink/pull/15406


   
   
   ## What is the purpose of the change
   
   Copy more code from hive and move them to a dedicated package
   
   
   ## Brief change log
   
 - Move the copied hive classes to package 
`org.apache.flink.table.planner.delegation.hive.copy`
 - Copy more hive code
   
   
   ## Verifying this change
   
   Existing tests
   
   ## Does this pull request potentially affect one of the following parts:
   
   NA
   
   ## Documentation
   
   NA
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21942) KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak

2021-03-28 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310438#comment-17310438
 ] 

Yang Wang commented on FLINK-21942:
---

I will attach a PR for this issue. It will solve the Kubernetes ConfigMap watch 
leak, as well as the ZooKeeper connection leak.

Actually, we have had the same issue in the ZooKeeper HA services for a long time.

 

Unfortunately, I have not yet had enough time to find the root cause of why we 
can only run about 20 jobs in a Kubernetes session cluster with HA enabled. 
Since it is a separate issue, I will create another ticket.

> KubernetesLeaderRetrievalDriver not closed after terminated which lead to 
> connection leak
> -
>
> Key: FLINK-21942
> URL: https://issues.apache.org/jira/browse/FLINK-21942
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Yi Tang
>Priority: Major
>  Labels: k8s-ha
> Fix For: 1.13.0
>
> Attachments: image-2021-03-24-18-08-30-196.png, 
> image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> It looks like the KubernetesLeaderRetrievalDriver is not closed even after the 
> KubernetesLeaderElectionDriver is closed and the job reaches a globally 
> terminated state.
> This leaves many ConfigMap watches still active, each holding connections to 
> K8s.
> When the connections exceed the maximum number of concurrent requests, new 
> ConfigMap watches cannot be started, and eventually all newly submitted jobs 
> time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695, could you 
> confirm this issue?
> But when many jobs are running in the same session cluster, the ConfigMap 
> watches are required to stay active. Maybe we should merge all ConfigMap 
> watches into one?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml

2021-03-28 Thread GitBox


flinkbot commented on pull request #15405:
URL: https://github.com/apache/flink/pull/15405#issuecomment-809101943


   
   ## CI report:
   
   * 90f1936af457ab1419790cd750ee1ab45ec04ecb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15294: [FLINK-21945][streaming] Omit pointwise connections from checkpointing in unaligned checkpoints.

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15294:
URL: https://github.com/apache/flink/pull/15294#issuecomment-803058203


   
   ## CI report:
   
   * b22d2cdd4e457842b585e64089858a0ae8eb9a2b UNKNOWN
   * 19964baf25121c6f0a4f85d75ffcc568d98dcdaa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15605)
 
   * 6bff4c2e3485c8f0dd09b872652c6a7958836e7f UNKNOWN
   * ea1d9cb1f72dd92d6910773bc63f6e8343416715 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15253:
URL: https://github.com/apache/flink/pull/15253#issuecomment-801069817


   
   ## CI report:
   
   * c550f67518cae7d8fe19752af581c582bd7115b9 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15625)
 
   * f57250440164af2180629998bc4cfda236d64772 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15650)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15161: [FLINK-20114][connector/kafka,common] Fix a few KafkaSource-related bugs

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15161:
URL: https://github.com/apache/flink/pull/15161#issuecomment-797177953


   
   ## CI report:
   
   * d3f59e302b9860d49ce79252aeb8bd1decaeeabf UNKNOWN
   * 6fc6cc9a2e6c26e9e8fa62b481237231394078eb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15507)
 
   * 6c17ec7241323224d3053134e82353c63f28a24f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15649)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13025: [FLINK-18762][network] Make network buffers per incoming/outgoing channel can be configured separately

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #13025:
URL: https://github.com/apache/flink/pull/13025#issuecomment-666200437


   
   ## CI report:
   
   * 17b353709923bd16fbdc07d0936c1302f8ddd537 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15624)
 
   * ae949a0492d2e2df7af183be3fa32a77eaada34c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Myasuka commented on a change in pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


Myasuka commented on a change in pull request #15016:
URL: https://github.com/apache/flink/pull/15016#discussion_r603034801



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/ttl/TtlStateTestBase.java
##
@@ -471,6 +473,26 @@ public void testRestoreTtlAndRegisterNonTtlStateCompatFailure() throws Exception
 sbetc.createState(ctx().createStateDescriptor(), "");
 }
 
+@Test
+public void testIncrementalCleanupWholeState() throws Exception {
+assumeTrue(incrementalCleanupSupported());
+initTest(getConfBuilder(TTL).cleanupIncrementally(5, true).build());
+timeProvider.time = 0;
+// create enough keys to trigger incremental rehash
+updateKeys(0, INC_CLEANUP_ALL_KEYS, ctx().updateEmpty);
+// expire all state
+timeProvider.time = 120;
+// trigger state clean up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+}
+// check all state cleaned up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+assertTrue("Original state should be cleared", 
ctx().isOriginalNull());

Review comment:
   My original point is that we could avoid introducing those 
`isValueOfCurrentKeyNull`-related methods by just introducing one single 
`isCurrentStateTableNull` method within `TtlStateTestBase.java` (no matter what 
its name is). 
   
   FLINK-21413 is about fixing a performance problem, namely avoiding many 
unnecessary empty hash maps and state entries, rather than an incorrect query 
result. This newly added test should only target heap-based keyed state 
backends, so we can safely cast to `AbstractHeapState` here.
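
   For context, a minimal sketch of how the incremental cleanup strategy 
exercised by this test is enabled through the public `StateTtlConfig` API; the 
TTL and cleanup parameters are illustrative only:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlIncrementalCleanupSketch {
    public static void main(String[] args) {
        StateTtlConfig ttlConfig =
                StateTtlConfig.newBuilder(Time.minutes(1))
                        // check 5 entries per state access and also on every record
                        .cleanupIncrementally(5, true)
                        .build();

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("last-seen", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        System.out.println(descriptor);
    }
}
```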




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Myasuka commented on a change in pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


Myasuka commented on a change in pull request #15016:
URL: https://github.com/apache/flink/pull/15016#discussion_r603034801



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/ttl/TtlStateTestBase.java
##
@@ -471,6 +473,26 @@ public void testRestoreTtlAndRegisterNonTtlStateCompatFailure() throws Exception
 sbetc.createState(ctx().createStateDescriptor(), "");
 }
 
+@Test
+public void testIncrementalCleanupWholeState() throws Exception {
+assumeTrue(incrementalCleanupSupported());
+initTest(getConfBuilder(TTL).cleanupIncrementally(5, true).build());
+timeProvider.time = 0;
+// create enough keys to trigger incremental rehash
+updateKeys(0, INC_CLEANUP_ALL_KEYS, ctx().updateEmpty);
+// expire all state
+timeProvider.time = 120;
+// trigger state clean up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+}
+// check all state cleaned up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+assertTrue("Original state should be cleared", 
ctx().isOriginalNull());

Review comment:
   My original point is that we could avoid introducing those 
`isValueOfCurrentKeyNull`-related methods by just introducing one single 
`isCurrentStateTableNull` method within `TtlStateTestBase.java` (no matter what 
its name is). 
   
   FLINK-21413 is about fixing a performance problem, namely avoiding many 
unnecessary empty hash maps and state entries, rather than an incorrect query 
result. This newly added test should only target heap-based keyed state 
backends, so we can safely cast to `AbstractHeapState` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml

2021-03-28 Thread GitBox


flinkbot commented on pull request #15405:
URL: https://github.com/apache/flink/pull/15405#issuecomment-809095000


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 90f1936af457ab1419790cd750ee1ab45ec04ecb (Mon Mar 29 
06:04:31 UTC 2021)
   
   **Warnings:**
* **6 pom.xml files were touched**: Check for build and licensing issues.
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-21976).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lindong28 opened a new pull request #15405: [Flink-21976 ] Move Flink ML pipeline API and library code to a separate repository named flink-ml

2021-03-28 Thread GitBox


lindong28 opened a new pull request #15405:
URL: https://github.com/apache/flink/pull/15405


   ## What is the purpose of the change
   
   Remove ML pipeline API and library code from Flink so that we can later add 
them to https://github.com/apache/flink-ml.
   
   ## Brief change log
   
   - Remove all files under flink-ml-parent
   - Remove flink-ml-* related configs from pom.xml
   
   ## Verifying this change
   
   Pass unit tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15294: [FLINK-21945][streaming] Omit pointwise connections from checkpointing in unaligned checkpoints.

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15294:
URL: https://github.com/apache/flink/pull/15294#issuecomment-803058203


   
   ## CI report:
   
   * b22d2cdd4e457842b585e64089858a0ae8eb9a2b UNKNOWN
   * 19964baf25121c6f0a4f85d75ffcc568d98dcdaa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15605)
 
   * 6bff4c2e3485c8f0dd09b872652c6a7958836e7f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15253:
URL: https://github.com/apache/flink/pull/15253#issuecomment-801069817


   
   ## CI report:
   
   * c550f67518cae7d8fe19752af581c582bd7115b9 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15625)
 
   * f57250440164af2180629998bc4cfda236d64772 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15161: [FLINK-20114][connector/kafka,common] Fix a few KafkaSource-related bugs

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15161:
URL: https://github.com/apache/flink/pull/15161#issuecomment-797177953


   
   ## CI report:
   
   * d3f59e302b9860d49ce79252aeb8bd1decaeeabf UNKNOWN
   * 6fc6cc9a2e6c26e9e8fa62b481237231394078eb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15507)
 
   * 6c17ec7241323224d3053134e82353c63f28a24f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15016:
URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058


   
   ## CI report:
   
   * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21906) Support computed column syntax for Hive DDL dialect

2021-03-28 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310429#comment-17310429
 ] 

Jark Wu commented on FLINK-21906:
-

Thanks [~hackergin], I assigned this issue to you.

> Support computed column syntax for Hive DDL dialect
> ---
>
> Key: FLINK-21906
> URL: https://issues.apache.org/jira/browse/FLINK-21906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Jark Wu
>Assignee: jinfeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-21906) Support computed column syntax for Hive DDL dialect

2021-03-28 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-21906:
---

Assignee: jinfeng

> Support computed column syntax for Hive DDL dialect
> ---
>
> Key: FLINK-21906
> URL: https://issues.apache.org/jira/browse/FLINK-21906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Jark Wu
>Assignee: jinfeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] SteNicholas commented on pull request #15020: [FLINK-21445][kubernetes] Application mode does not set the configuration when building PackagedProgram

2021-03-28 Thread GitBox


SteNicholas commented on pull request #15020:
URL: https://github.com/apache/flink/pull/15020#issuecomment-809072744


   @tillrohrmann, I will update this pull request by following the comments above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-21989) Add a SupportsSourceWatermark ability interface

2021-03-28 Thread Timo Walther (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-21989.

Fix Version/s: 1.13.0
   Resolution: Fixed

Fixed in 1.13.0: 71122fda952b7bd3b0308182e10da77af1cc67ff

> Add a SupportsSourceWatermark ability interface
> ---
>
> Key: FLINK-21989
> URL: https://issues.apache.org/jira/browse/FLINK-21989
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API, Table SQL / Planner
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> FLINK-21899 added a dedicated function that can be used in watermark 
> definitions. Currently, the generated watermark strategy is invalid because 
> of the exception that we throw in the function’s implementation. We should 
> integrate this concept more deeply into the interfaces instead of requiring 
> every source to implement its own expression-analyzing utility.
> We propose the following interface:
> {code}
> SupportsSourceWatermark {
>   void applySourceWatermark()
> }
> {code}
>  
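
A minimal sketch of how a connector could react to this ability (illustration only: the interface body simply restates the proposal above, and {{ExampleSource}} with its flag is invented for this example, not taken from Flink):

{code:java}
// Sketch only: mirrors the proposed ability interface; ExampleSource and its
// field are hypothetical and not part of any existing Flink class.
interface SupportsSourceWatermark {

    /** Called by the planner when SOURCE_WATERMARK() is used in a WATERMARK clause. */
    void applySourceWatermark();
}

class ExampleSource implements SupportsSourceWatermark {

    private boolean useSourceWatermark = false;

    @Override
    public void applySourceWatermark() {
        // Remember the decision; the runtime source could later enable
        // connector-native watermark generation based on this flag.
        useSourceWatermark = true;
    }

    boolean isSourceWatermarkEnabled() {
        return useSourceWatermark;
    }
}
{code}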



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] twalthr closed pull request #15388: [FLINK-21989][table] Add a SupportsSourceWatermark ability interface

2021-03-28 Thread GitBox


twalthr closed pull request #15388:
URL: https://github.com/apache/flink/pull/15388


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14839: [FLINK-21353][state] Add DFS-based StateChangelog

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #14839:
URL: https://github.com/apache/flink/pull/14839#issuecomment-772060196


   
   ## CI report:
   
   * 426533428e0971d34f6e80acc89fe5a5a72ea2a4 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15638)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zicat commented on a change in pull request #15247: [FLINK-21833][Table SQL / Runtime] TemporalRowTimeJoinOperator.java will lead to the state expansion by short-life-cycle & huge Row

2021-03-28 Thread GitBox


zicat commented on a change in pull request #15247:
URL: https://github.com/apache/flink/pull/15247#discussion_r603014108



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/operators/join/temporal/TemporalRowTimeJoinOperator.java
##
@@ -301,6 +302,8 @@ private void cleanupExpiredVersionInState(long 
currentWatermark, List r
 public void cleanupState(long time) {
 leftState.clear();
 rightState.clear();
+nextLeftIndex.clear();
+registeredTimer.clear();

Review comment:
Please let me know if there is anything I should change, thx 
@leonardBang 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21906) Support computed column syntax for Hive DDL dialect

2021-03-28 Thread jinfeng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310411#comment-17310411
 ] 

jinfeng commented on FLINK-21906:
-

I studied the code from FLIP-152. What we need to do is use ANTLR in 
flink-hive-connector to implement the watermark syntax and convert the AST into a 
CreateTableOperation.
Maybe I can take a try. Since this feature is not targeted at the 1.13 release, 
I have more time to complete it, which is fine.

> Support computed column syntax for Hive DDL dialect
> ---
>
> Key: FLINK-21906
> URL: https://issues.apache.org/jira/browse/FLINK-21906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Jark Wu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310410#comment-17310410
 ] 

Xintong Song commented on FLINK-21981:
--

Hi [~Bo Cui],

I noticed you have filed several tickets, all related to Flink's Yarn 
deployment. The discussion does not seem to be going smoothly.

If you wish, you can reach me at my gmail (tonysong820). We can try to set up an 
offline discussion in Chinese. Hope that helps.

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they point to 
> the Hadoop and YARN conf, not the Flink conf, and Flink's Hadoop conf 
> (env.hadoop.conf.dir) is different from them. So we should use 
> `env.hadoop.conf.dir` first
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15404: [hotfix][docs] Remove redundant import

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15404:
URL: https://github.com/apache/flink/pull/15404#issuecomment-809050239


   
   ## CI report:
   
   * b1c186c4ac9432c1bda5474b5a1d830062195e71 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15647)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15200: [FLINK-21355] Send changes to the state changelog

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15200:
URL: https://github.com/apache/flink/pull/15200#issuecomment-798902665


   
   ## CI report:
   
   * 5e1342d9916f5c4356c622a40bc27bcbdacde9d7 UNKNOWN
   * 683a1724ca1074a01a0cbaf986afa4a85537b478 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15639)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * e7f59e2e2811c0718a453572e90a4ad2a900ff03 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15642)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15016:
URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058


   
   ## CI report:
   
   * cf3c223658f37cc838409b690e2e1bf93c04f207 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15641)
 
   * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guowei Ma closed FLINK-22005.
-
Fix Version/s: 1.13.0
 Release Note: fixed in master via c375f4cfd394c10b54110eac446873055b716b89
   Resolution: Fixed

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> The test fail because of Waiting for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310399#comment-17310399
 ] 

Guowei Ma commented on FLINK-22005:
---

Thanks [~Leonard Xu]. I find that this does not appear in the following 
test (28/29), so I am closing this ticket.

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> The test fail because of Waiting for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on a change in pull request #15400: [FLINK-20557][sql-client] Support STATEMENT SET in SQL CLI

2021-03-28 Thread GitBox


wuchong commented on a change in pull request #15400:
URL: https://github.com/apache/flink/pull/15400#discussion_r603002125



##
File path: 
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliClient.java
##
@@ -299,15 +307,23 @@ private void callOperation(Operation operation) {
 } else if (operation instanceof HelpOperation) {
 // HELP
 callHelp();
+} else if (operation instanceof BeginStatementSetOperation) {
+// BEGIN STATEMENT SET
+callBeginStatementSet();
+} else if (operation instanceof EndStatementSetOperation) {
+// END
+callEndStatementSet();
+} else if (operation instanceof CatalogSinkModifyOperation) {
+// INSERT INTO/OVERWRITE
+callInsert((CatalogSinkModifyOperation) operation);
+} else if (isStatementSetMode) {

Review comment:
   This looks really hacky and makes it hard to maintain which statements are not 
allowed in a statement set. I suggest checking statements at the beginning of 
this method. 
   
   ```java
 if (isStatementSetMode) {
   // check the current operation is allowed in STATEMENT SET
   if (!(operation instanceof CatalogSinkModifyOperation
   || operation instanceof EndStatementSetOperation)) {
   printError(MESSAGE_STATEMENT_SET_SQL_EXECUTION_ERROR);
   return;
   }
   }
   ```

##
File path: 
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliClient.java
##
@@ -412,27 +428,65 @@ private void callSelect(QueryOperation operation) {
 }
 
 private boolean callInsert(CatalogSinkModifyOperation operation) {
-printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT);
-
-try {
-TableResult result = executor.executeOperation(sessionId, 
operation);
-checkState(result.getJobClient().isPresent());
-terminal.writer()
-.println(
-
CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SUBMITTED)
-.toAnsi());
-// keep compatibility with before
-terminal.writer()
-.println(
-String.format(
-"Job ID: %s\n",
-
result.getJobClient().get().getJobID().toString()));
-terminal.flush();
+if (isStatementSetMode) {
+statementSetOperations.add(operation);
+printInfo(CliStrings.MESSAGE_ADD_STATEMENT_TO_STATEMENT_SET);
 return true;
-} catch (SqlExecutionException e) {
-printExecutionException(e);
+} else {
+printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT);
+
+try {
+TableResult result = executor.executeOperation(sessionId, 
operation);
+checkState(result.getJobClient().isPresent());
+terminal.writer()
+.println(
+
CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SUBMITTED)
+.toAnsi());
+// keep compatibility with before
+terminal.writer()
+.println(
+String.format(
+"Job ID: %s\n",
+
result.getJobClient().get().getJobID().toString()));
+terminal.flush();
+return true;
+} catch (SqlExecutionException e) {
+printExecutionException(e);
+}
+return false;
+}
+}
+
+private void callBeginStatementSet() {
+if (isStatementSetMode) {
+printStatementSetExecutionException();
+} else {
+isStatementSetMode = true;
+statementSetOperations = new ArrayList<>();
+printInfo(CliStrings.MESSAGE_BEGIN_STATEMENT_SET);
+}
+}
+
+private void callEndStatementSet() {
+if (isStatementSetMode) {
+isStatementSetMode = false;
+printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT_SET);
+
+try {
+TableResult result = executor.executeOperation(sessionId, 
statementSetOperations);
+checkState(result.getJobClient().isPresent());
+terminal.writer()
+.println(
+
CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SET_SUBMITTED)
+.toAnsi());
+terminal.flush();

Review comment:
   This logic should be shared with `callInsert`, and we should also 
consider the dml-sync option (could you rebase the branch)?

##
File path: 
flink-table/flink-sql-client/s

[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310391#comment-17310391
 ] 

Leonard Xu commented on FLINK-22005:


[~maguowei] Could you rebase to the latest master? This issue has been fixed in 
[https://github.com/apache/flink/pull/15394#event-4516849115]

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> The test fail because of Waiting for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310388#comment-17310388
 ] 

Bo Cui commented on FLINK-21981:


[~Paul Lin] Yes, the current approach is something like `export ...`, but since 
`env.hadoop.conf.dir` already exists and we can use it to avoid the `export ...` 
step, why not use it first?

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they point to 
> the Hadoop and YARN conf, not the Flink conf, and Flink's Hadoop conf 
> (env.hadoop.conf.dir) is different from them. So we should use 
> `env.hadoop.conf.dir` first
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21990) SourceStreamTask will always hang if the CheckpointedFunction#snapshotState throws an exception.

2021-03-28 Thread Kezhu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310387#comment-17310387
 ] 

Kezhu Wang commented on FLINK-21990:


A {{disableChaining}} between the failing source and the downstream sink makes 
the test case pass. But it still hangs for a while before running into blocking 
operations. It seems that a simple {{Thread.interrupt}} is not enough; a 
cooperative {{SourceFunction.cancel}} should help.
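
As an illustration of that cooperative pattern (a sketch under the assumption of a simple legacy source; none of this is taken from {{SourceStreamTask}}): the run() loop checks a volatile flag, so calling cancel() lets the legacy source thread terminate even when an interrupt never reaches a blocking call.

{code:java}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch only: a minimal SourceFunction that can be stopped cooperatively.
public class CooperativeSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("record");
            }
        }
    }

    @Override
    public void cancel() {
        // Cooperative cancellation: flipping the flag is what stops the loop;
        // Thread.interrupt() alone would not break out of this busy loop.
        running = false;
    }
}
{code}

With such a source, cancelling it before waiting on the completion future should let the legacy source thread exit.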

> SourceStreamTask will always hang if the CheckpointedFunction#snapshotState 
> throws an exception.
> 
>
> Key: FLINK-21990
> URL: https://issues.apache.org/jira/browse/FLINK-21990
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: ming li
>Priority: Critical
>
> If the source in {{SourceStreamTask}} implements {{CheckpointedFunction}} and 
> an exception is thrown in the snapshotState method, then the 
> {{SourceStreamTask}} will always hang.
> The main reason is that the checkpoint is executed in the mailbox. When the 
> {{CheckpointedFunction#snapshotState}} of the source throws an exception, 
> StreamTask#cleanUpInvoke is called, where it waits for the end of the 
> {{LegacySourceFunctionThread}} of the source. However, the source thread does 
> not end by itself (the user has to control this), so the {{Task}} hangs at this 
> point, and the JobMaster is not aware of this behavior.
> {code:java}
> protected void cleanUpInvoke() throws Exception {
> getCompletionFuture().exceptionally(unused -> null).join(); //wait for 
> the end of the source
> // clean up everything we initialized
> isRunning = false;
> ...
> }{code}
> I think we should call the cancel method of the source first, and then wait 
> for the end.
> The following is my test code, the test branch is Flink's master branch.
> {code:java}
> @Test
> public void testSourceFailure() throws Exception {
> final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.enableCheckpointing(2000L);
> env.setRestartStrategy(RestartStrategies.noRestart());
> env.addSource(new FailedSource()).addSink(new DiscardingSink<>());
> JobGraph jobGraph = 
> StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph());
> try {
> // assert that the job only execute checkpoint once and only failed 
> once.
> TestUtils.submitJobAndWaitForResult(
> cluster.getClusterClient(), jobGraph, 
> getClass().getClassLoader());
> } catch (JobExecutionException jobException) {
> Optional throwable =
> ExceptionUtils.findThrowable(jobException, 
> FlinkRuntimeException.class);
> Assert.assertTrue(throwable.isPresent());
> Assert.assertEquals(
> 
> CheckpointFailureManager.EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE,
> throwable.get().getMessage());
> }
> // assert that the job only failed once.
> Assert.assertEquals(1, 
> StringGeneratingSourceFunction.INITIALIZE_TIMES.get());
> }
> private static class FailedSource extends RichParallelSourceFunction<String>
> implements CheckpointedFunction {
> private transient boolean running;
> @Override
> public void open(Configuration parameters) throws Exception {
> running = true;
> }
> @Override
> public void run(SourceContext<String> ctx) throws Exception {
> while (running) {
> ctx.collect("test");
> }
> }
> @Override
> public void cancel() {
> running = false;
> }
> @Override
> public void snapshotState(FunctionSnapshotContext context) throws 
> Exception {
> throw new RuntimeException("source failed");
> }
> @Override
> public void initializeState(FunctionInitializationContext context) throws 
> Exception {}
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18356) Exit code 137 returned from process when testing pyflink

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310383#comment-17310383
 ] 

Guowei Ma commented on FLINK-18356:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15554&view=logs&j=34f41360-6c0d-54d3-11a1-0292a2def1d9&t=2d56e022-1ace-542f-bf1a-b37dd63243f2&l=9452

> Exit code 137 returned from process when testing pyflink
> 
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Build System / Azure Pipelines
>Affects Versions: 1.12.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15403:
URL: https://github.com/apache/flink/pull/15403#issuecomment-809045211


   
   ## CI report:
   
   * d87a60142ebdb3eef911d51391679b90f4acaa6a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15645)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15404: [hotfix][docs] Remove redundant import

2021-03-28 Thread GitBox


flinkbot commented on pull request #15404:
URL: https://github.com/apache/flink/pull/15404#issuecomment-809050239


   
   ## CI report:
   
   * b1c186c4ac9432c1bda5474b5a1d830062195e71 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15402:
URL: https://github.com/apache/flink/pull/15402#issuecomment-809045142


   
   ## CI report:
   
   * 97ba64b2959826a53be3273d9b565b3fe8f96fdc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15644)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18356) Exit code 137 returned from process when testing pyflink

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310380#comment-17310380
 ] 

Guowei Ma commented on FLINK-18356:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=62110053-334f-5295-a0ab-80dd7e2babbf&l=24506

> Exit code 137 returned from process when testing pyflink
> 
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Build System / Azure Pipelines
>Affects Versions: 1.12.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310377#comment-17310377
 ] 

Guowei Ma commented on FLINK-21103:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15575&view=results

> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] YikSanChan commented on pull request #15360: [FLINK-21938][docs] Add how to unit test python udfs

2021-03-28 Thread GitBox


YikSanChan commented on pull request #15360:
URL: https://github.com/apache/flink/pull/15360#issuecomment-809048667


   @rmetzger Hi Robert, is there anything I need to do here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21954) Test failures occur due to the test not waiting for the ExecutionGraph to be created

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310374#comment-17310374
 ] 

Guowei Ma commented on FLINK-21954:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7030a106-e977-5851-a05e-535de648c9c9&l=8502

> Test failures occur due to the test not waiting for the ExecutionGraph to be 
> created
> 
>
> Key: FLINK-21954
> URL: https://issues.apache.org/jira/browse/FLINK-21954
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Matthias
>Priority: Major
>  Labels: test-stability
>
> Various tests are failing due the test not waiting for the ExecutionGraph to 
> be created:
> * 
> [JobMasterTest.testRestoringFromSavepoint|https://dev.azure.com/mapohl/flink/_build/results?buildId=356&view=logs&j=243b38e1-22e7-598a-c8ae-385dce2c28b5&t=fea482b6-4f61-51f4-2584-f73df532b395&l=8266]
> * {{JobMasterTest.testRequestNextInputSplitWithGlobalFailover}}
> * {{JobMasterTest.testRequestNextInputSplitWithLocalFailover}} (also fails 
> due to FLINK-21450)
> * {{JobMasterQueryableStateTest.testRequestKvStateOfWrongJob}}
> * {{JobMasterQueryableStateTest.testRequestKvStateWithIrrelevantRegistration}}
> * {{JobMasterQueryableStateTest.testDuplicatedKvStateRegistrationsFailTask}}
> * {{JobMasterQueryableStateTest.testRegisterKvState}}
> We might have to double-check whether other tests are affected as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21659) Running HA per-job cluster (rocks, incremental) end-to-end test fails

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310373#comment-17310373
 ] 

Guowei Ma commented on FLINK-21659:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a&l=1733


> Running HA per-job cluster (rocks, incremental) end-to-end test fails
> -
>
> Key: FLINK-21659
> URL: https://issues.apache.org/jira/browse/FLINK-21659
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14232&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a
> It seems that the task deploy tasks to the TaskManager0 failed and it causes 
> the checkpoint fails.
> {code:java}
> java.util.concurrent.CompletionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.submitTask(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor,org.apache.flink.runtime.jobmaster.JobMasterId,org.apache.flink.api.common.time.Time)
>  timed out.
> at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925) 
> ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>  ~[?:1.8.0_282]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:234)
>  ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_282]
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>  ~[?:1.8.0_282]
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1064)
>  ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at akka.dispatch.OnComplete.internal(Future.scala:263) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at akka.dispatch.OnComplete.internal(Future.scala:261) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73)
>  ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) 
> ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
>

[GitHub] [flink] flinkbot commented on pull request #15404: [hotfix][docs] Remove redundant import

2021-03-28 Thread GitBox


flinkbot commented on pull request #15404:
URL: https://github.com/apache/flink/pull/15404#issuecomment-809047367


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit b1c186c4ac9432c1bda5474b5a1d830062195e71 (Mon Mar 29 
03:54:18 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21990) SourceStreamTask will always hang if the CheckpointedFunction#snapshotState throws an exception.

2021-03-28 Thread Kezhu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310372#comment-17310372
 ] 

Kezhu Wang commented on FLINK-21990:


Ideally, this should be fixed in 
[1.12.2|https://github.com/apache/flink/blob/release-1.12.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SourceStreamTask.java#L175]
 and 
[master|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SourceStreamTask.java#L181].
 But I did run into the hang in a (modified) test case. Either that run does not 
touch blocking operations, or the interruption state was eaten up 
somewhere (FLINK-21186?). I think that fix is worth a test to gain confidence on 
this.

cc [~AHeise]

> SourceStreamTask will always hang if the CheckpointedFunction#snapshotState 
> throws an exception.
> 
>
> Key: FLINK-21990
> URL: https://issues.apache.org/jira/browse/FLINK-21990
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: ming li
>Priority: Critical
>
> If the source in {{SourceStreamTask}} implements {{CheckpointedFunction}} and 
> an exception is thrown in the snapshotState method, then the 
> {{SourceStreamTask}} will always hang.
> The main reason is that the checkpoint is executed in the mailbox. When the 
> {{CheckpointedFunction#snapshotState}} of the source throws an exception, 
> StreamTask#cleanUpInvoke is called, where it waits for the end of the 
> {{LegacySourceFunctionThread}} of the source. However, the source thread does 
> not end by itself (the user has to control this), so the {{Task}} hangs at this 
> point, and the JobMaster is not aware of this behavior.
> {code:java}
> protected void cleanUpInvoke() throws Exception {
> getCompletionFuture().exceptionally(unused -> null).join(); //wait for 
> the end of the source
> // clean up everything we initialized
> isRunning = false;
> ...
> }{code}
> I think we should call the cancel method of the source first, and then wait 
> for the end.
> The following is my test code, the test branch is Flink's master branch.
> {code:java}
> @Test
> public void testSourceFailure() throws Exception {
> final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.enableCheckpointing(2000L);
> env.setRestartStrategy(RestartStrategies.noRestart());
> env.addSource(new FailedSource()).addSink(new DiscardingSink<>());
> JobGraph jobGraph = 
> StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph());
> try {
> // assert that the job only execute checkpoint once and only failed 
> once.
> TestUtils.submitJobAndWaitForResult(
> cluster.getClusterClient(), jobGraph, 
> getClass().getClassLoader());
> } catch (JobExecutionException jobException) {
> Optional throwable =
> ExceptionUtils.findThrowable(jobException, 
> FlinkRuntimeException.class);
> Assert.assertTrue(throwable.isPresent());
> Assert.assertEquals(
> 
> CheckpointFailureManager.EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE,
> throwable.get().getMessage());
> }
> // assert that the job only failed once.
> Assert.assertEquals(1, 
> StringGeneratingSourceFunction.INITIALIZE_TIMES.get());
> }
> private static class FailedSource extends RichParallelSourceFunction<String>
> implements CheckpointedFunction {
> private transient boolean running;
> @Override
> public void open(Configuration parameters) throws Exception {
> running = true;
> }
> @Override
> public void run(SourceContext<String> ctx) throws Exception {
> while (running) {
> ctx.collect("test");
> }
> }
> @Override
> public void cancel() {
> running = false;
> }
> @Override
> public void snapshotState(FunctionSnapshotContext context) throws 
> Exception {
> throw new RuntimeException("source failed");
> }
> @Override
> public void initializeState(FunctionInitializationContext context) throws 
> Exception {}
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] YikSanChan opened a new pull request #15404: [hotfix][docs] Remove redundant import

2021-03-28 Thread GitBox


YikSanChan opened a new pull request #15404:
URL: https://github.com/apache/flink/pull/15404


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Paul Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310371#comment-17310371
 ] 

Paul Lin commented on FLINK-21981:
--

[~Bo Cui] Just to confirm, did you try exporting `HADOOP_CONF_DIR` before running 
any flink commands? It should work for multi-tenant scenarios. Env variables 
are more flexible and environment-specific, and thus should take higher priority 
than static configurations.

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they point to 
> the Hadoop and YARN conf, not the Flink conf, and Flink's Hadoop conf 
> (env.hadoop.conf.dir) is different from them. So we should use 
> `env.hadoop.conf.dir` first
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310370#comment-17310370
 ] 

Guowei Ma commented on FLINK-21103:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=results

> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310368#comment-17310368
 ] 

Guowei Ma commented on FLINK-22005:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15580&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=48499

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> The test fail because of Waiting for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310366#comment-17310366
 ] 

Guowei Ma commented on FLINK-22005:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=58087

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> The test fail because of Waiting for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542
 ] 

Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:47 AM:
-

on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results



was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results


> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] fsk119 commented on a change in pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


fsk119 commented on a change in pull request #15402:
URL: https://github.com/apache/flink/pull/15402#discussion_r602994006



##
File path: 
flink-table/flink-sql-parser-hive/src/main/codegen/includes/parserImpls.ftl
##
@@ -1605,3 +1605,17 @@ SqlShowModules SqlShowModules() :
 return new SqlShowModules(startPos.plus(getPos()), requireFull);
 }
 }
+
+/**
+* Parses a explain module statement.
+*/
+SqlNode SqlRichExplain() :
+{
+SqlNode stmt;
+}
+{
+ ( )*
+stmt = SqlQueryOrDml() {
+return new SqlRichExplain(getPos(),stmt);
+}
+}

Review comment:
   Add a new line

##
File path: 
flink-table/flink-sql-parser-hive/src/test/java/org/apache/flink/sql/parser/hive/FlinkHiveSqlParserImplTest.java
##
@@ -482,4 +482,54 @@ public void testShowModules() {
 
 sql("show full modules").ok("SHOW FULL MODULES");
 }
+
+@Test
+public void testExplain() {
+String sql = "explain plan for select * from emps";
+String expected = "EXPLAIN SELECT *\n" + "FROM `EMPS`";
+this.sql(sql).ok(expected);
+}
+
+@Test
+public void testExplainJsonFormat() {
+// unsupport testExplainJsonFormat now
+}
+
+@Test
+public void testExplainWithImpl() {
+// unsupport testExplainWithImpl now
+}
+
+@Test
+public void testExplainWithoutImpl() {
+// unsupport testExplainWithoutImpl now
+}
+
+@Test
+public void testExplainWithType() {
+// unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsXml() {
+// unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsJson() {
+// unsupport testExplainWithType now
+}

Review comment:
   Add comment: 
   // TODO: FLINK-20562

##
File path: 
flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/calcite/FlinkPlannerImpl.scala
##
@@ -146,6 +146,10 @@ class FlinkPlannerImpl(
   val validated = validator.validate(explain.getExplicandum)
   explain.setOperand(0, validated)
   explain
+case richExplain: SqlRichExplain =>
+  val validated = validator.validate(richExplain.getStatement)
+  richExplain.setOperand(0,validated)
+  richExplain

Review comment:
   Move this case forward, before `SqlExplain`? Do we really need `SqlExplain`?

##
File path: 
flink-table/flink-sql-parser/src/test/java/org/apache/flink/sql/parser/FlinkSqlParserImplTest.java
##
@@ -1215,6 +1214,56 @@ public void testEnd() {
 sql("end").ok("END");
 }
 
+@Test
+public void testExplain() {
+String sql = "explain plan for select * from emps";
+String expected = "EXPLAIN SELECT *\n" + "FROM `EMPS`";
+this.sql(sql).ok(expected);
+}
+
+@Test
+public void testExplainJsonFormat() {
+// unsupport testExplainJsonFormat now
+}
+
+@Test
+public void testExplainWithImpl() {
+// unsupport testExplainWithImpl now
+}
+
+@Test
+public void testExplainWithoutImpl() {
+// unsupport testExplainWithoutImpl now
+}
+
+@Test
+public void testExplainWithType() {
+// unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsXml() {
+// unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsJson() {
+// unsupport testExplainWithType now

Review comment:
   ditto

##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala
##
@@ -18,28 +18,26 @@
 
 package org.apache.flink.table.calcite
 
-import org.apache.flink.sql.parser.ExtendedSqlNode
-import org.apache.flink.sql.parser.dql.{SqlRichDescribeTable, SqlShowCatalogs, 
SqlShowCurrentCatalog, SqlShowCurrentDatabase, SqlShowDatabases, 
SqlShowFunctions, SqlShowTables, SqlShowViews}
-import org.apache.flink.table.api.{TableException, ValidationException}
-import org.apache.flink.table.catalog.CatalogReader
-import org.apache.flink.table.parse.CalciteParser
+import _root_.java.lang.{Boolean => JBoolean}
+import _root_.java.util
+import _root_.java.util.function.{Function => JFunction}
 
 import org.apache.calcite.plan.RelOptTable.ViewExpander
 import org.apache.calcite.plan._
 import org.apache.calcite.rel.RelRoot
 import org.apache.calcite.rel.`type`.RelDataType
 import org.apache.calcite.rex.RexBuilder
-import org.apache.calcite.sql.advise.{SqlAdvisor, SqlAdvisorValidator}
+import org.apache.calcite.sql.advise.SqlAdvisorValidator
 import org.apache.calcite.sql.validate.SqlValidator
 import org.apache.calcite.sql.{SqlExplain, SqlKind, SqlNode, SqlOperatorTable}
 import org.apache.calcite.sql2rel.{SqlRexConvertletTable, SqlToRelConverter}
 import org.apache.calcite.tools.{FrameworkConfig, RelConversionException}
+import org.apache.flink.sql.parser.ExtendedSqlNode
+import org.apache.flink.

[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542
 ] 

Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:45 AM:
-

on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results



was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=results

> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.

2021-03-28 Thread GitBox


flinkbot commented on pull request #15403:
URL: https://github.com/apache/flink/pull/15403#issuecomment-809045211


   
   ## CI report:
   
   * d87a60142ebdb3eef911d51391679b90f4acaa6a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542
 ] 

Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:45 AM:
-

on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results



was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results


> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


flinkbot commented on pull request #15402:
URL: https://github.com/apache/flink/pull/15402#issuecomment-809045142


   
   ## CI report:
   
   * 97ba64b2959826a53be3273d9b565b3fe8f96fdc UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15401: [FLINK-21969][python] Invoke finish bundle method before emitting the max timestamp watermark in PythonTimestampsAndWatermarksOperato

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15401:
URL: https://github.com/apache/flink/pull/15401#issuecomment-809036806


   
   ## CI report:
   
   * 7d4a73b2d067dad06ae36fbf2750a1018df4338c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15643)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310365#comment-17310365
 ] 

Guowei Ma commented on FLINK-22005:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19760

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) 
> 
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> The test fails because it waits for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15016:
URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058


   
   ## CI report:
   
   * cf3c223658f37cc838409b690e2e1bf93c04f207 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15641)
 
   * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-28 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * 7717d7719f6b448084ff27365c2b718d2c059376 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15640)
 
   * e7f59e2e2811c0718a453572e90a4ad2a900ff03 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15642)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310364#comment-17310364
 ] 

Xintong Song commented on FLINK-21981:
--

{quote}and HADOOP_CONF_DIR points to the default hdfs-site, but HADOOP_CONF_DIR 
cannot be modified in multi-tenant scenarios.
{quote}
Why is that? You should be able to set the environment variable only for the 
specific command, instead of exporting it globally.

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are the Hadoop 
> and YARN conf, not the Flink conf, and the Hadoop conf of Flink (env.hadoop.conf.dir) 
> is different from them. So we should use `env.hadoop.conf.dir` first.
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)

2021-03-28 Thread Guowei Ma (Jira)
Guowei Ma created FLINK-22005:
-

 Summary: SQL Client end-to-end test (Old planner) Elasticsearch 
(v7.5.1) 
 Key: FLINK-22005
 URL: https://issues.apache.org/jira/browse/FLINK-22005
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.13.0
Reporter: Guowei Ma


The test fails because it waits for Elasticsearch records indefinitely.
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21982) jobs should use client org.apache.hadoop.config on yarn

2021-03-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310358#comment-17310358
 ] 

Bo Cui commented on FLINK-21982:


When a new Configuration is created, the Configuration will load core-site via 
classLoader.getResource(String):

https://github.com/apache/hadoop/blob/ea6595d3b68ac462aec0d493718d7a10fbda0b6d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L788
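
For illustration only, here is a minimal, hypothetical sketch (not Flink code; the 
printed keys are just examples) of what that classpath-based lookup means for a 
plain Hadoop Configuration:
{code:java}
import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ConfigurationLookupSketch {
    public static void main(String[] args) {
        // new Configuration() registers core-default.xml and core-site.xml as
        // default resources; they are located through the class loader, so the
        // values come from whatever is on the classpath of the current process
        // (in a YARN container that is the container's classpath, not the
        // submitting client's).
        URL coreSite =
                Thread.currentThread().getContextClassLoader().getResource("core-site.xml");
        System.out.println("core-site.xml resolved from: " + coreSite);

        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}
{code}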

> jobs should use client org.apache.hadoop.config on yarn
> ---
>
> Key: FLINK-21982
> URL: https://issues.apache.org/jira/browse/FLINK-21982
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0
>Reporter: Bo Cui
>Priority: Major
>
> Currently, jobs use the YARN server configuration during execution. 
> I think we should submit the configuration to HDFS, like the MR job.xml, etc.,
> and the container would fetch the job.xml and create soft links (hdfs-site, core-site, 
> yarn-site) while the container initializes,
> and then the Configuration could obtain these resources through 
> classLoader.getResource(String) when the Configuration initializes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310355#comment-17310355
 ] 

Bo Cui edited comment on FLINK-21981 at 3/29/21, 3:33 AM:
--

The client side has two hdfs-site files (the default hdfs-site and the Flink 
hdfs-site; they are not the same),

and HADOOP_CONF_DIR points to the default hdfs-site, but HADOOP_CONF_DIR cannot 
be modified in multi-tenant scenarios.

So I think preferring `env.hadoop.conf.dir` may be better.


was (Author: bo cui):
The client side has two hdfs-site files (the default hdfs-site and the Flink 
hdfs-site; they are not the same),

and HADOOP_CONF_DIR points to the default hdfs-site, but HADOOP_CONF_DIR cannot 
be modified in multi-tenant scenarios.

So I think preferring `env.hadoop.conf.dir` may be better.

 

 

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are the Hadoop 
> and YARN conf, not the Flink conf, and the Hadoop conf of Flink (env.hadoop.conf.dir) 
> is different from them. So we should use `env.hadoop.conf.dir` first.
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf

2021-03-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310355#comment-17310355
 ] 

Bo Cui commented on FLINK-21981:


The client side has two hdfs-site files (the default hdfs-site and the Flink 
hdfs-site; they are not the same),

and HADOOP_CONF_DIR points to the default hdfs-site, but HADOOP_CONF_DIR cannot 
be modified in multi-tenant scenarios.

So I think preferring `env.hadoop.conf.dir` may be better.

 

 

> Increase the priority of the parameter in flink-conf
> 
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Reporter: Bo Cui
>Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are the Hadoop 
> and YARN conf, not the Flink conf, and the Hadoop conf of Flink (env.hadoop.conf.dir) 
> is different from them. So we should use `env.hadoop.conf.dir` first.
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21982) jobs should use client org.apache.hadoop.config on yarn

2021-03-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310354#comment-17310354
 ] 

Bo Cui commented on FLINK-21982:


Thanks [~xintongsong].
 FLINK-16005 only solves part of my problem (YarnApplicationMasterRunner, 
YarnResourceManager, ...).
 If the job uses the `new Configuration` API, the job and its Configuration still 
use the YARN/HDFS server configuration.
{quote}I think we should submit the configuration to HDFS, like the MR job.xml, 
etc.,
 and the container would fetch the job.xml and create soft links (hdfs-site, core-site, 
yarn-site) while the container initializes,
 and then the Configuration could obtain these resources through 
classLoader.getResource(String) when the Configuration initializes.
{quote}
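
As a rough, hypothetical sketch of the quoted proposal (the job.xml name and lookup 
are assumptions here, not existing Flink behavior):
{code:java}
import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class JobXmlProposalSketch {
    public static void main(String[] args) {
        // Hypothetical: if the client-side settings were shipped as a "job.xml"
        // and soft-linked into the container's working directory (which is on the
        // classpath), a plain Configuration could pick them up through the class
        // loader instead of using the NodeManager's server-side files.
        URL shipped =
                Thread.currentThread().getContextClassLoader().getResource("job.xml");

        Configuration conf = new Configuration();
        if (shipped != null) {
            // the shipped client configuration overrides the server-side defaults
            conf.addResource(shipped);
        }
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}
{code}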

> jobs should use client org.apache.hadoop.config on yarn
> ---
>
> Key: FLINK-21982
> URL: https://issues.apache.org/jira/browse/FLINK-21982
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0
>Reporter: Bo Cui
>Priority: Major
>
> Currently, jobs use the YARN server configuration during execution. 
> I think we should submit the configuration to HDFS, like the MR job.xml, etc.,
> and the container would fetch the job.xml and create soft links (hdfs-site, core-site, 
> yarn-site) while the container initializes,
> and then the Configuration could obtain these resources through 
> classLoader.getResource(String) when the Configuration initializes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.

2021-03-28 Thread GitBox


flinkbot commented on pull request #15403:
URL: https://github.com/apache/flink/pull/15403#issuecomment-809039318


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit d87a60142ebdb3eef911d51391679b90f4acaa6a (Mon Mar 29 
03:25:41 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


flinkbot commented on pull request #15402:
URL: https://github.com/apache/flink/pull/15402#issuecomment-809039335


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 97ba64b2959826a53be3273d9b565b3fe8f96fdc (Mon Mar 29 
03:25:43 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21640) Job fails to be submitted in tenant scenario

2021-03-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310353#comment-17310353
 ] 

Xintong Song commented on FLINK-21640:
--

[~Bo Cui], [~fly_in_gis],

Obviously, different companies maintain ZooKeeper differently. I think 
users/companies can choose whichever approach they want, and Flink should not 
make assumptions on how ZK is maintained. In that sense, creating the root 
ZNode with global access only to fit with some specific ways of maintaining ZK 
does not sound fair to me.

Leaving aside how ZK is maintained, I think it is a common convention for 
access control that, by default, new content should be created with access as 
strict as possible. E.g., by default new files/directories can only be modified 
by the creating user on most Linux/Unix systems.
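
As a rough illustration of that convention (a hypothetical snippet using the plain 
ZooKeeper client, not Flink's actual HA code; the connection string and paths are 
made up):
{code:java}
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public class ZnodeAclSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // Strict default: only the authenticated creator may read/write/delete
        // (requires an authenticated session, e.g. SASL/Kerberos or digest auth).
        List<ACL> creatorOnly = ZooDefs.Ids.CREATOR_ALL_ACL;
        zk.create("/flink-strict", new byte[0], creatorOnly, CreateMode.PERSISTENT);

        // World-open alternative: any user (tenant) can modify the node and
        // create children under it.
        List<ACL> openToAnyone = ZooDefs.Ids.OPEN_ACL_UNSAFE;
        zk.create("/flink-open", new byte[0], openToAnyone, CreateMode.PERSISTENT);

        zk.close();
    }
}
{code}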

> Job fails to be submitted in tenant scenario
> 
>
> Key: FLINK-21640
> URL: https://issues.apache.org/jira/browse/FLINK-21640
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Client / Job Submission
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-03-06-09-30-52-410.png, 
> image-2021-03-06-09-34-05-518.png
>
>
> Job fails to be submitted in tenant scenario
>  !image-2021-03-06-09-30-52-410.png! 
> because the current user does not have the Znode permission.
>  !image-2021-03-06-09-34-05-518.png! 
> I think the parent znode ACL should be anyone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] est08zw opened a new pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.

2021-03-28 Thread GitBox


est08zw opened a new pull request #15403:
URL: https://github.com/apache/flink/pull/15403


   
   
   ## What is the purpose of the change
   
   fix typo, assign processedData to currentProcessedData rather than 
currentPersistedData.
   
   ## Brief change log
   
   fix typo, assign processedData to currentProcessedData rather than 
currentPersistedData.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] chaozwn closed pull request #15390: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


chaozwn closed pull request #15390:
URL: https://github.com/apache/flink/pull/15390


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-22004) Translate Flink Roadmap to Chinese.

2021-03-28 Thread Yuan Mei (Jira)
Yuan Mei created FLINK-22004:


 Summary: Translate Flink Roadmap to Chinese.
 Key: FLINK-22004
 URL: https://issues.apache.org/jira/browse/FLINK-22004
 Project: Flink
  Issue Type: Task
  Components: Documentation
Reporter: Yuan Mei



https://flink.apache.org/roadmap.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] chaozwn opened a new pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser

2021-03-28 Thread GitBox


chaozwn opened a new pull request #15402:
URL: https://github.com/apache/flink/pull/15402


   …Calcite Parser
   
   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure

2021-03-28 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542
 ] 

Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:18 AM:
-

on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=results


was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results

> E2e tests time out on azure
> ---
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.11.3, 1.12.1, 1.13.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.13.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying 
> for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying 
> for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying 
> for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying 
> for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying 
> for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying 
> for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying 
> for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying 
> for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying 
> for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying 
> for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 
> seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at 
> master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History 
> server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 
> 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to 
> upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: 
> '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

