[GitHub] [flink-benchmarks] pnowojski merged pull request #12: [hotfix] Fix Readme instructions that are easy to break
pnowojski merged pull request #12: URL: https://github.com/apache/flink-benchmarks/pull/12 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-22007) PartitionReleaseInBatchJobBenchmarkExecutor seems to be failing
Piotr Nowojski created FLINK-22007: -- Summary: PartitionReleaseInBatchJobBenchmarkExecutor seems to be failing Key: FLINK-22007 URL: https://issues.apache.org/jira/browse/FLINK-22007 Project: Flink Issue Type: Bug Components: Benchmarks, Runtime / Coordination Affects Versions: 1.13.0 Reporter: Piotr Nowojski Fix For: 1.13.0 Travis CI is failing: https://travis-ci.com/github/apache/flink-benchmarks/builds/221290042 There is also some problem with the Jenkins builds for the same benchmark: http://codespeed.dak8s.net:8080/job/flink-scheduler-benchmarks/232 It would also be interesting for the future to understand why the Jenkins build is green and to fix that (ideally, if some benchmarks fail, partial results should still be uploaded but the Jenkins build should be marked as failed). Otherwise issues like this can remain unnoticed for quite some time. CC [~Thesharing] [~zhuzh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink-benchmarks] pnowojski commented on pull request #12: [hotfix] Fix Readme instructions that are easy to break
pnowojski commented on pull request #12: URL: https://github.com/apache/flink-benchmarks/pull/12#issuecomment-809122082 Build failure looks unrelated: https://issues.apache.org/jira/browse/FLINK-22007
[GitHub] [flink] twalthr commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package
twalthr commented on pull request #15406: URL: https://github.com/apache/flink/pull/15406#issuecomment-809120847 @lirui-apache could you give the community more context about this change in the JIRA issue? I think most people in the Flink community would immediately reject copying more code from Hive to Flink unless there is a good reason for it.
[GitHub] [flink] gaoyunhaii commented on a change in pull request #15259: [FLINK-20757][network] Optimize data broadcast for sort-merge blocking shuffle
gaoyunhaii commented on a change in pull request #15259: URL: https://github.com/apache/flink/pull/15259#discussion_r602998872

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartition.java

@@ -94,11 +94,17 @@
  */
 private final SortMergeResultPartitionReadScheduler readScheduler;

-/** Number of guaranteed network buffers can be used by {@link #currentSortBuffer}. */
+/**
+ * Number of guaranteed network buffers can be used by {@link #unicastSortBuffer} and {@link
+ * #broadcastSortBuffer}.
+ */
 private int numBuffersForSort;

-/** Current {@link SortBuffer} to append records to. */
-private SortBuffer currentSortBuffer;
+/** {@link SortBuffer} for records sent by {@link #broadcastRecord(ByteBuffer)}. */
+private SortBuffer broadcastSortBuffer;
+
+/** {@link SortBuffer} for records sent by {@link #emitRecord(ByteBuffer, int)}. */
+private SortBuffer unicastSortBuffer;

Review comment: It seems that in the following logic the two sort buffers have no difference? Thus we may not need to use two separate buffers; we could still use only one `currentBuffer`.

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/SortMergeResultPartition.java

@@ -389,8 +418,9 @@ private void releaseWriteBuffers() {
 public void close() {
 releaseWriteBuffers();
 // the close method will be always called by the task thread, so there is need to make
-// the currentSortBuffer filed volatile and visible to the cancel thread intermediately
-releaseCurrentSortBuffer();
+// the sort buffer filed volatile and visible to the cancel thread intermediately

Review comment: filed -> field

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PartitionedFileWriter.java

@@ -86,6 +86,13 @@
 /** Current subpartition to write buffers to. */
 private int currentSubpartition = -1;

+/**
+ * Broadcast region is an optimization for the broadcast partition which writes the same data to
+ * all subpartitions. For a broadcast region, data is only written once and the indexes of all
+ * subpartitions point to the same offset in the data file.
+ */
+private boolean isBroadCastRegion;

Review comment: Might be unified as `isBroadcastRegion`.

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PartitionedFileWriter.java

@@ -205,32 +215,70 @@ public void writeBuffers(List bufferWithChannels) throws IOEx
 return;
 }

-long expectedBytes = 0;
+long expectedBytes;
 ByteBuffer[] bufferWithHeaders = new ByteBuffer[2 * bufferWithChannels.size()];
+if (isBroadCastRegion) {
+expectedBytes = writeBroadcastBuffers(bufferWithChannels, bufferWithHeaders);
+} else {
+expectedBytes = writeUnicastBuffers(bufferWithChannels, bufferWithHeaders);
+}
+
+BufferReaderWriterUtil.writeBuffers(dataFileChannel, expectedBytes, bufferWithHeaders);
+}
+
+private long writeUnicastBuffers(
+List bufferWithChannels, ByteBuffer[] bufferWithHeaders) {
+long expectedBytes = 0;
 for (int i = 0; i < bufferWithChannels.size(); i++) {
-BufferWithChannel bufferWithChannel = bufferWithChannels.get(i);
-Buffer buffer = bufferWithChannel.getBuffer();
-int subpartitionIndex = bufferWithChannel.getChannelIndex();
-if (subpartitionIndex != currentSubpartition) {
+int subpartition = bufferWithChannels.get(i).getChannelIndex();
+if (subpartition != currentSubpartition) {
 checkState(
-subpartitionBuffers[subpartitionIndex] == 0,
+subpartitionBuffers[subpartition] == 0,
 "Must write data of the same channel together.");
-subpartitionOffsets[subpartitionIndex] = totalBytesWritten;
-currentSubpartition = subpartitionIndex;
+subpartitionOffsets[subpartition] = totalBytesWritten;
+currentSubpartition = subpartition;
 }
-ByteBuffer header = BufferReaderWriterUtil.allocatedHeaderBuffer();
-BufferReaderWriterUtil.setByteChannelBufferHeader(buffer, header);
-bufferWithHeaders[2 * i] = header;
-bufferWithHeaders[2 * i + 1] = buffer.getNioBufferReadable();
+Buffer buffer = bufferWithChannels.get(i).getBuffer();
+int numBytes = setBufferWithHeader(buffer, bufferWithHeaders, 2 * i);
+expectedBytes += numBytes;
+totalBytesWritten += numBytes;

Review comment: Might we move the `totalBytesWritten` update outside the two methods?

## Fil
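The broadcast-region idea reviewed above can be sketched in a simplified, self-contained form: for a broadcast region the payload is written to the data file once, and every subpartition's index entry records the same offset, while unicast data gets a per-subpartition offset. All class and method names below are hypothetical illustrations, not the actual Flink implementation:

```java
import java.util.Arrays;

/** Simplified sketch of broadcast-region index bookkeeping (hypothetical names). */
class RegionIndexSketch {
    private final long[] subpartitionOffsets;
    private long totalBytesWritten;

    RegionIndexSketch(int numSubpartitions) {
        this.subpartitionOffsets = new long[numSubpartitions];
    }

    /** Unicast: the target subpartition records its own offset as its data is written. */
    void recordUnicast(int subpartition, int numBytes) {
        subpartitionOffsets[subpartition] = totalBytesWritten;
        totalBytesWritten += numBytes;
    }

    /** Broadcast: data is written once; all index entries share the same offset. */
    void recordBroadcast(int numBytes) {
        Arrays.fill(subpartitionOffsets, totalBytesWritten);
        totalBytesWritten += numBytes; // the payload is stored only once
    }

    long offsetOf(int subpartition) {
        return subpartitionOffsets[subpartition];
    }

    long totalBytes() {
        return totalBytesWritten;
    }
}
```

Compared with writing the payload once per subpartition, this keeps the data file size independent of the number of consumers for broadcast results.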
[GitHub] [flink] XComp commented on pull request #15020: [FLINK-21445][kubernetes] Application mode does not set the configuration when building PackagedProgram
XComp commented on pull request #15020: URL: https://github.com/apache/flink/pull/15020#issuecomment-809117295 Ok, thanks @SteNicholas. I'll give it another pass when you're done.
[GitHub] [flink] flinkbot commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package
flinkbot commented on pull request #15406: URL: https://github.com/apache/flink/pull/15406#issuecomment-809116403 ## CI report: * 8f3bef8d676748698cf0f7e6a9e9f595604ae412 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml
flinkbot edited a comment on pull request #15405: URL: https://github.com/apache/flink/pull/15405#issuecomment-809101943 ## CI report: * 90f1936af457ab1419790cd750ee1ab45ec04ecb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15652)
[jira] [Updated] (FLINK-16641) Announce sender's backlog to solve the deadlock issue without exclusive buffers
[ https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16641: Fix Version/s: (was: 1.13.0) 1.14.0 > Announce sender's backlog to solve the deadlock issue without exclusive > buffers > --- > > Key: FLINK-16641 > URL: https://issues.apache.org/jira/browse/FLINK-16641 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Reporter: Zhijiang >Assignee: Yingjie Cao >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > This is the second ingredient besides FLINK-16404 to solve the deadlock > problem without exclusive buffers. > The scenario is as follows: > * The data in subpartition with positive backlog can be sent without doubt > because the exclusive credits would be feedback finally. > * Without exclusive buffers, the receiver would not request floating buffers > for 0 backlog. But when the new backlog is added into such subpartition, it > has no way to notify the receiver side without positive credits ATM. > * So it would result in waiting for each other between receiver and sender > sides to cause deadlock. The sender waits for credit to notify backlog and > the receiver waits for backlog to request floating credits. > To solve the above problem, the sender needs a separate message to announce > backlog sometimes besides existing `BufferResponse`. Then the receiver can > get this info to request floating buffers to feedback. > The side effect brought is to increase network transport delay and throughput > regression. We can measure how much it effects in existing micro-benchmark. > It might probably bear this effect to get a benefit of fast checkpoint > without exclusive buffers. We can give the proper explanations in respective > configuration options to let users make the final decision in practice.
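The deadlock scenario and the proposed fix described in the ticket can be modeled roughly as follows. This is a toy sketch with hypothetical names, not the actual Flink credit-based network stack: ordinarily the backlog is piggybacked on `BufferResponse` messages, which can only be sent when the receiver has granted credit, so a separate credit-free announcement is needed when new backlog appears while the sender holds no credits:

```java
/** Rough sketch of the announce-backlog idea (hypothetical names). */
class SenderSketch {
    interface Receiver {
        void onBufferResponse(int remainingBacklog); // data + piggybacked backlog, costs a credit
        void onAnnounceBacklog(int backlog); // separate control message, costs no credit
    }

    private final Receiver receiver;
    private int credits;
    private int backlog;

    SenderSketch(Receiver receiver) {
        this.receiver = receiver;
    }

    /** Receiver grants floating/exclusive credits; sender can now ship queued data. */
    void addCredit(int n) {
        credits += n;
        flush();
    }

    /** A new buffer becomes available in the subpartition. */
    void enqueue() {
        backlog++;
        if (credits == 0) {
            // The key fix: without credits no BufferResponse can be sent,
            // so the new backlog is announced via a credit-free message.
            receiver.onAnnounceBacklog(backlog);
        } else {
            flush();
        }
    }

    private void flush() {
        while (credits > 0 && backlog > 0) {
            credits--;
            backlog--;
            receiver.onBufferResponse(backlog); // backlog piggybacked on data
        }
    }
}
```

Without the `onAnnounceBacklog` branch, a sender with zero credits could never tell the receiver that data is waiting, and the receiver would never request floating buffers: exactly the mutual wait described above.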
[GitHub] [flink] flinkbot edited a comment on pull request #15366: [FLINK-12828][sql-client] Support -f option with a sql script as input
flinkbot edited a comment on pull request #15366: URL: https://github.com/apache/flink/pull/15366#issuecomment-806388017 ## CI report: * 347d38bc59e3ffa1d3854da26163c2170c2f6986 UNKNOWN * de78d0128017c7ad13d93cd5ba30f94ec516c5f1 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15618) * 1f8003071d9fee631437752a36a82f74e02caa04 UNKNOWN
[jira] [Updated] (FLINK-16012) Reduce the default number of exclusive buffers from 2 to 1 on receiver side
[ https://issues.apache.org/jira/browse/FLINK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16012: Fix Version/s: (was: 1.13.0) 1.14.0 > Reduce the default number of exclusive buffers from 2 to 1 on receiver side > --- > > Key: FLINK-16012 > URL: https://issues.apache.org/jira/browse/FLINK-16012 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Reporter: Zhijiang >Assignee: Yingjie Cao >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > Time Spent: 10m > Remaining Estimate: 0h > > In order to reduce the inflight buffers for checkpoint in the case of back > pressure, we can reduce the number of exclusive buffers for remote input > channel from default 2 to 1 as the first step. Besides that, the total > required buffers are also reduced as a result. We can further verify the > performance effect via various benchmarks.
[jira] [Updated] (FLINK-18762) Make network buffers per incoming/outgoing channel can be configured separately
[ https://issues.apache.org/jira/browse/FLINK-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-18762: Fix Version/s: (was: 1.13.0) 1.14.0 > Make network buffers per incoming/outgoing channel can be configured > separately > --- > > Key: FLINK-18762 > URL: https://issues.apache.org/jira/browse/FLINK-18762 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Reporter: Yingjie Cao >Priority: Minor > Labels: pull-request-available > Fix For: 1.14.0 > > > In FLINK-16012, we want to decrease the default number of exclusive buffers > at receiver side from 2 to 1 to accelerate checkpoint in cases of > backpressure. However, the number of buffers per outgoing and incoming channel > is configured by a single configuration key. It is better to make network > buffers per incoming/outgoing channel configurable separately, which is > more flexible. At the same time, we can keep the default behavior compatible.
[jira] [Updated] (FLINK-16428) Fine-grained network buffer management for backpressure
[ https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16428: Fix Version/s: (was: 1.13.0) 1.14.0 > Fine-grained network buffer management for backpressure > --- > > Key: FLINK-16428 > URL: https://issues.apache.org/jira/browse/FLINK-16428 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Reporter: Zhijiang >Priority: Critical > Fix For: 1.14.0 > > > It is an umbrella ticket for tracing the progress of this improvement. > This is the second ingredient to solve the “checkpoints under backpressure” > problem (together with unaligned checkpoints). It consists of two steps: > * See if we can use less network memory in general for streaming jobs (with > potentially different distribution of floating buffers in the input side) > * Under backpressure, reduce network memory to have less in-flight data > (needs specification of algorithm and experiments)
[jira] [Commented] (FLINK-16428) Fine-grained network buffer management for backpressure
[ https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310448#comment-17310448 ] Yingjie Cao commented on FLINK-16428: - Though the PR is ready, I guess we do not have enough time to merge it into 1.13, so I am moving the fix version to 1.14.
[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
flinkbot edited a comment on pull request #15016: URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058 ## CI report: * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646) * b4862f8a8e8dab12ec9c0d71d8a31152cb212ebc UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #13025: [FLINK-18762][network] Make network buffers per incoming/outgoing channel can be configured separately
flinkbot edited a comment on pull request #13025: URL: https://github.com/apache/flink/pull/13025#issuecomment-666200437 ## CI report: * 17b353709923bd16fbdc07d0936c1302f8ddd537 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15624) * ae949a0492d2e2df7af183be3fa32a77eaada34c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15651)
[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled
[ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated FLINK-22006: -- Labels: k8s-ha (was: ) > Could not run more than 20 jobs in a native K8s session with K8s HA enabled > --- > > Key: FLINK-22006 > URL: https://issues.apache.org/jira/browse/FLINK-22006 > Project: Flink > Issue Type: Bug >Affects Versions: 1.12.2, 1.13.0 >Reporter: Yang Wang >Priority: Major > Labels: k8s-ha > Attachments: image-2021-03-24-18-08-42-116.png > > > Currently, if we start a native K8s session cluster with K8s HA enabled, we > could not run more than 20 streaming jobs. > > The latest job is always initializing, and the previous one is created and > waiting to be assigned. It seems that some internal resources have been > exhausted, e.g. okhttp thread pool, tcp connections or something else. > !image-2021-03-24-18-08-42-116.png!
[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session when K8s HA enabled
[ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated FLINK-22006: -- Description: Currently, if we start a native K8s session cluster when K8s HA enabled, we could not run more than 20 streaming jobs. The latest job is always initializing, and the previous one is created and waiting to be assigned. It seems that some internal resources have been exhausted, e.g. okhttp thread pool, tcp connections or something else. !image-2021-03-24-18-08-42-116.png! was: Currently, if we start a native K8s session cluster with K8s HA enabled, we could not run more than 20 streaming jobs. The latest job is always initializing, and the previous one is created and waiting to be assigned. It seems that some internal resources have been exhausted, e.g. okhttp thread pool, tcp connections or something else. !image-2021-03-24-18-08-42-116.png!
[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session when K8s HA enabled
[ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated FLINK-22006: -- Summary: Could not run more than 20 jobs in a native K8s session when K8s HA enabled (was: Could not run more than 20 jobs in a native K8s session with K8s HA enabled)
[GitHub] [flink] Jiayi-Liao commented on a change in pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
Jiayi-Liao commented on a change in pull request #15016: URL: https://github.com/apache/flink/pull/15016#discussion_r603040907

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/ttl/TtlStateTestBase.java

@@ -471,6 +473,26 @@ public void testRestoreTtlAndRegisterNonTtlStateCompatFailure() throws Exception
 sbetc.createState(ctx().createStateDescriptor(), "");
 }

+@Test
+public void testIncrementalCleanupWholeState() throws Exception {
+assumeTrue(incrementalCleanupSupported());
+initTest(getConfBuilder(TTL).cleanupIncrementally(5, true).build());
+timeProvider.time = 0;
+// create enough keys to trigger incremental rehash
+updateKeys(0, INC_CLEANUP_ALL_KEYS, ctx().updateEmpty);
+// expire all state
+timeProvider.time = 120;
+// trigger state clean up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+}
+// check all state cleaned up
+for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+sbetc.setCurrentKey(Integer.toString(i));
+assertTrue("Original state should be cleared", ctx().isOriginalNull());
+}
+}

Review comment: @Myasuka Oh.. I see what you mean here. Very good point. :) I've improved the code based on your comments.
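For context, incremental TTL cleanup of the kind exercised by the test above can be modeled as a small sketch (hypothetical names, not Flink's actual TTL cleanup classes): each key access advances a cursor over the state and drops a bounded number of expired entries, so repeated accesses eventually sweep the whole state:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of incremental TTL cleanup (hypothetical names). */
class IncrementalTtlCleanupSketch {
    private final Map<String, Long> expirationTimes = new HashMap<>();
    private final int cleanupSize;
    private Iterator<Map.Entry<String, Long>> cursor;
    long currentTime;

    IncrementalTtlCleanupSketch(int cleanupSize) {
        this.cleanupSize = cleanupSize;
    }

    void put(String key, long expiresAt) {
        expirationTimes.put(key, expiresAt);
        cursor = null; // restart the sweep after writes to avoid concurrent modification
    }

    /** Called on every key access: advance over up to cleanupSize entries, dropping expired ones. */
    void onKeyAccess() {
        if (cursor == null || !cursor.hasNext()) {
            cursor = expirationTimes.entrySet().iterator();
        }
        for (int i = 0; i < cleanupSize && cursor.hasNext(); i++) {
            if (cursor.next().getValue() <= currentTime) {
                cursor.remove();
            }
        }
    }

    int size() {
        return expirationTimes.size();
    }
}
```

The bound on entries visited per access keeps the per-record latency overhead small while still guaranteeing that fully expired state is eventually cleared, which is what the added test checks.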
[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled
[ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated FLINK-22006: -- Component/s: Runtime / Coordination
[jira] [Created] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled
Yang Wang created FLINK-22006: - Summary: Could not run more than 20 jobs in a native K8s session with K8s HA enabled Key: FLINK-22006 URL: https://issues.apache.org/jira/browse/FLINK-22006 Project: Flink Issue Type: Bug Affects Versions: 1.12.2, 1.13.0 Reporter: Yang Wang Attachments: image-2021-03-24-18-08-42-116.png Currently, if we start a native K8s session cluster with K8s HA enabled, we could not run more than 20 streaming jobs. The latest job is always initializing, and the previous one is created and waiting to be assigned. It seems that some internal resources have been exhausted, e.g. okhttp thread pool, tcp connections or something else. !image-2021-03-24-18-08-42-116.png!
[jira] [Updated] (FLINK-22006) Could not run more than 20 jobs in a native K8s session with K8s HA enabled
[ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated FLINK-22006: -- Attachment: image-2021-03-24-18-08-42-116.png
[GitHub] [flink] flinkbot commented on pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package
flinkbot commented on pull request #15406: URL: https://github.com/apache/flink/pull/15406#issuecomment-809105749 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 8f3bef8d676748698cf0f7e6a9e9f595604ae412 (Mon Mar 29 06:26:18 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-21998).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. 
For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Assigned] (FLINK-21942) KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak
[ https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang reassigned FLINK-21942: - Assignee: Yang Wang > KubernetesLeaderRetrievalDriver not closed after terminated which lead to > connection leak > - > > Key: FLINK-21942 > URL: https://issues.apache.org/jira/browse/FLINK-21942 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.12.2, 1.13.0 >Reporter: Yi Tang >Assignee: Yang Wang >Priority: Major > Labels: k8s-ha > Fix For: 1.13.0 > > Attachments: image-2021-03-24-18-08-30-196.png, > image-2021-03-24-18-08-42-116.png, jstack.l > > > It looks like the KubernetesLeaderRetrievalDriver is not closed even if the > KubernetesLeaderElectionDriver is closed and the job reaches a globally > terminated state. This leaves many configmap watches active, with open > connections to K8s. > When the connections exceed the maximum concurrent requests, new configmap > watches cannot be started. Finally this leads to all newly submitted jobs > timing out. > [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695, could you > confirm this issue? > But when many jobs are running in the same session cluster, the config map > watching is required to stay active. Maybe we should merge all config map > watches?
[jira] [Updated] (FLINK-21998) Copy more code from hive and move them to a dedicated package
[ https://issues.apache.org/jira/browse/FLINK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-21998: --- Labels: pull-request-available (was: ) > Copy more code from hive and move them to a dedicated package > - > > Key: FLINK-21998 > URL: https://issues.apache.org/jira/browse/FLINK-21998 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0 > >
[GitHub] [flink] lirui-apache opened a new pull request #15406: [FLINK-21998][hive] Copy more code from hive and move them to a dedicated package
lirui-apache opened a new pull request #15406: URL: https://github.com/apache/flink/pull/15406 ## What is the purpose of the change Copy more code from hive and move them to a dedicated package ## Brief change log - Move the copied hive classes to package `org.apache.flink.table.planner.delegation.hive.copy` - Copy more hive code ## Verifying this change Existing tests ## Does this pull request potentially affect one of the following parts: NA ## Documentation NA -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21942) KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak
[ https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310438#comment-17310438 ] Yang Wang commented on FLINK-21942: --- I will attach a PR for this issue. It will solve the Kubernetes ConfigMap watch leak, as well as the ZooKeeper connection leak. Actually, we have had the same issue with the ZooKeeper HA service for a long time. Unfortunately, I have not had enough time to find the root cause of why we could only run about 20 jobs in a Kubernetes session cluster with HA enabled. Since it is a separate issue, I will create another ticket. > KubernetesLeaderRetrievalDriver not closed after terminated which lead to > connection leak > - > > Key: FLINK-21942 > URL: https://issues.apache.org/jira/browse/FLINK-21942 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.12.2, 1.13.0 >Reporter: Yi Tang >Priority: Major > Labels: k8s-ha > Fix For: 1.13.0 > > Attachments: image-2021-03-24-18-08-30-196.png, > image-2021-03-24-18-08-42-116.png, jstack.l > > > Looks like KubernetesLeaderRetrievalDriver is not closed even if the > KubernetesLeaderElectionDriver is closed and the job has reached a globally terminal state. > This leaves many ConfigMap watches active, with open connections to K8s. > When the connections exceed the max concurrent requests, new ConfigMap watches > cannot be started, and eventually all newly submitted jobs time out. > [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695, could you > confirm this issue? > But when many jobs are running in the same session cluster, the ConfigMap > watches are required to stay active. Maybe we should merge all ConfigMap > watches? -- This message was sent by Atlassian Jira (v8.3.4#803005)
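The resource-leak pattern Yang Wang describes, watch connections that outlive the job that opened them, can be illustrated with a small self-contained Java sketch. The class names below are illustrative stand-ins, not Flink's actual `KubernetesLeaderRetrievalDriver` API; the point is only that the watch's `close()` must be tied to job termination:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a registry that tracks per-job watch resources and closes them
// when the job reaches a globally terminal state. Without onJobTerminated(),
// every watch keeps its connection open, and once the client's maximum
// number of concurrent requests is exhausted, no new watches can be started.
class WatchRegistry {
    // Stand-in for a ConfigMap watch / leader retrieval driver.
    static class Watch implements AutoCloseable {
        boolean open = true;

        @Override
        public void close() {
            open = false;
        }
    }

    private final List<Watch> watches = new ArrayList<>();

    Watch openWatch() {
        Watch w = new Watch();
        watches.add(w);
        return w;
    }

    int openCount() {
        int n = 0;
        for (Watch w : watches) {
            if (w.open) {
                n++;
            }
        }
        return n;
    }

    // Must be called when the job terminates; otherwise the watches leak.
    void onJobTerminated() {
        for (Watch w : watches) {
            w.close();
        }
        watches.clear();
    }

    public static void main(String[] args) {
        WatchRegistry registry = new WatchRegistry();
        for (int i = 0; i < 20; i++) {
            registry.openWatch();
        }
        registry.onJobTerminated();
        System.out.println("open watches after termination: " + registry.openCount());
    }
}
```

The fix discussed in the ticket is conceptually the missing `onJobTerminated()` call: closing the retrieval driver when the job reaches a terminal state instead of leaving the watch connection open.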
[GitHub] [flink] flinkbot commented on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml
flinkbot commented on pull request #15405: URL: https://github.com/apache/flink/pull/15405#issuecomment-809101943 ## CI report: * 90f1936af457ab1419790cd750ee1ab45ec04ecb UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15294: [FLINK-21945][streaming] Omit pointwise connections from checkpointing in unaligned checkpoints.
flinkbot edited a comment on pull request #15294: URL: https://github.com/apache/flink/pull/15294#issuecomment-803058203 ## CI report: * b22d2cdd4e457842b585e64089858a0ae8eb9a2b UNKNOWN * 19964baf25121c6f0a4f85d75ffcc568d98dcdaa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15605) * 6bff4c2e3485c8f0dd09b872652c6a7958836e7f UNKNOWN * ea1d9cb1f72dd92d6910773bc63f6e8343416715 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser
flinkbot edited a comment on pull request #15253: URL: https://github.com/apache/flink/pull/15253#issuecomment-801069817 ## CI report: * c550f67518cae7d8fe19752af581c582bd7115b9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15625) * f57250440164af2180629998bc4cfda236d64772 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15650) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15161: [FLINK-20114][connector/kafka,common] Fix a few KafkaSource-related bugs
flinkbot edited a comment on pull request #15161: URL: https://github.com/apache/flink/pull/15161#issuecomment-797177953 ## CI report: * d3f59e302b9860d49ce79252aeb8bd1decaeeabf UNKNOWN * 6fc6cc9a2e6c26e9e8fa62b481237231394078eb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15507) * 6c17ec7241323224d3053134e82353c63f28a24f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15649) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13025: [FLINK-18762][network] Make network buffers per incoming/outgoing channel can be configured separately
flinkbot edited a comment on pull request #13025: URL: https://github.com/apache/flink/pull/13025#issuecomment-666200437 ## CI report: * 17b353709923bd16fbdc07d0936c1302f8ddd537 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15624) * ae949a0492d2e2df7af183be3fa32a77eaada34c UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] Myasuka commented on a change in pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
Myasuka commented on a change in pull request #15016: URL: https://github.com/apache/flink/pull/15016#discussion_r603034801

## File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/ttl/TtlStateTestBase.java

```java
@@ -471,6 +473,26 @@ public void testRestoreTtlAndRegisterNonTtlStateCompatFailure() throws Exception
         sbetc.createState(ctx().createStateDescriptor(), "");
     }

+    @Test
+    public void testIncrementalCleanupWholeState() throws Exception {
+        assumeTrue(incrementalCleanupSupported());
+        initTest(getConfBuilder(TTL).cleanupIncrementally(5, true).build());
+        timeProvider.time = 0;
+        // create enough keys to trigger incremental rehash
+        updateKeys(0, INC_CLEANUP_ALL_KEYS, ctx().updateEmpty);
+        // expire all state
+        timeProvider.time = 120;
+        // trigger state clean up
+        for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+            sbetc.setCurrentKey(Integer.toString(i));
+        }
+        // check all state cleaned up
+        for (int i = 0; i < INC_CLEANUP_ALL_KEYS; i++) {
+            sbetc.setCurrentKey(Integer.toString(i));
+            assertTrue("Original state should be cleared", ctx().isOriginalNull());
```

Review comment: My original point is that we could avoid introducing those `isValueOfCurrentKeyNull` related methods by introducing one single `isCurrentStateTableNull` method within `TtlStateTestBase.java` (no matter what name it is). FLINK-21413 is about fixing a performance problem, avoiding many unnecessary empty hash maps and state entries, rather than an incorrect query result. This newly added test should only target heap-based keyed state backends, so we can safely cast to `AbstractHeapState` here.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
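The invariant that test exercises can be modeled in isolation: once every entry of a key's state has expired, incremental cleanup should drop the per-key map entirely instead of keeping an empty one around. The following is a hypothetical sketch of that invariant, not Flink's actual `TtlMapState` or incremental-cleanup code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the FLINK-21413 invariant: expired entries are removed lazily on
// key access, and a per-key map that becomes empty is removed as a whole, so
// fully expired keys leave no empty hash maps or state entries behind.
class TtlCleanupSketch {
    static final long TTL = 100;

    // key -> (element -> timestamp of last update)
    final Map<String, Map<String, Long>> state = new HashMap<>();

    void put(String key, String element, long now) {
        state.computeIfAbsent(key, k -> new HashMap<>()).put(element, now);
    }

    // Invoked on key access, analogous to cleanup running on setCurrentKey().
    void cleanupKey(String key, long now) {
        Map<String, Long> entries = state.get(key);
        if (entries == null) {
            return;
        }
        entries.values().removeIf(ts -> now - ts >= TTL);
        if (entries.isEmpty()) {
            // drop the whole map, not just its entries
            state.remove(key);
        }
    }

    // What a test-side "is the state table entry null" check would observe.
    boolean isStateNull(String key) {
        return state.get(key) == null;
    }
}
```

With this model, writing at time 0 and cleaning up at time 120 (past the TTL of 100) leaves no trace of the key, which is exactly what the `isOriginalNull()` assertion in the test verifies.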
[GitHub] [flink] flinkbot commented on pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml
flinkbot commented on pull request #15405: URL: https://github.com/apache/flink/pull/15405#issuecomment-809095000 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 90f1936af457ab1419790cd750ee1ab45ec04ecb (Mon Mar 29 06:04:31 UTC 2021) **Warnings:** * **6 pom.xml files were touched**: Check for build and licensing issues. * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-21976).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. 
For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] lindong28 opened a new pull request #15405: [Flink-21976] Move Flink ML pipeline API and library code to a separate repository named flink-ml
lindong28 opened a new pull request #15405: URL: https://github.com/apache/flink/pull/15405 ## What is the purpose of the change Remove ML pipeline API and library code from Flink so that we can later add them to https://github.com/apache/flink-ml. ## Brief change log - Remove all files under flink-ml-parent - Remove fink-ml-* related configs from pom.xml ## Verifying this change Pass unit tests. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15294: [FLINK-21945][streaming] Omit pointwise connections from checkpointing in unaligned checkpoints.
flinkbot edited a comment on pull request #15294: URL: https://github.com/apache/flink/pull/15294#issuecomment-803058203 ## CI report: * b22d2cdd4e457842b585e64089858a0ae8eb9a2b UNKNOWN * 19964baf25121c6f0a4f85d75ffcc568d98dcdaa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15605) * 6bff4c2e3485c8f0dd09b872652c6a7958836e7f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser
flinkbot edited a comment on pull request #15253: URL: https://github.com/apache/flink/pull/15253#issuecomment-801069817 ## CI report: * c550f67518cae7d8fe19752af581c582bd7115b9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15625) * f57250440164af2180629998bc4cfda236d64772 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15161: [FLINK-20114][connector/kafka,common] Fix a few KafkaSource-related bugs
flinkbot edited a comment on pull request #15161: URL: https://github.com/apache/flink/pull/15161#issuecomment-797177953 ## CI report: * d3f59e302b9860d49ce79252aeb8bd1decaeeabf UNKNOWN * 6fc6cc9a2e6c26e9e8fa62b481237231394078eb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15507) * 6c17ec7241323224d3053134e82353c63f28a24f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
flinkbot edited a comment on pull request #15016: URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058 ## CI report: * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21906) Support computed column syntax for Hive DDL dialect
[ https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310429#comment-17310429 ] Jark Wu commented on FLINK-21906: - Thanks [~hackergin], I assigned this issue to you. > Support computed column syntax for Hive DDL dialect > --- > > Key: FLINK-21906 > URL: https://issues.apache.org/jira/browse/FLINK-21906 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Jark Wu >Assignee: jinfeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-21906) Support computed column syntax for Hive DDL dialect
[ https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu reassigned FLINK-21906: --- Assignee: jinfeng > Support computed column syntax for Hive DDL dialect > --- > > Key: FLINK-21906 > URL: https://issues.apache.org/jira/browse/FLINK-21906 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Jark Wu >Assignee: jinfeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] SteNicholas commented on pull request #15020: [FLINK-21445][kubernetes] Application mode does not set the configuration when building PackagedProgram
SteNicholas commented on pull request #15020: URL: https://github.com/apache/flink/pull/15020#issuecomment-809072744 @tillrohrmann, I will update this pull request following the comments above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-21989) Add a SupportsSourceWatermark ability interface
[ https://issues.apache.org/jira/browse/FLINK-21989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther closed FLINK-21989. Fix Version/s: 1.13.0 Resolution: Fixed Fixed in 1.13.0: 71122fda952b7bd3b0308182e10da77af1cc67ff > Add a SupportsSourceWatermark ability interface > --- > > Key: FLINK-21989 > URL: https://issues.apache.org/jira/browse/FLINK-21989 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API, Table SQL / Planner >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0 > > > FLINK-21899 added a dedicated function that can be used in watermark > definitions. Currently, the generated watermark strategy is invalid because > of the exception that we throw in the function’s implementation. We should > integrate this concept deeper into the interfaces instead of the need to > implement some expression analyzing utility for every source. > We propose the following interface: > {code} > SupportsSourceWatermark { > void applySourceWatermark() > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
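The proposed ability interface can be sketched together with the planner-side check it enables. Only `SupportsSourceWatermark` and `applySourceWatermark()` come from the ticket; the surrounding class names are illustrative:

```java
// The ability interface proposed in FLINK-21989: a source that implements it
// tells the planner "push the watermark down to me; I will apply my own,
// connector-specific watermark strategy".
interface SupportsSourceWatermark {
    void applySourceWatermark();
}

// Illustrative source that advertises the ability.
class SourceWithOwnWatermarks implements SupportsSourceWatermark {
    boolean watermarkApplied = false;

    @Override
    public void applySourceWatermark() {
        // install the connector's own watermark strategy
        watermarkApplied = true;
    }
}

class PlannerSketch {
    // What the planner can do when a watermark definition uses the dedicated
    // function: check for the ability instead of implementing an
    // expression-analyzing utility for every source.
    static boolean tryPushDownSourceWatermark(Object source) {
        if (source instanceof SupportsSourceWatermark) {
            ((SupportsSourceWatermark) source).applySourceWatermark();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SourceWithOwnWatermarks source = new SourceWithOwnWatermarks();
        System.out.println("pushed down: " + tryPushDownSourceWatermark(source));
    }
}
```

The design choice mirrors the other `Supports*` ability interfaces in the Table API: capabilities are advertised by implementing a marker-style interface rather than being inferred from the source's watermark expressions.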
[GitHub] [flink] twalthr closed pull request #15388: [FLINK-21989][table] Add a SupportsSourceWatermark ability interface
twalthr closed pull request #15388: URL: https://github.com/apache/flink/pull/15388 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14839: [FLINK-21353][state] Add DFS-based StateChangelog
flinkbot edited a comment on pull request #14839: URL: https://github.com/apache/flink/pull/14839#issuecomment-772060196 ## CI report: * 426533428e0971d34f6e80acc89fe5a5a72ea2a4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15638) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zicat commented on a change in pull request #15247: [FLINK-21833][Table SQL / Runtime] TemporalRowTimeJoinOperator.java will lead to the state expansion by short-life-cycle & huge Row
zicat commented on a change in pull request #15247: URL: https://github.com/apache/flink/pull/15247#discussion_r603014108

## File path: flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/operators/join/temporal/TemporalRowTimeJoinOperator.java

```java
@@ -301,6 +302,8 @@ private void cleanupExpiredVersionInState(long currentWatermark, List r
 public void cleanupState(long time) {
     leftState.clear();
     rightState.clear();
+    nextLeftIndex.clear();
+    registeredTimer.clear();
```

Review comment: Please let me know if there is anything I should change, thx @leonardBang

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21906) Support computed column syntax for Hive DDL dialect
[ https://issues.apache.org/jira/browse/FLINK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310411#comment-17310411 ] jinfeng commented on FLINK-21906: - I studied the code from FLIP-152. What we need to do is use ANTLR in flink-hive-connector to implement the watermark syntax and convert the AST to a CreateTableOperation. Maybe I can give it a try. This feature does not target the 1.13 release, which means I have more time to complete it; that's fine. > Support computed column syntax for Hive DDL dialect > --- > > Key: FLINK-21906 > URL: https://issues.apache.org/jira/browse/FLINK-21906 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Jark Wu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310410#comment-17310410 ] Xintong Song commented on FLINK-21981: -- Hi [~Bo Cui], I noticed you have filed several tickets, all related to using Flink's Yarn deployment. The discussion does not seem to be going smoothly. If you wish, you can reach me at my gmail (tonysong820). We can try to set up an offline discussion in Chinese. Hope that helps. > Increase the priority of the parameter in flink-conf > > > Key: FLINK-21981 > URL: https://issues.apache.org/jira/browse/FLINK-21981 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Reporter: Bo Cui >Priority: Major > > In my cluster, the environment has HADOOP_CONF_DIR and YARN_CONF_DIR, but they point to the Hadoop and YARN conf, not the Flink conf, and the Hadoop conf of Flink (`env.hadoop.conf.dir`) is different from them. So we should use `env.hadoop.conf.dir` first. > https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256 -- This message was sent by Atlassian Jira (v8.3.4#803005)
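The precedence Bo Cui asks for can be sketched as a small shell function: prefer the Hadoop conf dir configured in flink-conf.yaml (`env.hadoop.conf.dir`) over the `HADOOP_CONF_DIR`/`YARN_CONF_DIR` environment variables. Function and variable names are illustrative; this is not the actual `config.sh` implementation:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed lookup order: an explicit Flink
# setting wins, and the environment is only a fallback.
resolve_hadoop_conf_dir() {
    # $1: the env.hadoop.conf.dir value parsed from flink-conf.yaml (may be empty)
    flink_conf_value="$1"
    if [ -n "$flink_conf_value" ]; then
        # explicit Flink configuration takes priority
        echo "$flink_conf_value"
    elif [ -n "$HADOOP_CONF_DIR" ]; then
        echo "$HADOOP_CONF_DIR"
    elif [ -n "$YARN_CONF_DIR" ]; then
        echo "$YARN_CONF_DIR"
    fi
}

HADOOP_CONF_DIR=/etc/hadoop/conf
# with env.hadoop.conf.dir set, the Flink value is chosen
resolve_hadoop_conf_dir /opt/flink-hadoop-conf
# with no Flink value, fall back to HADOOP_CONF_DIR
resolve_hadoop_conf_dir ""
```

This is the opposite of the current config.sh behavior linked in the ticket, where the environment variables are consulted first.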
[GitHub] [flink] flinkbot edited a comment on pull request #15404: [hotfix][docs] Remove redundant import
flinkbot edited a comment on pull request #15404: URL: https://github.com/apache/flink/pull/15404#issuecomment-809050239 ## CI report: * b1c186c4ac9432c1bda5474b5a1d830062195e71 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15647) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15200: [FLINK-21355] Send changes to the state changelog
flinkbot edited a comment on pull request #15200: URL: https://github.com/apache/flink/pull/15200#issuecomment-798902665 ## CI report: * 5e1342d9916f5c4356c622a40bc27bcbdacde9d7 UNKNOWN * 683a1724ca1074a01a0cbaf986afa4a85537b478 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15639) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph
flinkbot edited a comment on pull request #15054: URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524 ## CI report: * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN * e7f59e2e2811c0718a453572e90a4ad2a900ff03 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15642) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
flinkbot edited a comment on pull request #15016: URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058 ## CI report: * cf3c223658f37cc838409b690e2e1bf93c04f207 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15641) * 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15646) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guowei Ma closed FLINK-22005. - Fix Version/s: 1.13.0 Release Note: fixed in master c375f4cfd394c10b54110eac446873055b716b89 Resolution: Fixed > SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) > > > Key: FLINK-22005 > URL: https://issues.apache.org/jira/browse/FLINK-22005 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Guowei Ma >Priority: Major > Labels: test-stability > Fix For: 1.13.0 > > > The test fails because it waits for Elasticsearch records indefinitely. > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310399#comment-17310399 ] Guowei Ma commented on FLINK-22005: --- Thanks [~Leonard Xu]. I find that this does not appear in the following test (28/29), so I am closing this ticket. > SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) > > > Key: FLINK-22005 > URL: https://issues.apache.org/jira/browse/FLINK-22005 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Guowei Ma >Priority: Major > Labels: test-stability > > The test fails because it waits for Elasticsearch records indefinitely. > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] wuchong commented on a change in pull request #15400: [FLINK-20557][sql-client] Support STATEMENT SET in SQL CLI
wuchong commented on a change in pull request #15400: URL: https://github.com/apache/flink/pull/15400#discussion_r603002125 ## File path: flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliClient.java ## @@ -299,15 +307,23 @@ private void callOperation(Operation operation) { } else if (operation instanceof HelpOperation) { // HELP callHelp(); +} else if (operation instanceof BeginStatementSetOperation) { +// BEGIN STATEMENT SET +callBeginStatementSet(); +} else if (operation instanceof EndStatementSetOperation) { +// END +callEndStatementSet(); +} else if (operation instanceof CatalogSinkModifyOperation) { +// INSERT INTO/OVERWRITE +callInsert((CatalogSinkModifyOperation) operation); +} else if (isStatementSetMode) { Review comment: This looks really hack and hard to maintain what statement are not allowed in statement set. I suggest to check statements at the beginning of this method. ```java if (isStatementSetMode) { // check the current operation is allowed in STATEMENT SET if (!(operation instanceof CatalogSinkModifyOperation || operation instanceof EndStatementSetOperation)) { printError(MESSAGE_STATEMENT_SET_SQL_EXECUTION_ERROR); return; } } ``` ## File path: flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliClient.java ## @@ -412,27 +428,65 @@ private void callSelect(QueryOperation operation) { } private boolean callInsert(CatalogSinkModifyOperation operation) { -printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT); - -try { -TableResult result = executor.executeOperation(sessionId, operation); -checkState(result.getJobClient().isPresent()); -terminal.writer() -.println( - CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SUBMITTED) -.toAnsi()); -// keep compatibility with before -terminal.writer() -.println( -String.format( -"Job ID: %s\n", - result.getJobClient().get().getJobID().toString())); -terminal.flush(); +if (isStatementSetMode) { +statementSetOperations.add(operation); 
+printInfo(CliStrings.MESSAGE_ADD_STATEMENT_TO_STATEMENT_SET); return true; -} catch (SqlExecutionException e) { -printExecutionException(e); +} else { +printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT); + +try { +TableResult result = executor.executeOperation(sessionId, operation); +checkState(result.getJobClient().isPresent()); +terminal.writer() +.println( + CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SUBMITTED) +.toAnsi()); +// keep compatibility with before +terminal.writer() +.println( +String.format( +"Job ID: %s\n", + result.getJobClient().get().getJobID().toString())); +terminal.flush(); +return true; +} catch (SqlExecutionException e) { +printExecutionException(e); +} +return false; +} +} + +private void callBeginStatementSet() { +if (isStatementSetMode) { +printStatementSetExecutionException(); +} else { +isStatementSetMode = true; +statementSetOperations = new ArrayList<>(); +printInfo(CliStrings.MESSAGE_BEGIN_STATEMENT_SET); +} +} + +private void callEndStatementSet() { +if (isStatementSetMode) { +isStatementSetMode = false; +printInfo(CliStrings.MESSAGE_SUBMITTING_STATEMENT_SET); + +try { +TableResult result = executor.executeOperation(sessionId, statementSetOperations); +checkState(result.getJobClient().isPresent()); +terminal.writer() +.println( + CliStrings.messageInfo(CliStrings.MESSAGE_STATEMENT_SET_SUBMITTED) +.toAnsi()); +terminal.flush(); Review comment: This logic should be reused with `callInsert`, and we should also consider dml-sync option (could you rebase the branch) ? ## File path: flink-table/flink-sql-client/s
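The guard wuchong suggests can be sketched as a standalone snippet. Note this is a simplified illustration: the class and operation names below are stand-ins for the real CliClient/Operation types, not Flink's actual API.

```java
// Hypothetical sketch of the suggested guard: reject disallowed operations
// up front while in STATEMENT SET mode, instead of scattering checks across
// the dispatch chain. All names here are simplified stand-ins.
public class StatementSetGuard {
    interface Operation {}
    static class CatalogSinkModifyOperation implements Operation {} // INSERT INTO/OVERWRITE
    static class EndStatementSetOperation implements Operation {}   // END
    static class SelectOperation implements Operation {}            // not allowed in a set

    /** Only INSERTs and the closing END are legal inside BEGIN STATEMENT SET. */
    static boolean allowedInStatementSet(Operation op) {
        return op instanceof CatalogSinkModifyOperation
                || op instanceof EndStatementSetOperation;
    }

    static String dispatch(boolean isStatementSetMode, Operation op) {
        // The single check at the top of the method, as the review proposes.
        if (isStatementSetMode && !allowedInStatementSet(op)) {
            return "ERROR: statement not allowed in STATEMENT SET";
        }
        // ... fall through to the normal per-operation handling ...
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(true, new SelectOperation()));            // rejected
        System.out.println(dispatch(true, new CatalogSinkModifyOperation())); // accepted (buffered)
        System.out.println(dispatch(false, new SelectOperation()));           // normal path
    }
}
```

The design point is that a single allow-list at the method entry is easier to maintain than per-branch checks that each know which statements are forbidden.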
[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310391#comment-17310391 ] Leonard Xu commented on FLINK-22005: [~maguowei] Could you rebase to latest master, this issue has been fixed in [https://github.com/apache/flink/pull/15394#event-4516849115] > SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) > > > Key: FLINK-22005 > URL: https://issues.apache.org/jira/browse/FLINK-22005 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Guowei Ma >Priority: Major > Labels: test-stability > > The test fail because of Waiting for Elasticsearch records indefinitely. > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310388#comment-17310388 ] Bo Cui commented on FLINK-21981: [~Paul Lin] Yes, the current workaround is something like `export ...`, but since `env.hadoop.conf.dir` already exists and can spare us that `export ...`, why not use it first? > Increase the priority of the parameter in flink-conf > > > Key: FLINK-21981 > URL: https://issues.apache.org/jira/browse/FLINK-21981 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Reporter: Bo Cui >Priority: Major > > in my cluster, env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are hadoop > and yarn conf, not flink conf. and hadoop conf of flink(env.hadoop.conf.dir) > is different from them. so we should use `env.hadoop.conf.dir` first > https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256 -- This message was sent by Atlassian Jira (v8.3.4#803005)
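The precedence Bo Cui argues for can be sketched as follows. This is a hedged illustration only: the real logic lives in the `config.sh` shell script, and the method and map keys below are illustrative, not Flink's actual implementation.

```java
import java.util.Map;

// Sketch of the proposed lookup order: a Hadoop conf dir set explicitly in
// flink-conf (env.hadoop.conf.dir) wins over the HADOOP_CONF_DIR environment
// variable, which is only a fallback. Illustrative names, not config.sh logic.
public class HadoopConfResolver {
    static final String FLINK_CONF_KEY = "env.hadoop.conf.dir";

    /** flink-conf entry wins; the environment variable is only a fallback. */
    static String resolve(Map<String, String> flinkConf, Map<String, String> env) {
        String fromFlinkConf = flinkConf.get(FLINK_CONF_KEY);
        if (fromFlinkConf != null && !fromFlinkConf.isEmpty()) {
            return fromFlinkConf;
        }
        return env.getOrDefault("HADOOP_CONF_DIR", "");
    }

    public static void main(String[] args) {
        // Both are set, as in the reporter's cluster: flink-conf wins.
        System.out.println(resolve(
                Map.of(FLINK_CONF_KEY, "/opt/flink-hadoop-conf"),
                Map.of("HADOOP_CONF_DIR", "/etc/hadoop/conf")));
    }
}
```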
[jira] [Commented] (FLINK-21990) SourceStreamTask will always hang if the CheckpointedFunction#snapshotState throws an exception.
[ https://issues.apache.org/jira/browse/FLINK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310387#comment-17310387 ] Kezhu Wang commented on FLINK-21990: A {{disableChaining}} between the failed source and the downstream sink makes the test case pass, but it still hangs for a while before running into blocking operations. It seems a simple {{Thread.interrupt}} is not enough; a cooperative {{SourceFunction.cancel}} should help. > SourceStreamTask will always hang if the CheckpointedFunction#snapshotState > throws an exception. > > > Key: FLINK-21990 > URL: https://issues.apache.org/jira/browse/FLINK-21990 > Project: Flink > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: ming li >Priority: Critical > > If the source in {{SourceStreamTask}} implements {{CheckpointedFunction}} and > an exception is thrown in the snapshotState method, then the > {{SourceStreamTask}} will always hang. > The main reason is that the checkpoint is executed in the mailbox. When the > {{CheckpointedFunction#snapshotState}} of the source throws an exception, > the StreamTask#cleanUpInvoke will be called, where it will wait for the end > of the {{LegacySourceFunctionThread}} of the source. However, the source > thread does not end by itself (this requires the user to control it), the > {{Task}} will hang at this time, and the JobMaster has no perception of this > behavior. > {code:java} > protected void cleanUpInvoke() throws Exception { > getCompletionFuture().exceptionally(unused -> null).join(); //wait for > the end of the source > // clean up everything we initialized > isRunning = false; > ... > }{code} > I think we should call the cancel method of the source first, and then wait > for the end. > The following is my test code, the test branch is Flink's master branch. 
> {code:java} > @Test > public void testSourceFailure() throws Exception { > final StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.enableCheckpointing(2000L); > env.setRestartStrategy(RestartStrategies.noRestart()); > env.addSource(new FailedSource()).addSink(new DiscardingSink<>()); > JobGraph jobGraph = > StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph()); > try { > // assert that the job only execute checkpoint once and only failed > once. > TestUtils.submitJobAndWaitForResult( > cluster.getClusterClient(), jobGraph, > getClass().getClassLoader()); > } catch (JobExecutionException jobException) { > Optional throwable = > ExceptionUtils.findThrowable(jobException, > FlinkRuntimeException.class); > Assert.assertTrue(throwable.isPresent()); > Assert.assertEquals( > > CheckpointFailureManager.EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE, > throwable.get().getMessage()); > } > // assert that the job only failed once. > Assert.assertEquals(1, > StringGeneratingSourceFunction.INITIALIZE_TIMES.get()); > } > private static class FailedSource extends RichParallelSourceFunction > implements CheckpointedFunction { > private transient boolean running; > @Override > public void open(Configuration parameters) throws Exception { > running = true; > } > @Override > public void run(SourceContext ctx) throws Exception { > while (running) { > ctx.collect("test"); > } > } > @Override > public void cancel() { > running = false; > } > @Override > public void snapshotState(FunctionSnapshotContext context) throws > Exception { > throw new RuntimeException("source failed"); > } > @Override > public void initializeState(FunctionInitializationContext context) throws > Exception {} > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
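The fix ming li proposes (cancel the source, then wait) can be demonstrated with a minimal standalone sketch. The names below are illustrative stand-ins, not the real StreamTask internals: a legacy-style source loop that only exits via its own running flag will hang a plain join forever, but terminates promptly once a cooperative cancel flips the flag first.

```java
// Minimal sketch of "call cancel first, then wait for the end": a source loop
// controlled only by a volatile running flag cannot be stopped by waiting alone.
public class CancelBeforeJoin {
    static class LegacySource {
        private volatile boolean running = true;
        void run() { while (running) { Thread.onSpinWait(); } } // "emit records" until cancelled
        void cancel() { running = false; }
    }

    /** The proposed ordering: cooperative cancel first, then a bounded join. */
    static boolean cancelThenJoin(LegacySource source, Thread sourceThread)
            throws InterruptedException {
        source.cancel();          // without this, the join below would wait forever
        sourceThread.join(5_000); // bounded wait for the source thread to finish
        return !sourceThread.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        LegacySource source = new LegacySource();
        Thread t = new Thread(source::run, "legacy-source");
        t.start();
        // Simulate cleanup after snapshotState failed.
        System.out.println(cancelThenJoin(source, t) ? "terminated" : "HANGING");
    }
}
```

This mirrors the report's point: `cleanUpInvoke` joining on the completion future corresponds to the bare `join`, which hangs unless the source's cancel method is invoked first.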
[jira] [Commented] (FLINK-18356) Exit code 137 returned from process when testing pyflink
[ https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310383#comment-17310383 ] Guowei Ma commented on FLINK-18356: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15554&view=logs&j=34f41360-6c0d-54d3-11a1-0292a2def1d9&t=2d56e022-1ace-542f-bf1a-b37dd63243f2&l=9452 > Exit code 137 returned from process when testing pyflink > > > Key: FLINK-18356 > URL: https://issues.apache.org/jira/browse/FLINK-18356 > Project: Flink > Issue Type: Bug > Components: API / Python, Build System / Azure Pipelines >Affects Versions: 1.12.0 >Reporter: Piotr Nowojski >Priority: Major > Labels: test-stability > Fix For: 1.13.0 > > > {noformat} > = test session starts > == > platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1 > cachedir: .tox/py37-cython/.pytest_cache > rootdir: /__w/3/s/flink-python > collected 568 items > pyflink/common/tests/test_configuration.py ..[ > 1%] > pyflink/common/tests/test_execution_config.py ...[ > 5%] > pyflink/dataset/tests/test_execution_environment.py . > ##[error]Exit code 137 returned from process: file name '/bin/docker', > arguments 'exec -i -u 1002 > 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb > /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'. > Finishing: Test - python > {noformat} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.
flinkbot edited a comment on pull request #15403: URL: https://github.com/apache/flink/pull/15403#issuecomment-809045211 ## CI report: * d87a60142ebdb3eef911d51391679b90f4acaa6a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15645) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #15404: [hotfix][docs] Remove redundant import
flinkbot commented on pull request #15404: URL: https://github.com/apache/flink/pull/15404#issuecomment-809050239 ## CI report: * b1c186c4ac9432c1bda5474b5a1d830062195e71 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
flinkbot edited a comment on pull request #15402: URL: https://github.com/apache/flink/pull/15402#issuecomment-809045142 ## CI report: * 97ba64b2959826a53be3273d9b565b3fe8f96fdc Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15644) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-18356) Exit code 137 returned from process when testing pyflink
[ https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310380#comment-17310380 ] Guowei Ma commented on FLINK-18356: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=62110053-334f-5295-a0ab-80dd7e2babbf&l=24506 > Exit code 137 returned from process when testing pyflink > > > Key: FLINK-18356 > URL: https://issues.apache.org/jira/browse/FLINK-18356 > Project: Flink > Issue Type: Bug > Components: API / Python, Build System / Azure Pipelines >Affects Versions: 1.12.0 >Reporter: Piotr Nowojski >Priority: Major > Labels: test-stability > Fix For: 1.13.0 > > > {noformat} > = test session starts > == > platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1 > cachedir: .tox/py37-cython/.pytest_cache > rootdir: /__w/3/s/flink-python > collected 568 items > pyflink/common/tests/test_configuration.py ..[ > 1%] > pyflink/common/tests/test_execution_config.py ...[ > 5%] > pyflink/dataset/tests/test_execution_environment.py . > ##[error]Exit code 137 returned from process: file name '/bin/docker', > arguments 'exec -i -u 1002 > 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb > /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'. > Finishing: Test - python > {noformat} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310377#comment-17310377 ] Guowei Ma commented on FLINK-21103: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15575&view=results > E2e tests time out on azure > --- > > Key: FLINK-21103 > URL: https://issues.apache.org/jira/browse/FLINK-21103 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines, Tests >Affects Versions: 1.11.3, 1.12.1, 1.13.0 >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Blocker > Labels: test-stability > Fix For: 1.13.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > Creating worker2 ... done > Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying > for 0 seconds, retrying ... > Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying > for 5 seconds, retrying ... > Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying > for 10 seconds, retrying ... > Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying > for 15 seconds, retrying ... > Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying > for 20 seconds, retrying ... > Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying > for 26 seconds, retrying ... > Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying > for 31 seconds, retrying ... > Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying > for 36 seconds, retrying ... > Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying > for 41 seconds, retrying ... > Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying > for 46 seconds, retrying ... > Jan 22 13:17:08 We only have 0 NodeManagers up. 
We have been trying for 0 > seconds, retrying ... > 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at > master.docker-hadoop-cluster-network/172.19.0.3:8032 > 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History > server at master.docker-hadoop-cluster-network/172.19.0.3:10200 > Jan 22 13:17:11 We now have 2 NodeManagers up. > == > === WARNING: This E2E Run took already 80% of the allocated time budget of > 250 minutes === > == > == > === WARNING: This E2E Run will time out in the next few minutes. Starting to > upload the log output === > == > ##[error]The task has timed out. > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. > Upload '/tmp/_e2e_watchdog.output.0' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. > Upload '/tmp/_e2e_watchdog.output.1' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. > Upload '/tmp/_e2e_watchdog.output.2' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Finishing: Run e2e tests > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] YikSanChan commented on pull request #15360: [FLINK-21938][docs] Add how to unit test python udfs
YikSanChan commented on pull request #15360: URL: https://github.com/apache/flink/pull/15360#issuecomment-809048667 @rmetzger Hi Robert, is there anything I need to do here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21954) Test failures occur due to the test not waiting for the ExecutionGraph to be created
[ https://issues.apache.org/jira/browse/FLINK-21954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310374#comment-17310374 ] Guowei Ma commented on FLINK-21954: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7030a106-e977-5851-a05e-535de648c9c9&l=8502 > Test failures occur due to the test not waiting for the ExecutionGraph to be > created > > > Key: FLINK-21954 > URL: https://issues.apache.org/jira/browse/FLINK-21954 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Matthias >Priority: Major > Labels: test-stability > > Various tests are failing due the test not waiting for the ExecutionGraph to > be created: > * > [JobMasterTest.testRestoringFromSavepoint|https://dev.azure.com/mapohl/flink/_build/results?buildId=356&view=logs&j=243b38e1-22e7-598a-c8ae-385dce2c28b5&t=fea482b6-4f61-51f4-2584-f73df532b395&l=8266] > * {{JobMasterTest.testRequestNextInputSplitWithGlobalFailover}} > * {{JobMasterTest.testRequestNextInputSplitWithLocalFailover}} (also fails > due to FLINK-21450) > * {{JobMasterQueryableStateTest.testRequestKvStateOfWrongJob}} > * {{JobMasterQueryableStateTest.testRequestKvStateWithIrrelevantRegistration}} > * {{JobMasterQueryableStateTest.testDuplicatedKvStateRegistrationsFailTask}} > * {{JobMasterQueryableStateTest.testRegisterKvState}} > We might have to double-check whether other tests are affected as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
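The kind of fix FLINK-21954 calls for is to poll for the ExecutionGraph with a deadline instead of asserting immediately after job submission. Below is a hedged, generic sketch of such a wait helper; `waitUntil` and the supplier are illustrative, not Flink's actual test utilities.

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-condition helper: retry with a small back-off until the
// condition holds (e.g. "the ExecutionGraph has been created") or a deadline
// passes. Illustrative only, not Flink's CommonTestUtils.
public class WaitUntilCondition {
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // back off instead of busy-spinning
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition that only becomes true after ~100 ms, like a graph built asynchronously.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start > 100, 2_000);
        System.out.println(ok ? "condition met" : "timed out");
    }
}
```

Tests that assert on state produced asynchronously (here, ExecutionGraph creation inside the JobMaster) stay stable only when they wait like this rather than racing the scheduler.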
[jira] [Commented] (FLINK-21659) Running HA per-job cluster (rocks, incremental) end-to-end test fails
[ https://issues.apache.org/jira/browse/FLINK-21659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310373#comment-17310373 ] Guowei Ma commented on FLINK-21659: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a&l=1733 > Running HA per-job cluster (rocks, incremental) end-to-end test fails > - > > Key: FLINK-21659 > URL: https://issues.apache.org/jira/browse/FLINK-21659 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.13.0 >Reporter: Guowei Ma >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14232&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a > It seems that the task deploy tasks to the TaskManager0 failed and it causes > the checkpoint fails. > {code:java} > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.submitTask(org.apache.flink.runtime.deployment.TaskDeploymentDescriptor,org.apache.flink.runtime.jobmaster.JobMasterId,org.apache.flink.api.common.time.Time) > timed out. 
> at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > ~[?:1.8.0_282] > at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:234) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > ~[?:1.8.0_282] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > ~[?:1.8.0_282] > at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1064) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.OnComplete.internal(Future.scala:263) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.OnComplete.internal(Future.scala:261) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328) >
[GitHub] [flink] flinkbot commented on pull request #15404: [hotfix][docs] Remove redundant import
flinkbot commented on pull request #15404: URL: https://github.com/apache/flink/pull/15404#issuecomment-809047367 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit b1c186c4ac9432c1bda5474b5a1d830062195e71 (Mon Mar 29 03:54:18 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21990) SourceStreamTask will always hang if the CheckpointedFunction#snapshotState throws an exception.
[ https://issues.apache.org/jira/browse/FLINK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310372#comment-17310372 ] Kezhu Wang commented on FLINK-21990: Ideally, this should already be fixed in [1.12.2|https://github.com/apache/flink/blob/release-1.12.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SourceStreamTask.java#L175] and [master|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SourceStreamTask.java#L181]. But I still ran into the hang with a modified version of the test case. Either that run did not reach the blocking operations, or the interruption state was swallowed somewhere (FLINK-21186?). I think that fix deserves a test to gain confidence in it. cc [~AHeise] > SourceStreamTask will always hang if the CheckpointedFunction#snapshotState > throws an exception. > > > Key: FLINK-21990 > URL: https://issues.apache.org/jira/browse/FLINK-21990 > Project: Flink > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: ming li >Priority: Critical > > If the source in {{SourceStreamTask}} implements {{CheckpointedFunction}} and > an exception is thrown in the snapshotState method, then the > {{SourceStreamTask}} will always hang. > The main reason is that the checkpoint is executed in the mailbox. When the > {{CheckpointedFunction#snapshotState}} of the source throws an exception, > the StreamTask#cleanUpInvoke will be called, where it will wait for the end > of the {{LegacySourceFunctionThread}} of the source. However, the source > thread does not end by itself (this requires the user to control it), the > {{Task}} will hang at this time, and the JobMaster has no perception of this > behavior. > {code:java} > protected void cleanUpInvoke() throws Exception { > getCompletionFuture().exceptionally(unused -> null).join(); //wait for > the end of the source > // clean up everything we initialized > isRunning = false > ... 
> }{code} > I think we should call the cancel method of the source first, and then wait > for the end. > The following is my test code, the test branch is Flink's master branch. > {code:java} > @Test > public void testSourceFailure() throws Exception { > final StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.enableCheckpointing(2000L); > env.setRestartStrategy(RestartStrategies.noRestart()); > env.addSource(new FailedSource()).addSink(new DiscardingSink<>()); > JobGraph jobGraph = > StreamingJobGraphGenerator.createJobGraph(env.getStreamGraph()); > try { > // assert that the job only execute checkpoint once and only failed > once. > TestUtils.submitJobAndWaitForResult( > cluster.getClusterClient(), jobGraph, > getClass().getClassLoader()); > } catch (JobExecutionException jobException) { > Optional throwable = > ExceptionUtils.findThrowable(jobException, > FlinkRuntimeException.class); > Assert.assertTrue(throwable.isPresent()); > Assert.assertEquals( > > CheckpointFailureManager.EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE, > throwable.get().getMessage()); > } > // assert that the job only failed once. > Assert.assertEquals(1, > StringGeneratingSourceFunction.INITIALIZE_TIMES.get()); > } > private static class FailedSource extends RichParallelSourceFunction > implements CheckpointedFunction { > private transient boolean running; > @Override > public void open(Configuration parameters) throws Exception { > running = true; > } > @Override > public void run(SourceContext ctx) throws Exception { > while (running) { > ctx.collect("test"); > } > } > @Override > public void cancel() { > running = false; > } > @Override > public void snapshotState(FunctionSnapshotContext context) throws > Exception { > throw new RuntimeException("source failed"); > } > @Override > public void initializeState(FunctionInitializationContext context) throws > Exception {} > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] YikSanChan opened a new pull request #15404: [hotfix][docs] Remove redundant import
YikSanChan opened a new pull request #15404: URL: https://github.com/apache/flink/pull/15404

## What is the purpose of the change
*(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)*

## Brief change log
*(for example:)*
- *The TaskInfo is stored in the blob store on job creation time as a persistent artifact*
- *Deployments RPC transmits only the blob storage reference*
- *TaskManagers retrieve the TaskInfo from the blob cache*

## Verifying this change
*(Please pick either of the following options)*
This change is a trivial rework / code cleanup without any test coverage.
*(or)*
This change is already covered by existing tests, such as *(please describe tests)*.
*(or)*
This change added tests and can be verified as follows: *(example:)*
- *Added integration tests for end-to-end deployment with large payloads (100MB)*
- *Extended integration test for recovery after master (JobManager) failure*
- *Added test that validates that TaskInfo is transferred only once across recoveries*
- *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.*

## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)

## Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310371#comment-17310371 ] Paul Lin commented on FLINK-21981:
---
[~Bo Cui] Just to confirm, did you try exporting `HADOOP_CONF_DIR` before running any flink commands? That should work for multi-tenant scenarios. Env variables are more flexible and environment-specific, and thus should take higher priority than static configurations.

> Increase the priority of the parameter in flink-conf
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
> Issue Type: Bug
> Components: Client / Job Submission
> Reporter: Bo Cui
> Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are the hadoop and yarn conf, not the flink conf, and the hadoop conf of flink (env.hadoop.conf.dir) is different from them. So we should use `env.hadoop.conf.dir` first.
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256
-- This message was sent by Atlassian Jira (v8.3.4#803005)
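Paul Lin's suggestion — exporting the variable before invoking any Flink command — looks roughly like this in practice. The conf path below is a hypothetical tenant-specific example, not a value from this thread, and the final `echo` stands in for a real `flink run`:

```shell
# Flink's bin/config.sh reads HADOOP_CONF_DIR from the environment when it
# assembles the classpath, so exporting a tenant-specific dir first makes
# every subsequent flink command in this shell use that tenant's conf.
export HADOOP_CONF_DIR=/etc/hadoop/tenant-a/conf

# A real invocation would be e.g. `flink run job.jar`; demonstrated here
# with echo so the sketch is self-contained.
echo "using HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

Because the export is per-shell (or per-command, if written as `HADOOP_CONF_DIR=... flink run ...`), different tenants can point the same Flink installation at different Hadoop configurations without touching flink-conf.yaml.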
[jira] [Commented] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310370#comment-17310370 ] Guowei Ma commented on FLINK-21103:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15573&view=results

> E2e tests time out on azure
>
> Key: FLINK-21103
> URL: https://issues.apache.org/jira/browse/FLINK-21103
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines, Tests
> Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Reporter: Dawid Wysakowicz
> Assignee: Dawid Wysakowicz
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.13.0
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> Creating worker2 ... done
> Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying for 0 seconds, retrying ...
> Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying for 5 seconds, retrying ...
> Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying for 10 seconds, retrying ...
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying for 15 seconds, retrying ...
> Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying for 20 seconds, retrying ...
> Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying for 26 seconds, retrying ...
> Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying for 31 seconds, retrying ...
> Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying for 36 seconds, retrying ...
> Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying for 41 seconds, retrying ...
> Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying for 46 seconds, retrying ...
> Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 seconds, retrying ...
> 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at master.docker-hadoop-cluster-network/172.19.0.3:8032
> 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History server at master.docker-hadoop-cluster-network/172.19.0.3:10200
> Jan 22 13:17:11 We now have 2 NodeManagers up.
> ==
> === WARNING: This E2E Run took already 80% of the allocated time budget of 250 minutes ===
> ==
> ==
> === WARNING: This E2E Run will time out in the next few minutes. Starting to upload the log output ===
> ==
> ##[error]The task has timed out.
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.0' to file container: '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.1' to file container: '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Async Command Start: Upload Artifact
> Uploading 1 files
> File upload succeed.
> Upload '/tmp/_e2e_watchdog.output.2' to file container: '#/11824779/e2e-timeout-logs'
> Associated artifact 140921 with build 12377
> Async Command End: Upload Artifact
> Finishing: Run e2e tests
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310368#comment-17310368 ] Guowei Ma commented on FLINK-22005:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15580&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=48499

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.13.0
> Reporter: Guowei Ma
> Priority: Major
> Labels: test-stability
>
> The test fails because it waits for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310366#comment-17310366 ] Guowei Ma commented on FLINK-22005:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=58087

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.13.0
> Reporter: Guowei Ma
> Priority: Major
> Labels: test-stability
>
> The test fails because it waits for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542 ] Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:47 AM:
---
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results

was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] fsk119 commented on a change in pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
fsk119 commented on a change in pull request #15402: URL: https://github.com/apache/flink/pull/15402#discussion_r602994006

## File path: flink-table/flink-sql-parser-hive/src/main/codegen/includes/parserImpls.ftl
## @@ -1605,3 +1605,17 @@ SqlShowModules SqlShowModules() :
        return new SqlShowModules(startPos.plus(getPos()), requireFull);
    }
}
+
+/**
+* Parses a explain module statement.
+*/
+SqlNode SqlRichExplain() :
+{
+    SqlNode stmt;
+}
+{
+     ( )*
+    stmt = SqlQueryOrDml() {
+        return new SqlRichExplain(getPos(), stmt);
+    }
+}

Review comment: Add a new line

## File path: flink-table/flink-sql-parser-hive/src/test/java/org/apache/flink/sql/parser/hive/FlinkHiveSqlParserImplTest.java
## @@ -482,4 +482,54 @@ public void testShowModules() {
    sql("show full modules").ok("SHOW FULL MODULES");
}
+
+@Test
+public void testExplain() {
+    String sql = "explain plan for select * from emps";
+    String expected = "EXPLAIN SELECT *\n" + "FROM `EMPS`";
+    this.sql(sql).ok(expected);
+}
+
+@Test
+public void testExplainJsonFormat() {
+    // unsupport testExplainJsonFormat now
+}
+
+@Test
+public void testExplainWithImpl() {
+    // unsupport testExplainWithImpl now
+}
+
+@Test
+public void testExplainWithoutImpl() {
+    // unsupport testExplainWithoutImpl now
+}
+
+@Test
+public void testExplainWithType() {
+    // unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsXml() {
+    // unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsJson() {
+    // unsupport testExplainWithType now
+}

Review comment: Add comment: // TODO: FLINK-20562

## File path: flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/calcite/FlinkPlannerImpl.scala
## @@ -146,6 +146,10 @@ class FlinkPlannerImpl(
    val validated = validator.validate(explain.getExplicandum)
    explain.setOperand(0, validated)
    explain
+case richExplain: SqlRichExplain =>
+  val validated = validator.validate(richExplain.getStatement)
+  richExplain.setOperand(0,validated)
+  richExplain

Review comment: Move forward before `SqlExplain`? Do we really need SqlExplain?

## File path: flink-table/flink-sql-parser/src/test/java/org/apache/flink/sql/parser/FlinkSqlParserImplTest.java
## @@ -1215,6 +1214,56 @@ public void testEnd() {
    sql("end").ok("END");
}
+
+@Test
+public void testExplain() {
+    String sql = "explain plan for select * from emps";
+    String expected = "EXPLAIN SELECT *\n" + "FROM `EMPS`";
+    this.sql(sql).ok(expected);
+}
+
+@Test
+public void testExplainJsonFormat() {
+    // unsupport testExplainJsonFormat now
+}
+
+@Test
+public void testExplainWithImpl() {
+    // unsupport testExplainWithImpl now
+}
+
+@Test
+public void testExplainWithoutImpl() {
+    // unsupport testExplainWithoutImpl now
+}
+
+@Test
+public void testExplainWithType() {
+    // unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsXml() {
+    // unsupport testExplainWithType now
+}
+
+@Test
+public void testExplainAsJson() {
+    // unsupport testExplainWithType now

Review comment: ditto

## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala
## @@ -18,28 +18,26 @@
 package org.apache.flink.table.calcite

-import org.apache.flink.sql.parser.ExtendedSqlNode
-import org.apache.flink.sql.parser.dql.{SqlRichDescribeTable, SqlShowCatalogs, SqlShowCurrentCatalog, SqlShowCurrentDatabase, SqlShowDatabases, SqlShowFunctions, SqlShowTables, SqlShowViews}
-import org.apache.flink.table.api.{TableException, ValidationException}
-import org.apache.flink.table.catalog.CatalogReader
-import org.apache.flink.table.parse.CalciteParser
+import _root_.java.lang.{Boolean => JBoolean}
+import _root_.java.util
+import _root_.java.util.function.{Function => JFunction}
 import org.apache.calcite.plan.RelOptTable.ViewExpander
 import org.apache.calcite.plan._
 import org.apache.calcite.rel.RelRoot
 import org.apache.calcite.rel.`type`.RelDataType
 import org.apache.calcite.rex.RexBuilder
-import org.apache.calcite.sql.advise.{SqlAdvisor, SqlAdvisorValidator}
+import org.apache.calcite.sql.advise.SqlAdvisorValidator
 import org.apache.calcite.sql.validate.SqlValidator
 import org.apache.calcite.sql.{SqlExplain, SqlKind, SqlNode, SqlOperatorTable}
 import org.apache.calcite.sql2rel.{SqlRexConvertletTable, SqlToRelConverter}
 import org.apache.calcite.tools.{FrameworkConfig, RelConversionException}
+import org.apache.flink.sql.parser.ExtendedSqlNode
+import org.apache.flink.
[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542 ] Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:45 AM:
---
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results

was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=results
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.
flinkbot commented on pull request #15403: URL: https://github.com/apache/flink/pull/15403#issuecomment-809045211

## CI report:
* d87a60142ebdb3eef911d51391679b90f4acaa6a UNKNOWN

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542 ] Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:45 AM:
---
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results

was (Author: maguowei):
on branch 1.13
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
flinkbot commented on pull request #15402: URL: https://github.com/apache/flink/pull/15402#issuecomment-809045142

## CI report:
* 97ba64b2959826a53be3273d9b565b3fe8f96fdc UNKNOWN

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15401: [FLINK-21969][python] Invoke finish bundle method before emitting the max timestamp watermark in PythonTimestampsAndWatermarksOperato
flinkbot edited a comment on pull request #15401: URL: https://github.com/apache/flink/pull/15401#issuecomment-809036806

## CI report:
* 7d4a73b2d067dad06ae36fbf2750a1018df4338c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15643)

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
[ https://issues.apache.org/jira/browse/FLINK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310365#comment-17310365 ] Guowei Ma commented on FLINK-22005:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19760

> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
>
> Key: FLINK-22005
> URL: https://issues.apache.org/jira/browse/FLINK-22005
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.13.0
> Reporter: Guowei Ma
> Priority: Major
> Labels: test-stability
>
> The test fails because it waits for Elasticsearch records indefinitely.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #15016: [FLINK-21413][state] Clean TtlMapState and TtlListState after all elements are expired
flinkbot edited a comment on pull request #15016: URL: https://github.com/apache/flink/pull/15016#issuecomment-785640058

## CI report:
* cf3c223658f37cc838409b690e2e1bf93c04f207 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15641)
* 5d95fe96d132b396f3d59f2c102d63a68ccd0d73 UNKNOWN

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph
flinkbot edited a comment on pull request #15054: URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524

## CI report:
* 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
* 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
* 7717d7719f6b448084ff27365c2b718d2c059376 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15640)
* e7f59e2e2811c0718a453572e90a4ad2a900ff03 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15642)

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310364#comment-17310364 ] Xintong Song commented on FLINK-21981:
---
{quote}and HADOOP_CONF_DIR=default hdfs-site, and HADOOP_CONF_DIR cannot be modified in multi-tenant scenarios.{quote}
Why is that? You should be able to expose the env var only for the specific command, instead of exposing it globally.

> Increase the priority of the parameter in flink-conf
>
> Key: FLINK-21981
> URL: https://issues.apache.org/jira/browse/FLINK-21981
> Project: Flink
> Issue Type: Bug
> Components: Client / Job Submission
> Reporter: Bo Cui
> Priority: Major
>
> In my cluster, the env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are the hadoop and yarn conf, not the flink conf, and the hadoop conf of flink (env.hadoop.conf.dir) is different from them. So we should use `env.hadoop.conf.dir` first.
> https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-22005) SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1)
Guowei Ma created FLINK-22005: - Summary: SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) Key: FLINK-22005 URL: https://issues.apache.org/jira/browse/FLINK-22005 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.13.0 Reporter: Guowei Ma The test fails because it waits for Elasticsearch records indefinitely. https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=19826 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21982) jobs should use client org.apache.hadoop.config on yarn
[ https://issues.apache.org/jira/browse/FLINK-21982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310358#comment-17310358 ] Bo Cui commented on FLINK-21982: when new configuration, configuration will load core-site by classLoader.getResource(String) https://github.com/apache/hadoop/blob/ea6595d3b68ac462aec0d493718d7a10fbda0b6d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L788 > jobs should use client org.apache.hadoop.config on yarn > --- > > Key: FLINK-21982 > URL: https://issues.apache.org/jira/browse/FLINK-21982 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.13.0 >Reporter: Bo Cui >Priority: Major > > currently, jobs use the Yarn server configuration during execution. > i think we should submit the configuration to the HDFS, like MR job.xml...etc > and container fetches the job.xml and creates a soft link(hdfs-site core-site > yarn-site) during container initializes > and then the configuration can obtain these resources through > classLoader.getResource(String) during configuration initializes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310355#comment-17310355 ] Bo Cui edited comment on FLINK-21981 at 3/29/21, 3:33 AM: -- the client side has 2 hdfs-site files (the default hdfs-site and the flink hdfs-site; they are not the same) and HADOOP_CONF_DIR=default hdfs-site, and HADOOP_CONF_DIR cannot be modified in multi-tenant scenarios. so I think preferring `env.hadoop.conf.dir` may be better. was (Author: bo cui): the client side has 2 hdfs-site files (the default hdfs-site and the flink hdfs-site; they are not the same) and HADOOP_CONF_DIR=default hdfs-site, and HADOOP_CONF_DIR cannot be modified in multi-tenant scenarios. so I think preferring `env.hadoop.conf.dir` may be better. > Increase the priority of the parameter in flink-conf > > > Key: FLINK-21981 > URL: https://issues.apache.org/jira/browse/FLINK-21981 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Reporter: Bo Cui >Priority: Major > > in my cluster, env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are hadoop > and yarn conf, not flink conf. and hadoop conf of flink(env.hadoop.conf.dir) > is different from them. so we should use `env.hadoop.conf.dir` first > https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21981) Increase the priority of the parameter in flink-conf
[ https://issues.apache.org/jira/browse/FLINK-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310355#comment-17310355 ] Bo Cui commented on FLINK-21981: the client side has 2 hdfs-site files (the default hdfs-site and the flink hdfs-site; they are not the same) and HADOOP_CONF_DIR=default hdfs-site, and HADOOP_CONF_DIR cannot be modified in multi-tenant scenarios. so I think preferring `env.hadoop.conf.dir` may be better. > Increase the priority of the parameter in flink-conf > > > Key: FLINK-21981 > URL: https://issues.apache.org/jira/browse/FLINK-21981 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Reporter: Bo Cui >Priority: Major > > in my cluster, env has HADOOP_CONF_DIR and YARN_CONF_DIR, but they are hadoop > and yarn conf, not flink conf. and hadoop conf of flink(env.hadoop.conf.dir) > is different from them. so we should use `env.hadoop.conf.dir` first > https://github.com/apache/flink/blob/57e93c90a14a9f5316da821863dd2b335a8f86e0/flink-dist/src/main/flink-bin/bin/config.sh#L256 -- This message was sent by Atlassian Jira (v8.3.4#803005)
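The precedence change argued for here could look roughly like this in shell. This is only a sketch: the variable names and the fallback path are hypothetical stand-ins for what config.sh actually reads, not the real script logic.

```shell
# Hypothetical precedence: a value read from flink-conf.yaml
# (env.hadoop.conf.dir) wins over the globally exported HADOOP_CONF_DIR.
YAML_HADOOP_CONF_DIR="/opt/flink/hadoop-conf"   # stand-in for the flink-conf.yaml value

if [ -n "$YAML_HADOOP_CONF_DIR" ]; then
    HADOOP_CONF_DIR="$YAML_HADOOP_CONF_DIR"     # flink-conf.yaml takes priority
elif [ -z "$HADOOP_CONF_DIR" ] && [ -d "/etc/hadoop/conf" ]; then
    HADOOP_CONF_DIR="/etc/hadoop/conf"          # fall back to a well-known location
fi
echo "resolved HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

The design question in the thread is exactly which branch should come first: today the environment variable wins; the proposal inverts that.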
[jira] [Commented] (FLINK-21982) jobs should use client org.apache.hadoop.config on yarn
[ https://issues.apache.org/jira/browse/FLINK-21982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310354#comment-17310354 ] Bo Cui commented on FLINK-21982: thx [~xintongsong] FLINK-16005 only solves part of my problem (YarnApplicationMasterRunner and YarnResourceManager...). If the job uses the `new Configuration()` API, the job and its configuration still pick up the YARN/HDFS server configuration. {quote}i think we should submit the configuration to the HDFS, like MR job.xml...etc and container fetches the job.xml and creates a soft link(hdfs-site core-site yarn-site) during container initializes and then the configuration can obtain these resources through classLoader.getResource(String) during configuration initializes {quote} > jobs should use client org.apache.hadoop.config on yarn > --- > > Key: FLINK-21982 > URL: https://issues.apache.org/jira/browse/FLINK-21982 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.13.0 >Reporter: Bo Cui >Priority: Major > > currently, jobs use the Yarn server configuration during execution. > i think we should submit the configuration to the HDFS, like MR job.xml...etc > and container fetches the job.xml and creates a soft link(hdfs-site core-site > yarn-site) during container initializes > and then the configuration can obtain these resources through > classLoader.getResource(String) during configuration initializes -- This message was sent by Atlassian Jira (v8.3.4#803005)
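The container-side part of the proposal — localize the client's conf files and soft-link them where the classpath lookup will find them — might be sketched like this. All directory names here are hypothetical; real YARN localization works through the container's local resource mechanism.

```shell
# Hypothetical container startup step: link the job's shipped Hadoop conf
# files so they shadow the node-local server configuration on the classpath.
JOB_CONF_DIR=$(mktemp -d)        # stand-in for the dir localized from HDFS
LINK_DIR=$(mktemp -d)            # stand-in for a dir placed on the classpath

for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    echo "<configuration/>" > "$JOB_CONF_DIR/$f"   # placeholder client conf
    ln -sf "$JOB_CONF_DIR/$f" "$LINK_DIR/$f"
done
ls "$LINK_DIR"
```

With the links in place, a `new Configuration()` inside the container would resolve core-site.xml and friends via classLoader.getResource(String) to the client's copies rather than the server's.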
[GitHub] [flink] flinkbot commented on pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.
flinkbot commented on pull request #15403: URL: https://github.com/apache/flink/pull/15403#issuecomment-809039318 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit d87a60142ebdb3eef911d51391679b90f4acaa6a (Mon Mar 29 03:25:41 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
flinkbot commented on pull request #15402: URL: https://github.com/apache/flink/pull/15402#issuecomment-809039335 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 97ba64b2959826a53be3273d9b565b3fe8f96fdc (Mon Mar 29 03:25:43 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21640) Job fails to be submitted in tenant scenario
[ https://issues.apache.org/jira/browse/FLINK-21640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310353#comment-17310353 ] Xintong Song commented on FLINK-21640: -- [~Bo Cui], [~fly_in_gis], Obviously, different companies maintain ZooKeeper differently. I think users/companies can choose whichever approach they want, and Flink should not make assumptions on how ZK is maintained. In that sense, creating the root ZNode with global access only to fit with some specific ways of maintaining ZK does not sound fair to me. Leaving aside how ZK is maintained, I think it is a common convention of access control that, by default, new contents should be created with as strict access as possible. E.g., by default new files/directories can only be modified by the creating user on most Linux/Unix systems. > Job fails to be submitted in tenant scenario > > > Key: FLINK-21640 > URL: https://issues.apache.org/jira/browse/FLINK-21640 > Project: Flink > Issue Type: Bug > Components: API / Core, Client / Job Submission >Affects Versions: 1.12.2, 1.13.0 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Labels: pull-request-available > Attachments: image-2021-03-06-09-30-52-410.png, > image-2021-03-06-09-34-05-518.png > > > Job fails to be submitted in tenant scenario > !image-2021-03-06-09-30-52-410.png! > because current user does not have the Znode permission. > !image-2021-03-06-09-34-05-518.png! > i think the parent znode acl is anyone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
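The Unix analogy in the comment can be observed directly: with a typical umask, a newly created file is writable only by its owner, which is the "as strict as possible by default" convention being argued for the Flink root ZNode's ACL.

```shell
# New files default to owner-only write under a typical umask,
# analogous to creating a ZNode with a restrictive ACL by default.
umask 022
dir=$(mktemp -d)
touch "$dir/example"
ls -l "$dir/example" | cut -c1-10   # -rw-r--r-- : group/others cannot modify
rm -rf "$dir"
```

Loosening access (chmod for files, a more open ACL for ZNodes) is then an explicit, opt-in step rather than the default.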
[GitHub] [flink] est08zw opened a new pull request #15403: [hotfix] fix typo, assign processedData to currentProcessedData rather than currentPersistedData.
est08zw opened a new pull request #15403: URL: https://github.com/apache/flink/pull/15403 ## What is the purpose of the change fix typo, assign processedData to currentProcessedData rather than currentPersistedData. ## Brief change log fix typo, assign processedData to currentProcessedData rather than currentPersistedData. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] chaozwn closed pull request #15390: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
chaozwn closed pull request #15390: URL: https://github.com/apache/flink/pull/15390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-22004) Translate Flink Roadmap to Chinese.
Yuan Mei created FLINK-22004: Summary: Translate Flink Roadmap to Chinese. Key: FLINK-22004 URL: https://issues.apache.org/jira/browse/FLINK-22004 Project: Flink Issue Type: Task Components: Documentation Reporter: Yuan Mei https://flink.apache.org/roadmap.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] chaozwn opened a new pull request #15402: [FLINK-21985][table-api] Support Explain Query/Modifcation syntax in Calcite Parser
chaozwn opened a new pull request #15402: URL: https://github.com/apache/flink/pull/15402 …Calcite Parser ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request 
introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-21103) E2e tests time out on azure
[ https://issues.apache.org/jira/browse/FLINK-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307542#comment-17307542 ] Guowei Ma edited comment on FLINK-21103 at 3/29/21, 3:18 AM: - on branch 1.13 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15583&view=results was (Author: maguowei): on branch 1.13 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15272&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15622&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15589&view=results https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15584&view=results > E2e tests time out on azure > --- > > Key: FLINK-21103 > URL: https://issues.apache.org/jira/browse/FLINK-21103 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines, Tests >Affects Versions: 1.11.3, 1.12.1, 1.13.0 >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Blocker > Labels: test-stability > Fix For: 1.13.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12377&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > Creating worker2 ... done > Jan 22 13:16:17 Waiting for hadoop cluster to come up. We have been trying > for 0 seconds, retrying ... > Jan 22 13:16:22 Waiting for hadoop cluster to come up. We have been trying > for 5 seconds, retrying ... > Jan 22 13:16:27 Waiting for hadoop cluster to come up. We have been trying > for 10 seconds, retrying ... 
> Jan 22 13:16:32 Waiting for hadoop cluster to come up. We have been trying > for 15 seconds, retrying ... > Jan 22 13:16:37 Waiting for hadoop cluster to come up. We have been trying > for 20 seconds, retrying ... > Jan 22 13:16:43 Waiting for hadoop cluster to come up. We have been trying > for 26 seconds, retrying ... > Jan 22 13:16:48 Waiting for hadoop cluster to come up. We have been trying > for 31 seconds, retrying ... > Jan 22 13:16:53 Waiting for hadoop cluster to come up. We have been trying > for 36 seconds, retrying ... > Jan 22 13:16:58 Waiting for hadoop cluster to come up. We have been trying > for 41 seconds, retrying ... > Jan 22 13:17:03 Waiting for hadoop cluster to come up. We have been trying > for 46 seconds, retrying ... > Jan 22 13:17:08 We only have 0 NodeManagers up. We have been trying for 0 > seconds, retrying ... > 21/01/22 13:17:10 INFO client.RMProxy: Connecting to ResourceManager at > master.docker-hadoop-cluster-network/172.19.0.3:8032 > 21/01/22 13:17:11 INFO client.AHSProxy: Connecting to Application History > server at master.docker-hadoop-cluster-network/172.19.0.3:10200 > Jan 22 13:17:11 We now have 2 NodeManagers up. > == > === WARNING: This E2E Run took already 80% of the allocated time budget of > 250 minutes === > == > == > === WARNING: This E2E Run will time out in the next few minutes. Starting to > upload the log output === > == > ##[error]The task has timed out. > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. > Upload '/tmp/_e2e_watchdog.output.0' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. 
> Upload '/tmp/_e2e_watchdog.output.1' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Async Command Start: Upload Artifact > Uploading 1 files > File upload succeed. > Upload '/tmp/_e2e_watchdog.output.2' to file container: > '#/11824779/e2e-timeout-logs' > Associated artifact 140921 with build 12377 > Async Command End: Upload Artifact > Finishing: Run e2e tests > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
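The repeated "Waiting for hadoop cluster to come up" messages in the log above come from a retry-with-deadline loop; a generic sketch of that pattern follows (the probe command and timeout are hypothetical stand-ins, not the actual e2e script):

```shell
# Retry a readiness probe until it succeeds or a deadline passes,
# logging elapsed time like the hadoop-cluster wait in the log above.
start=$(date +%s)
deadline=30                          # hypothetical budget in seconds
probe() { true; }                    # stand-in for "is the cluster up?"

until probe; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$deadline" ]; then
        echo "Timed out after ${elapsed} seconds"
        exit 1
    fi
    echo "Waiting for cluster to come up. We have been trying for ${elapsed} seconds, retrying ..."
    sleep 5
done
echo "Cluster is up."
```

The overall Azure job then wraps loops like this in its own time budget (250 minutes here), which is why a stuck probe surfaces as the watchdog's "time out in the next few minutes" warning rather than a loop-local failure.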