Re: [PR] [FLINK-33914][ci] Adds basic Flink CI workflow [flink]

2024-01-09 Thread via GitHub


XComp commented on code in PR #23970:
URL: https://github.com/apache/flink/pull/23970#discussion_r1446999796


##
tools/azure-pipelines/create_build_artifact.sh:
##
@@ -28,15 +28,15 @@ echo "Minimizing artifact files"
 # by removing files not required for subsequent stages
 
 # jars are re-built in subsequent stages, so no need to cache them (cannot be 
avoided)
-find "$FLINK_ARTIFACT_DIR" -maxdepth 8 -type f -name '*.jar' | xargs rm -rf
+find "$FLINK_ARTIFACT_DIR" -maxdepth 8 -type f -name '*.jar' -exec rm -rf {} \;

Review Comment:
   I reorganized this change and moved it into the hotfix commit :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [BP-1.18][FLINK-33906][ci] Makes debug files available for both, AzureCI and GitHub Actions [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24060:
URL: https://github.com/apache/flink/pull/24060#issuecomment-1884344397

   
   ## CI report:
   
   * 8789084771ec6db94be1531233fbd38683f72692 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33414) MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot fails due to unexpected TimeoutException

2024-01-09 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805001#comment-17805001
 ] 

Sergey Nuyanzin commented on FLINK-33414:
-

[~Jiang Xin] there is another reproduction from today's nightly
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56166=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=10782

> MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot fails due to 
> unexpected TimeoutException
> ---
>
> Key: FLINK-33414
> URL: https://issues.apache.org/jira/browse/FLINK-33414
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> We see this test instability in [this 
> build|https://github.com/XComp/flink/actions/runs/6695266358/job/18192039035#step:12:9253].
> {code:java}
> Error: 17:04:52 17:04:52.042 [ERROR] Failures: 
> 9252Error: 17:04:52 17:04:52.042 [ERROR]   
> MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot:120 
> 9253Oct 30 17:04:52 Expecting a throwable with root cause being an instance 
> of:
> 9254Oct 30 17:04:52   
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException
> 9255Oct 30 17:04:52 but was an instance of:
> 9256Oct 30 17:04:52   java.util.concurrent.TimeoutException: Timeout has 
> occurred: 100 ms
> 9257Oct 30 17:04:52   at 
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
> 9258Oct 30 17:04:52   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 9259Oct 30 17:04:52   at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 9260Oct 30 17:04:52   ...(27 remaining lines not displayed - this can be 
> changed with Assertions.setMaxStackTraceElementsDisplayed) {code}
> The same error occurred in the [finegrained_resourcemanager stage of this 
> build|https://github.com/XComp/flink/actions/runs/6468655160/job/17563927249#step:11:26516]
>  (as reported in FLINK-33245).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [BP-1.18][FLINK-33907][ci] Makes copying test jars being done in the package phase [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24059:
URL: https://github.com/apache/flink/pull/24059#issuecomment-1884336534

   
   ## CI report:
   
   * 467bea59266c26654d3c7409b21ff71dc7c7c3c6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33906][ci] Makes debug files available for both, AzureCI and GitHub Actions [flink]

2024-01-09 Thread via GitHub


XComp opened a new pull request, #24060:
URL: https://github.com/apache/flink/pull/24060

   1.18 backport PR for parent PR #23964 
   
   No conflicts appeared during the backport.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33946) RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel

2024-01-09 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805000#comment-17805000
 ] 

Yue Ma commented on FLINK-33946:


[~masteryhx] could you please take a look at this ticket?

> RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel
> 
>
> Key: FLINK-33946
> URL: https://issues.apache.org/jira/browse/FLINK-33946
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.19.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
>
> When a Job fails, the task needs to be canceled and re-deployed. 
> RocksDBStateBackend will call RocksDB.close when disposing.
> {code:java}
> if (!shutting_down_.load(std::memory_order_acquire) &&
> has_unpersisted_data_.load(std::memory_order_relaxed) &&
> !mutable_db_options_.avoid_flush_during_shutdown) {
>   if (immutable_db_options_.atomic_flush) {
> autovector<ColumnFamilyData*> cfds;
> SelectColumnFamiliesForAtomicFlush(&cfds);
> mutex_.Unlock();
> Status s =
> AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown);
> s.PermitUncheckedError();  //**TODO: What to do on error?
> mutex_.Lock();
>   } else {
> for (auto cfd : *versions_->GetColumnFamilySet()) {
>   if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) {
> cfd->Ref();
> mutex_.Unlock();
> Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown);
> s.PermitUncheckedError();  //**TODO: What to do on error?
> mutex_.Lock();
> cfd->UnrefAndTryDelete();
>   }
> }
>   } {code}
> By default (avoid_flush_during_shutdown=false), RocksDB flushes the memtable
> on close. When disk pressure is high or the memtable is large, this step can
> be time-consuming, which causes the task to get stuck in the Canceling stage
> and slows down job failover.
> In fact, flushing the memtable when a Flink task closes is completely
> unnecessary, because the data can be replayed from the checkpoint. So we can
> set avoid_flush_during_shutdown to true to speed up task failover.
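
A minimal sketch of how one could opt in today via a custom options factory
(assuming the RocksJava setter {{setAvoidFlushDuringShutdown}} and Flink's
{{RocksDBOptionsFactory}} interface; the class name is illustrative):
{code:java}
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

import java.util.Collection;

/** Sketch: skip the final memtable flush on close; state is replayed from the checkpoint anyway. */
public class AvoidFlushOnShutdownFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Unflushed data is dropped on close and restored from the last checkpoint on recovery.
        return currentOptions.setAvoidFlushDuringShutdown(true);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }
}
{code}
The factory could then be registered via
{{EmbeddedRocksDBStateBackend#setRocksDBOptions}}.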



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-09 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804999#comment-17804999
 ] 

Yue Ma commented on FLINK-33819:


[~masteryhx] [~pnowojski] [~srichter] could you please take a look at this
ticket?

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression
> type, and Snappy is used for compression by default. But we have some
> scenarios where compression uses a lot of CPU resources. Turning off
> compression can significantly reduce CPU overhead. So we may need to support
> a parameter for users to set the CompressType of RocksDB.
>   !image-2023-12-14-11-35-22-306.png!
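
Until such an option exists, a minimal sketch of the same effect via a custom
options factory (assuming the RocksJava setter {{setCompressionType}}; the
class name is illustrative):
{code:java}
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

import java.util.Collection;

/** Sketch: disable block compression to trade disk space for lower CPU usage. */
public class NoCompressionOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // NO_COMPRESSION removes (de)compression work from flushes, compactions and reads.
        return currentOptions.setCompressionType(CompressionType.NO_COMPRESSION);
    }
}
{code}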



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33907][ci] Makes copying test jars being done in the package phase [flink]

2024-01-09 Thread via GitHub


XComp opened a new pull request, #24059:
URL: https://github.com/apache/flink/pull/24059

   1.18 backport PR for parent PR #23965 
   
   No conflicts appeared during the backport.
   
   This fixes the following error when compiling the test artifacts of 
flink-clients: Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1]
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33907][ci] Makes copying test jars being done later [flink]

2024-01-09 Thread via GitHub


XComp merged PR #23965:
URL: https://github.com/apache/flink/pull/23965


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33907) Makes copying test jars being done later

2024-01-09 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33907:
--
Description: 
We experienced an issue in GHA that is due to how test resources are
pre-computed in GHA:
{code:java}
This fixes the following error when compiling flink-clients:
Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1] {code}
We need to move this goal to a later phase.

The reason this popped up is (as far as I remember) that we only do
test-compile in GitHub Actions.

  was:
We experienced an issue in GHA that is due to how test resources are
pre-computed in GHA:
{code:java}
This fixes the following error when compiling flink-clients:
Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1] {code}
We need to move this goal to an earlier phase.

The reason this popped up is (as far as I remember) that we only do
test-compile in GitHub Actions.


> Makes copying test jars being done later
> 
>
> Key: FLINK-33907
> URL: https://issues.apache.org/jira/browse/FLINK-33907
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> We experienced an issue in GHA that is due to how test resources
> are pre-computed in GHA:
> {code:java}
> This fixes the following error when compiling flink-clients:
> Error: 2.054 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
> (copy-dependencies) on project flink-clients: Artifact has not been packaged 
> yet. When used on reactor artifact, copy should be executed after packaging: 
> see MDEP-187. -> [Help 1] {code}
> We need to move this goal to a later phase.
> The reason this popped up is (as far as I remember) that we only do
> test-compile in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33907) Makes copying test jars being done later

2024-01-09 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33907:
--
Summary: Makes copying test jars being done later  (was: Makes copying test 
jars being done earlier)

> Makes copying test jars being done later
> 
>
> Key: FLINK-33907
> URL: https://issues.apache.org/jira/browse/FLINK-33907
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> We experienced an issue in GHA that is due to how test resources
> are pre-computed in GHA:
> {code:java}
> This fixes the following error when compiling flink-clients:
> Error: 2.054 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
> (copy-dependencies) on project flink-clients: Artifact has not been packaged 
> yet. When used on reactor artifact, copy should be executed after packaging: 
> see MDEP-187. -> [Help 1] {code}
> We need to move this goal to an earlier phase.
> The reason this popped up is (as far as I remember) that we only do
> test-compile in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27756) Fix intermittently failing test in AsyncSinkWriterTest.checkLoggedSendTimesAreWithinBounds

2024-01-09 Thread Ahmed Hamdy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hamdy updated FLINK-27756:

Affects Version/s: (was: 1.15.0)

> Fix intermittently failing test in 
> AsyncSinkWriterTest.checkLoggedSendTimesAreWithinBounds
> --
>
> Key: FLINK-27756
> URL: https://issues.apache.org/jira/browse/FLINK-27756
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.19.0
>Reporter: Ahmed Hamdy
>Assignee: Ahmed Hamdy
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> h2. Motivation
>  - One of the integration tests ({{checkLoggedSendTimesAreWithinBounds}}) of
> {{AsyncSinkWriterTest}} has been reported to fail intermittently on the build
> pipeline, blocking new changes.
>  - Reporting build is [linked 
> |https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36009=logs=aa18c3f6-13b8-5f58-86bb-c1cffb239496=502fb6c0-30a2-5e49-c5c2-a00fa3acb203]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33914][ci] Adds basic Flink CI workflow [flink]

2024-01-09 Thread via GitHub


XComp commented on code in PR #23970:
URL: https://github.com/apache/flink/pull/23970#discussion_r1446291478


##
.github/workflows/template.pre-compile-checks.yml:
##
@@ -0,0 +1,109 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This workflow collects all checks that do not require compilation and are, therefore,
+# JDK independent.
+
+name: "Pre-compile Checks"
+
+on:
+  workflow_dispatch:
+    inputs:
+      jdk_version:
+        description: "The JDK version that shall be used as a default within the Flink CI Docker container."
+        default: "8"
+        type: choice
+        options: ["8", "11", "17"]
+      branch:
+        description: "The branch the source code analysis should run on."
+        default: "master"
+        type: string
+
+  workflow_call:
+    inputs:
+      jdk_version:
+        description: "The JDK version that shall be used as a default within the Flink CI Docker container."
+        default: 8
+        type: number
+      branch:
+        description: "The branch the source code analysis should run on."
+        default: "master"
+        type: string
+
+permissions: read-all
+
+# This workflow should only contain steps that do not require the compilation of Flink
+# (and therefore, are independent of the used JDK)
+jobs:
+  qa:
+    name: "Basic QA"
+    runs-on: ubuntu-latest
+    container:
+      image: chesnay/flink-ci:java_8_11_17_21_maven_386
+      options: --init
+
+    steps:
+      - name: "Flink Checkout"
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ inputs.branch }}
+          persist-credentials: false
+
+      - name: "Set JDK version to Java ${{ inputs.jdk_version }}"
+        uses: "./.github/actions/set_java_in_container"
+        with:
+          jdk_version: ${{ inputs.jdk_version }}
+
+      - name: "Checkstyle"
+        uses: "./.github/actions/run_mvn"
+        with:
+          maven-parameters: "checkstyle:check -T1C"
+
+      - name: "Spotless"
+        if: (success() || failure())
+        uses: "./.github/actions/run_mvn"
+        with:
+          maven-parameters: "spotless:check -T1C"
+
+      - name: "License Headers"
+        if: (success() || failure())
+        uses: "./.github/actions/run_mvn"
+        with:
+          maven-parameters: "org.apache.rat:apache-rat-plugin:check -N"
+
+  docs-404-check:
+    name: "Docs 404 Check"
+    runs-on: ubuntu-latest
+    container:
+      image: chesnay/flink-ci:java_8_11_17_21_maven_386
+    steps:
+      - name: "Checks out Flink"
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ inputs.branch }}
+          persist-credentials: false
+
+      - name: "Mark GHA checkout as a safe directory (workaround for https://github.com/actions/checkout/issues/1169)"
+        run: git config --system --add safe.directory $GITHUB_WORKSPACE
+        shell: bash
+
+      - name: "Check if PR contains docs change"
+        run: |
+          source ./tools/azure-pipelines/build_properties.sh

Review Comment:
   Created FLINK-34045 as a follow-up. I'm gonna remove the doc creation in 
this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33879] Avoids the potential hang of Hybrid Shuffle during redistribution [flink]

2024-01-09 Thread via GitHub


reswqa commented on code in PR #23957:
URL: https://github.com/apache/flink/pull/23957#discussion_r1446971558


##
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageMemoryManagerImplTest.java:
##
@@ -242,19 +173,49 @@ void testCanNotTransferOwnershipForEvent() throws 
IOException {
 .isInstanceOf(IllegalStateException.class);
 }
 
+@Test
+void testEnsureCapacity() throws IOException {
+final int numBuffers = 5;
+final int guaranteedReclaimableBuffers = 3;
+
+BufferPool bufferPool = globalPool.createBufferPool(numBuffers, 
numBuffers);
+TieredStorageMemoryManagerImpl storageMemoryManager =
+createStorageMemoryManager(
+bufferPool,
+Arrays.asList(
+new TieredStorageMemorySpec(
+new Object(), 
guaranteedReclaimableBuffers, true),
+new TieredStorageMemorySpec(this, 0, false)));
+assertThat(storageMemoryManager.ensureCapacity(0)).isTrue();
+assertThat(bufferPool.bestEffortGetNumOfUsedBuffers())
+.isEqualTo(guaranteedReclaimableBuffers);
+
+assertThat(storageMemoryManager.ensureCapacity(numBuffers - 
guaranteedReclaimableBuffers))
+.isTrue();
+
assertThat(bufferPool.bestEffortGetNumOfUsedBuffers()).isEqualTo(numBuffers);
+
+assertThat(
+storageMemoryManager.ensureCapacity(
+numBuffers - guaranteedReclaimableBuffers + 1))

Review Comment:
   Ah, makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel [flink]

2024-01-09 Thread via GitHub


reswqa commented on code in PR #23927:
URL: https://github.com/apache/flink/pull/23927#discussion_r1446953814


##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/CreditBasedSequenceNumberingViewReader.java:
##
@@ -58,6 +59,12 @@ class CreditBasedSequenceNumberingViewReader
 
 private final int initialCredit;
 
+/**
+ * Cache of the index of the only subpartition if the underlying
+ * {@link ResultSubpartitionView} only consumes one subpartition.
+ */
+private int subpartitionId;

Review Comment:
   We do need some explanation about the default value `-1`.



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartition.java:
##
@@ -301,6 +302,24 @@ public void setMetricGroup(TaskIOMetricGroup metrics) {
 partitionId.getPartitionId(), resultPartitionBytes);
 }
 
+@Override
+public ResultSubpartitionView createSubpartitionView(
+ResultSubpartitionIndexSet indexSet, BufferAvailabilityListener 
availabilityListener)
+throws IOException {
+// The ability to support multiple indexes is to be provided in 
subsequent commits of
+// the corresponding pull request. As the function is about to be 
supported uniformly with
+// one set of code, they will be placed in a common method shared by 
all shuffle
+// implementations, and that will be this method.
Iterator<Integer> iterator = indexSet.values().iterator();
+int index = iterator.next();
+Preconditions.checkState(!iterator.hasNext());
+return createSubpartitionView(index, availabilityListener);
+}
+
+/** Returns a reader for the subpartition with the given index. */

Review Comment:
   We need a full java doc to explain the differences and connections between 
this method and the previous ones.



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/CreditBasedSequenceNumberingViewReader.java:
##
@@ -229,6 +245,14 @@ ResultSubpartitionView.AvailabilityWithBacklog 
hasBuffersAvailable() {
 return subpartitionView.getAvailabilityAndBacklog(Integer.MAX_VALUE);
 }
 
+@Override
+public int peekNextBufferSubpartitionId() throws IOException {

Review Comment:
   Do we have some tests covering methods like this one introduced in this
commit?



##
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageConsumerClient.java:
##
@@ -100,6 +104,13 @@ public Optional<Buffer> getNextBuffer(
 }
 Buffer bufferData = buffer.get();
 if (bufferData.getDataType() == Buffer.DataType.END_OF_SEGMENT) {
+EndOfSegmentEvent event =
+(EndOfSegmentEvent)
+EventSerializer.fromSerializedEvent(
+bufferData.getNioBufferReadable(), 
getClass().getClassLoader());
+Preconditions.checkState(
+subpartitionId.equals(
+new 
TieredStorageSubpartitionId(event.getSubpartitionId(;

Review Comment:
   Is this deserialization only for sanity check?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33879] Avoids the potential hang of Hybrid Shuffle during redistribution [flink]

2024-01-09 Thread via GitHub


jiangxin369 commented on code in PR #23957:
URL: https://github.com/apache/flink/pull/23957#discussion_r1446955961


##
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/hybrid/tiered/storage/TieredStorageMemoryManagerImplTest.java:
##
@@ -242,19 +173,49 @@ void testCanNotTransferOwnershipForEvent() throws 
IOException {
 .isInstanceOf(IllegalStateException.class);
 }
 
+@Test
+void testEnsureCapacity() throws IOException {
+final int numBuffers = 5;
+final int guaranteedReclaimableBuffers = 3;
+
+BufferPool bufferPool = globalPool.createBufferPool(numBuffers, 
numBuffers);
+TieredStorageMemoryManagerImpl storageMemoryManager =
+createStorageMemoryManager(
+bufferPool,
+Arrays.asList(
+new TieredStorageMemorySpec(
+new Object(), 
guaranteedReclaimableBuffers, true),
+new TieredStorageMemorySpec(this, 0, false)));
+assertThat(storageMemoryManager.ensureCapacity(0)).isTrue();
+assertThat(bufferPool.bestEffortGetNumOfUsedBuffers())
+.isEqualTo(guaranteedReclaimableBuffers);
+
+assertThat(storageMemoryManager.ensureCapacity(numBuffers - 
guaranteedReclaimableBuffers))
+.isTrue();
+
assertThat(bufferPool.bestEffortGetNumOfUsedBuffers()).isEqualTo(numBuffers);
+
+assertThat(
+storageMemoryManager.ensureCapacity(
+numBuffers - guaranteedReclaimableBuffers + 1))

Review Comment:
   `ensureCapacity` only requests buffers from the local buffer pool when the
requested buffers are fewer than `numGuaranteedReclaimableBuffers +
numAdditionalBuffers`. If we replaced the value with `1`, the capacity we need
to ensure would be `guaranteedReclaimableBuffers(3) + numAdditionalBuffers(1) = 4`,
so the requested buffers in the queue would already be satisfied and
`ensureCapacity` would return `true`. However, what I want to test is the
situation where `ensureCapacity` fails.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34015) Setting `execution.savepoint.ignore-unclaimed-state` does not take effect when passing this parameter by dynamic properties

2024-01-09 Thread Zhanghao Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804987#comment-17804987
 ] 

Zhanghao Chen commented on FLINK-34015:
---

Hi [~Zakelly], I'm reaching out to you as you've mentioned refactoring the
CLI in Flink 2.0. This is another example of confusing behavior in the current
CLI design: short commands and -D dynamic properties may interact with each
other in confusing ways.

> Setting `execution.savepoint.ignore-unclaimed-state` does not take effect 
> when passing this parameter by dynamic properties
> ---
>
> Key: FLINK-34015
> URL: https://issues.apache.org/jira/browse/FLINK-34015
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.17.0
>Reporter: Renxiang Zhou
>Priority: Critical
>  Labels: ignore-unclaimed-state-invalid, pull-request-available
> Attachments: image-2024-01-08-14-22-09-758.png, 
> image-2024-01-08-14-24-30-665.png
>
>
> We set `execution.savepoint.ignore-unclaimed-state` to true and used the -D
> option to submit the job, but unfortunately we found the value is still false
> in the jobmanager log.
> Pic 1: we set `execution.savepoint.ignore-unclaimed-state` to true when
> submitting the job.
> !image-2024-01-08-14-22-09-758.png|width=1012,height=222!
> Pic 2: The value is still false in the jobmanager log.
> !image-2024-01-08-14-24-30-665.png|width=651,height=51!
>  
> Besides, the parameter `execution.savepoint-restore-mode` has the same
> problem when we pass it via the -D option.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34015) Setting `execution.savepoint.ignore-unclaimed-state` does not take effect when passing this parameter by dynamic properties

2024-01-09 Thread Renxiang Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804983#comment-17804983
 ] 

Renxiang Zhou commented on FLINK-34015:
---

[~masteryhx] Hello Hangxiang, please have a look at this issue if you have
time, many thanks~

I have implemented a fix for this little bug:
https://github.com/apache/flink/pull/24058
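
As a hedged sketch of the underlying idea (class and variable names are
illustrative; {{SavepointRestoreSettings.fromConfiguration}} and the
{{Configuration}} copy/merge calls are existing Flink APIs): the -D dynamic
properties have to be merged into the effective configuration before the
savepoint restore settings are read from it.
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings;

final class SavepointSettingsResolver {

    /** Dynamic -D properties must be merged in before the settings are materialized. */
    static SavepointRestoreSettings resolve(
            Configuration baseConfiguration, Configuration dynamicProperties) {
        Configuration effective = new Configuration(baseConfiguration);
        effective.addAll(dynamicProperties); // -D values now override earlier defaults
        return SavepointRestoreSettings.fromConfiguration(effective);
    }
}
{code}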

> Setting `execution.savepoint.ignore-unclaimed-state` does not take effect 
> when passing this parameter by dynamic properties
> ---
>
> Key: FLINK-34015
> URL: https://issues.apache.org/jira/browse/FLINK-34015
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.17.0
>Reporter: Renxiang Zhou
>Priority: Critical
>  Labels: ignore-unclaimed-state-invalid, pull-request-available
> Attachments: image-2024-01-08-14-22-09-758.png, 
> image-2024-01-08-14-24-30-665.png
>
>
> We set `execution.savepoint.ignore-unclaimed-state` to true and used the -D
> option to submit the job, but unfortunately we found the value is still false
> in the jobmanager log.
> Pic 1: we set `execution.savepoint.ignore-unclaimed-state` to true when
> submitting the job.
> !image-2024-01-08-14-22-09-758.png|width=1012,height=222!
> Pic 2: The value is still false in the jobmanager log.
> !image-2024-01-08-14-24-30-665.png|width=651,height=51!
>  
> Besides, the parameter `execution.savepoint-restore-mode` has the same
> problem when we pass it via the -D option.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34015][checkpoint] fix that passing ignore-unclaimed-state through dynamic props does not take effect [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24058:
URL: https://github.com/apache/flink/pull/24058#issuecomment-1884262593

   
   ## CI report:
   
   * 364c3e2b9a652e85d36a76934f59f2c93520d560 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34015) Setting `execution.savepoint.ignore-unclaimed-state` does not take effect when passing this parameter by dynamic properties

2024-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34015:
---
Labels: ignore-unclaimed-state-invalid pull-request-available  (was: 
ignore-unclaimed-state-invalid)

> Setting `execution.savepoint.ignore-unclaimed-state` does not take effect 
> when passing this parameter by dynamic properties
> ---
>
> Key: FLINK-34015
> URL: https://issues.apache.org/jira/browse/FLINK-34015
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.17.0
>Reporter: Renxiang Zhou
>Priority: Critical
>  Labels: ignore-unclaimed-state-invalid, pull-request-available
> Attachments: image-2024-01-08-14-22-09-758.png, 
> image-2024-01-08-14-24-30-665.png
>
>
> We set `execution.savepoint.ignore-unclaimed-state` to true and used the -D
> option to submit the job, but unfortunately we found the value is still false
> in the jobmanager log.
> Pic 1: we set `execution.savepoint.ignore-unclaimed-state` to true when
> submitting the job.
> !image-2024-01-08-14-22-09-758.png|width=1012,height=222!
> Pic 2: The value is still false in the jobmanager log.
> !image-2024-01-08-14-24-30-665.png|width=651,height=51!
>  
> Besides, the parameter `execution.savepoint-restore-mode` has the same
> problem when we pass it via the -D option.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34015][checkpoint] fix that passing ignore-unclaimed-state through dynamic props does not take effect [flink]

2024-01-09 Thread via GitHub


xiangforever2014 opened a new pull request, #24058:
URL: https://github.com/apache/flink/pull/24058

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   - fix that passing ignore-unclaimed-state through dynamic props does not 
take effect
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes, this parameter 
will affect the recovery behavior.)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33583][table-planner] Support state ttl hint for regular join [flink]

2024-01-09 Thread via GitHub


LadyForest commented on code in PR #23752:
URL: https://github.com/apache/flink/pull/23752#discussion_r1446923256


##
flink-table/flink-table-planner/src/test/resources/org/apache/flink/table/planner/plan/hints/stream/StateTtlHintTest.xml:
##
@@ -16,6 +16,35 @@ See the License for the specific language governing 
permissions and
 limitations under the License.
 -->
 
+  [added XML test specifications omitted; content not preserved in the archive]

Re: [PR] [FLINK-33583][table-planner] Support state ttl hint for regular join [flink]

2024-01-09 Thread via GitHub


xuyangzhong commented on code in PR #23752:
URL: https://github.com/apache/flink/pull/23752#discussion_r1446914262


##
flink-table/flink-table-planner/src/test/resources/org/apache/flink/table/planner/plan/hints/stream/StateTtlHintTest.xml:
##
@@ -0,0 +1,282 @@
+  [282 lines of new XML test specifications omitted; content not preserved in the archive]


[jira] [Commented] (FLINK-33881) [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull

2024-01-09 Thread Jinzhong Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804975#comment-17804975
 ] 

Jinzhong Li commented on FLINK-33881:
-

[~Zakelly] [~masteryhx] 

I have published a PR; could you help review it?

 

>>> "And actually I think there may be some value if we could make sure it is
>>> safe to do shallow copy."

I think it is not safe to do a shallow copy.
The copy-on-write mechanism in HeapStateBackend only deep-copies the
StateMapEntry wrapper, not the user objects. It is necessary to deep-copy the
list state elements to prevent the task thread and the snapshot thread from
accessing one object concurrently.

> [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull
> 
>
> Key: FLINK-33881
> URL: https://issues.apache.org/jira/browse/FLINK-33881
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Jinzhong Li
>Assignee: Jinzhong Li
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-12-19-21-25-21-446.png, 
> image-2023-12-19-21-26-43-518.png
>
>
> In some scenarios, 'TtlListState#getUnexpiredOrNull ->
> elementSerializer.copy(ttlValue)' consumes a lot of CPU resources.
> !image-2023-12-19-21-25-21-446.png|width=529,height=119!
> I found that for TtlListState#getUnexpiredOrNull, if none of the elements
> have expired, it still needs to copy all the elements and update the whole
> list/map in TtlIncrementalCleanup#runCleanup();
> !image-2023-12-19-21-26-43-518.png|width=505,height=266!
> I think we could optimize TtlListState#getUnexpiredOrNull by:
> 1) finding the index of the first expired element in the list;
> 2) if none is found, returning the original list;
> 3) if one is found, constructing the unexpired list (putting the preceding
> elements into it), then going through the subsequent elements and adding the
> unexpired ones.
> {code:java}
> public List<TtlValue<T>> getUnexpiredOrNull(@Nonnull List<TtlValue<T>> ttlValues) {
> //...
> int firstExpireIndex = -1;
> for (int i = 0; i < ttlValues.size(); i++) {
> if (TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
> firstExpireIndex = i;
> break;
> }
> }
> if (firstExpireIndex == -1) {
> return ttlValues;  //return the original ttlValues
> }
> List<TtlValue<T>> unexpired = new ArrayList<>(ttlValues.size());
> for (int i = 0; i < ttlValues.size(); i++) {
> if (i < firstExpireIndex) {
> // unexpired.add(ttlValues.get(i));
> unexpired.add(elementSerializer.copy(ttlValues.get(i)));
> }
> if (i > firstExpireIndex) {
> if (!TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
> // unexpired.add(ttlValues.get(i));
> unexpired.add(elementSerializer.copy(ttlValues.get(i)));
> }
> }
> }
> //  .
> } {code}
> *In this way, the extra iteration overhead is actually very small, but
> the benefit when there are no expired elements is significant.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33932) Support Retry Mechanism in RocksDBStateDataTransfer

2024-01-09 Thread Guojun Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guojun Li updated FLINK-33932:
--
Description: 
Currently, there is no retry mechanism for downloading and uploading RocksDB 
state files. Any jittering of remote filesystem might lead to a checkpoint 
failure. By supporting retry mechanism in RocksDBStateDataTransfer, we can 
significantly reduce the failure rate of checkpoint during asynchronous phase.
The exception is as below:
{noformat}
 
2023-12-19 08:46:00,197 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline 
checkpoint 2 by task 
5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of job 
a025f19e at 
application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ 
fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789).
org.apache.flink.util.SerializedThrowable: 
org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task 
checkpoint failed.
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: 
Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> 
Calc[133] (184/500)#0.
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    ... 4 more
Caused by: org.apache.flink.util.SerializedThrowable: 
java.util.concurrent.ExecutionException: java.io.IOException: Could not flush 
to file and close the file system output stream to 
hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
stream state handle
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
    at 
org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    ... 3 more
Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: 
Could not flush to file and close the file system output stream to 
hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
stream state handle
    at 
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32)
 ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
    at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
 ~[?:?]
    ... 3 more
Caused by: org.apache.flink.util.SerializedThrowable: 
java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) 
~[?:?]
    at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) 
~[hadoop-common-2.6.0-cdh5.4.4.jar:?]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532) 
~[hadoop-common-2.6.0-cdh5.4.4.jar:?]
    at 
org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1835)
 ~[hadoop-hdfs-2.6.0-cdh5.4.4.jar:?]
    at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1268)
 ~[hadoop-hdfs-2.6.0-cdh5.4.4.jar:?]
 

[jira] [Updated] (FLINK-33932) Support Retry Mechanism in RocksDBStateDataTransfer

2024-01-09 Thread Guojun Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guojun Li updated FLINK-33932:
--
Summary: Support Retry Mechanism in RocksDBStateDataTransfer  (was: Support 
retry mechanism for rocksdb uploader)

> Support Retry Mechanism in RocksDBStateDataTransfer
> ---
>
> Key: FLINK-33932
> URL: https://issues.apache.org/jira/browse/FLINK-33932
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Guojun Li
>Priority: Major
>  Labels: pull-request-available
>
> The RocksDB uploader will throw an exception and decline the current checkpoint
> if there are errors when uploading to a remote file system like HDFS.
> The exception is as below:
> {noformat}
>  
> 2023-12-19 08:46:00,197 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline 
> checkpoint 2 by task 
> 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of 
> job a025f19e at 
> application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ 
> fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789).
> org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task 
> checkpoint failed.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: 
> Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> 
> Calc[133] (184/500)#0.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 4 more
> Caused by: org.apache.flink.util.SerializedThrowable: 
> java.util.concurrent.ExecutionException: java.io.IOException: Could not flush 
> to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 3 more
> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: 
> Could not flush to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at 
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>  ~[?:?]
>     ... 3 more
> Caused by: org.apache.flink.util.SerializedThrowable: 
> java.net.ConnectException: Connection timed out
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
>     at 

Re: [PR] [FLINK-33634] Add Conditions to Flink CRD's Status field [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


lajith2006 commented on code in PR #749:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/749#discussion_r1446866102


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java:
##
@@ -227,4 +235,28 @@ private boolean 
validateDeployment(FlinkResourceContext ctx) {
 }
 return true;
 }
+
+private void setCRStatus(FlinkDeployment flinkApp) {
+final List conditions = new ArrayList<>();
+FlinkDeploymentStatus deploymentStatus = flinkApp.getStatus();
+switch (deploymentStatus.getJobManagerDeploymentStatus()) {

Review Comment:
   Modified logic to update the conditions. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33634] Add Conditions to Flink CRD's Status field [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


lajith2006 commented on code in PR #749:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/749#discussion_r1446864912


##
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java:
##
@@ -227,4 +235,28 @@ private boolean 
validateDeployment(FlinkResourceContext ctx) {
 }
 return true;
 }
+
+private void setCRStatus(FlinkDeployment flinkApp) {

Review Comment:
   Made changes to populate Conditions in CommonStatus using the resource
lifecycle state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33634] Add Conditions to Flink CRD's Status field [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


lajith2006 commented on code in PR #749:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/749#discussion_r1446864256


##
flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/status/CommonCRStatus.java:
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.api.status;
+
+import io.fabric8.kubernetes.api.model.Condition;
+import io.fabric8.kubernetes.api.model.ConditionBuilder;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+
+/** Status of CR. */
+public class CommonCRStatus {

Review Comment:
   Using ConditionUtils



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33634] Add Conditions to Flink CRD's Status field [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


lajith2006 commented on code in PR #749:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/749#discussion_r1446864103


##
flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/status/CommonCRStatus.java:
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.api.status;
+
+import io.fabric8.kubernetes.api.model.Condition;
+import io.fabric8.kubernetes.api.model.ConditionBuilder;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+
+/** Status of CR. */
+public class CommonCRStatus {
+
+public static Condition crReadyTrueCondition(final String message) {
+return crCondition("Ready", "True", message, "Ready");
+}
+
+public static Condition crReadyFalseCondition(final String message) {
+return crCondition("Ready", "False", message, "Progressing");
+}
+
+public static Condition crErrorCondition(final String message) {
+return crCondition("Error", "True", message, "UnhandledException");
+}
+
+public static Condition crCondition(

Review Comment:
   Method names have been changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33634] Add Conditions to Flink CRD's Status field [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


lajith2006 commented on PR #749:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/749#issuecomment-1884164328

   Pushed the changes for re-review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30613] Improve resolving schema compatibility -- Milestone one [flink]

2024-01-09 Thread via GitHub


masteryhx commented on PR #21635:
URL: https://github.com/apache/flink/pull/21635#issuecomment-1884161683

   Thanks all for the detailed review.
   Please let me know if you have any other comments.
   I will merge it if there are no further comments within the next two days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33980][core] Reorganize job configuration [flink]

2024-01-09 Thread via GitHub


JunRuiLee commented on PR #24025:
URL: https://github.com/apache/flink/pull/24025#issuecomment-1884160739

   Thanks @masteryhx for the heads up, I've updated the PR following the merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader

2024-01-09 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804951#comment-17804951
 ] 

Hangxiang Yu commented on FLINK-33932:
--

Thanks for pinging me here.

A retry mechanism is common when we get or put data over the network.

So I think it makes sense for checkpoint failures caused by temporary network 
problems, even though it may add a little overhead for other kinds of failures.

Since Flink's checkpoint mechanism already retries failed checkpoints at a 
coarse granularity, this looks good to me as long as the fine-grained retry is 
configurable and does not change the current default behavior.
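
As a rough illustration only (not the actual patch): the helper name 
uploadWithRetry and the maxRetries knob below are hypothetical, but such a 
configurable retry could wrap the upload call like this:
{code:java}
// Hedged sketch: bounded retry around an upload action, assuming maxRetries >= 0.
// With maxRetries == 0 this keeps today's single-attempt behavior, and the final
// failure is rethrown so Flink's coarse checkpoint-level retry still applies.
import java.io.IOException;
import java.util.concurrent.Callable;

public final class RetryingUpload {

    public static <T> T uploadWithRetry(Callable<T> upload, int maxRetries)
            throws Exception {
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return upload.call();
            } catch (IOException e) { // assume only transient network errors
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
{code}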

> Support retry mechanism for rocksdb uploader
> 
>
> Key: FLINK-33932
> URL: https://issues.apache.org/jira/browse/FLINK-33932
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Guojun Li
>Priority: Major
>  Labels: pull-request-available
>
> Rocksdb uploader will throw exception and decline the current checkpoint if 
> there are errors when uploading to remote file system like hdfs.
> The exception is as below:
> {noformat}
>  
> 2023-12-19 08:46:00,197 INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline 
> checkpoint 2 by task 
> 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of 
> job a025f19e at 
> application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ 
> fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789).
> org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task 
> checkpoint failed.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: 
> Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> 
> Calc[133] (184/500)#0.
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 4 more
> Caused by: org.apache.flink.util.SerializedThrowable: 
> java.util.concurrent.ExecutionException: java.io.IOException: Could not flush 
> to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>     at 
> org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     ... 3 more
> Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: 
> Could not flush to file and close the file system output stream to 
> hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the 
> stream state handle
>     at 
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32)
>  ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT]
>     

Re: [PR] [FLINK-33881]Avoid copy and update value in TtlListState#getUnexpiredOrNull [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24057:
URL: https://github.com/apache/flink/pull/24057#issuecomment-1884093107

   
   ## CI report:
   
   * b1eef2343133eedbae3219ff1ecfe9984e846769 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-33738) Make exponential-delay restart-strategy the default restart strategy

2024-01-09 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-33738.
-
Resolution: Fixed

> Make exponential-delay restart-strategy the default restart strategy
> 
>
> Key: FLINK-33738
> URL: https://issues.apache.org/jira/browse/FLINK-33738
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33738) Make exponential-delay restart-strategy the default restart strategy

2024-01-09 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804948#comment-17804948
 ] 

Rui Fan commented on FLINK-33738:
-

Merged to master (1.19) via: d9fc5ee03afe86e3d4c2fcee50616df2a7c095f2 and 
cad090aaed770c90facb6edbcce57dd341449a02

> Make exponential-delay restart-strategy the default restart strategy
> 
>
> Key: FLINK-33738
> URL: https://issues.apache.org/jira/browse/FLINK-33738
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33738][Scheduler] Make exponential-delay restart-strategy the default restart strategy [flink]

2024-01-09 Thread via GitHub


1996fanrui merged PR #24040:
URL: https://github.com/apache/flink/pull/24040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33738][Scheduler] Make exponential-delay restart-strategy the default restart strategy [flink]

2024-01-09 Thread via GitHub


1996fanrui commented on PR #24040:
URL: https://github.com/apache/flink/pull/24040#issuecomment-1884091150

   Thanks to everyone for the review! The CI is green for now, merging~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33881]Avoid copy and update value in TtlListState#getUnexpiredOrNull [flink]

2024-01-09 Thread via GitHub


ljz2051 opened a new pull request, #24057:
URL: https://github.com/apache/flink/pull/24057

   ## What is the purpose of the change
   
   This pull request optimizes the TtlListState cleanup process by avoiding 
copying and updating values in TtlListState#getUnexpiredOrNull
   
   ## Brief change log
   
 - refactor TtlListState#getUnexpiredOrNull
   
   ## Verifying this change
   
   This change is already covered by existing tests:
   org.apache.flink.runtime.state.ttl.TtlStateTestBase with 
TtlFixedLenElemListStateTestContext/TtlNonFixedLenElemListStateTestContext.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency):  no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`:  no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33881) [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull

2024-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33881:
---
Labels: pull-request-available  (was: )

> [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull
> 
>
> Key: FLINK-33881
> URL: https://issues.apache.org/jira/browse/FLINK-33881
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Jinzhong Li
>Assignee: Jinzhong Li
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-12-19-21-25-21-446.png, 
> image-2023-12-19-21-26-43-518.png
>
>
> In some scenarios, 'TtlListState#getUnexpiredOrNull -> 
> elementSerializer.copy(ttlValue)' consumes a lot of CPU resources.
> !image-2023-12-19-21-25-21-446.png|width=529,height=119!
> I found that for TtlListState#getUnexpiredOrNull, if none of the elements 
> have expired, it still needs to copy all the elements and update the whole 
> list/map in TtlIncrementalCleanup#runCleanup();
> !image-2023-12-19-21-26-43-518.png|width=505,height=266!
> I think we could optimize TtlListState#getUnexpiredOrNull by:
> 1) finding the first expired element index in the list;
> 2) if none is found, returning the original list;
> 3) if one is found, constructing the unexpired list (putting the preceding 
> elements into it), then going through the subsequent elements and adding 
> only the unexpired ones.
> {code:java}
> public List<TtlValue<T>> getUnexpiredOrNull(@Nonnull List<TtlValue<T>> 
> ttlValues) {
> //...
> int firstExpireIndex = -1;
> for (int i = 0; i < ttlValues.size(); i++) {
> if (TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
> firstExpireIndex = i;
> break;
> }
> }
> if (firstExpireIndex == -1) {
> return ttlValues;  //return the original ttlValues
> }
> List<TtlValue<T>> unexpired = new ArrayList<>(ttlValues.size());
> for (int i = 0; i < ttlValues.size(); i++) {
> if (i < firstExpireIndex) {
> // unexpired.add(ttlValues.get(i));
> unexpired.add(elementSerializer.copy(ttlValues.get(i)));
> }
> if (i > firstExpireIndex) {
> if (!TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
> // unexpired.add(ttlValues.get(i));
> unexpired.add(elementSerializer.copy(ttlValues.get(i)));
> }
> }
> }
> //  .
> } {code}
> *In this way, the extra iteration overhead is actually very small, but 
> the benefit when there are no expired elements is significant.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33792] Generate the same code for the same logic [flink]

2024-01-09 Thread via GitHub


zoudan commented on PR #23984:
URL: https://github.com/apache/flink/pull/23984#issuecomment-1884085496

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-34013) ProfilingServiceTest.testRollingDeletion is unstable on AZP

2024-01-09 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang resolved FLINK-34013.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged in master: c09f07a406398bc4b2320e9b5ae0a8f5f27a00dc

> ProfilingServiceTest.testRollingDeletion is unstable on AZP
> ---
>
> Key: FLINK-34013
> URL: https://issues.apache.org/jira/browse/FLINK-34013
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Yu Chen
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56073=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=8258
>  fails as 
> {noformat}
> Jan 06 02:09:28 org.opentest4j.AssertionFailedError: expected: <2> but was: 
> <3>
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
> Jan 06 02:09:28   at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.verifyRollingDeletionWorks(ProfilingServiceTest.java:167)
> Jan 06 02:09:28   at 
> org.apache.flink.runtime.util.profiler.ProfilingServiceTest.testRollingDeletion(ProfilingServiceTest.java:117)
> Jan 06 02:09:28   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 06 02:09:28   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Jan 06 02:09:28   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34013][runtime] Remove duplicated stopProfiling() in `ProfilingService` [flink]

2024-01-09 Thread via GitHub


Myasuka merged PR #24045:
URL: https://github.com/apache/flink/pull/24045


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33980][core] Reorganize job configuration [flink]

2024-01-09 Thread via GitHub


masteryhx commented on PR #24025:
URL: https://github.com/apache/flink/pull/24025#issuecomment-1884076649

   Hi @JunRuiLee, https://github.com/apache/flink/pull/24046 has been merged.
   You can rebase onto master and go ahead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34030) Avoid using negative value for periodic-materialize.interval

2024-01-09 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu updated FLINK-34030:
-
Affects Version/s: 1.19.0

> Avoid using negative value for periodic-materialize.interval
> 
>
> Key: FLINK-34030
> URL: https://issues.apache.org/jira/browse/FLINK-34030
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.19.0
>Reporter: Hangxiang Yu
>Assignee: Hangxiang Yu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Similar to FLINK-32023, a negative value doesn't work for the Duration type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34030][changelog] Introduce periodic-materialize.enabled to enable/disable periodic materialization [flink]

2024-01-09 Thread via GitHub


masteryhx closed pull request #24046: [FLINK-34030][changelog] Introduce 
periodic-materialize.enabled to enable/disable periodic materialization
URL: https://github.com/apache/flink/pull/24046


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-34030) Avoid using negative value for periodic-materialize.interval

2024-01-09 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu resolved FLINK-34030.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

merged a6c1f308 and 96d62017 into master

> Avoid using negative value for periodic-materialize.interval
> 
>
> Key: FLINK-34030
> URL: https://issues.apache.org/jira/browse/FLINK-34030
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Hangxiang Yu
>Assignee: Hangxiang Yu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Similar to FLINK-32023, a negative value doesn't work for the Duration type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34046] Added metrics for AsyncWaitOperator Retry flow [flink]

2024-01-09 Thread via GitHub


sdineshkumar1985 closed pull request #24056: [FLINK-34046] Added metrics for 
AsyncWaitOperator Retry flow
URL: https://github.com/apache/flink/pull/24056


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs][table formats] Add protobuf format info to overview page [flink]

2024-01-09 Thread via GitHub


libenchao merged PR #24053:
URL: https://github.com/apache/flink/pull/24053


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34046] Added metrics for AsyncWaitOperator Retry flow [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24056:
URL: https://github.com/apache/flink/pull/24056#issuecomment-1884071870

   
   ## CI report:
   
   * 4633801c8ca2c929ed828815b337ac32bedd905e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33402] 2 CONTINUED [flink]

2024-01-09 Thread via GitHub


varun1729DD commented on PR #24055:
URL: https://github.com/apache/flink/pull/24055#issuecomment-1884068345

   @tweise 
   Continuing this investigation. I tried the reader-side change only, and the 
error did not go away.
   Looking at the logs again, I still see the concurrency bug as reported in 
the original Jira thread. 
   
   We see the JM (Enumerator) send the 'No More Splits' event:
   [screenshot: https://github.com/apache/flink/assets/87725200/7b450559-3ea7-447c-bbfa-94ea411efb6d]
   
   And shortly after the TM acts on this event and switches to the next source:
   
   [screenshot: https://github.com/apache/flink/assets/87725200/b8511087-eee7-4e0f-9ada-88d861e7cd7f]
   
   What I wonder is: what causes the reader to close prematurely? 
Synchronization seems to fix it. 
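   
   As a schematic illustration only (not the actual HybridSourceReader code; 
the class and method names below are made up), the kind of guard 
synchronization provides is: serialize split additions and the source switch 
so a late split assignment cannot land on a reader that has already been 
closed.
   
   ```java
   // Hedged sketch: make "add split" and "switch source" mutually exclusive,
   // so a split arriving after the switch is detected instead of silently lost.
   public final class SwitchGuard {
       private final Object lock = new Object();
       private boolean switched;

       /** Returns false if the split arrived after the switch and must be re-routed. */
       public boolean tryAddSplit(Runnable addSplit) {
           synchronized (lock) {
               if (switched) {
                   return false;
               }
               addSplit.run();
               return true;
           }
       }

       /** Closes the current reader and flips to the next source atomically. */
       public void switchSource(Runnable closeAndSwitch) {
           synchronized (lock) {
               closeAndSwitch.run();
               switched = true;
           }
       }
   }
   ```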
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34046) Add metrics to AsyncWaitOperator Retry Flow

2024-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34046:
---
Labels: pull-request-available  (was: )

> Add metrics to AsyncWaitOperator Retry Flow
> ---
>
> Key: FLINK-34046
> URL: https://issues.apache.org/jira/browse/FLINK-34046
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Dinesh
>Priority: Minor
>  Labels: pull-request-available
>
> AsyncWaitOperator supports retries if a retry strategy is set, but there are 
> no metrics counting the messages retried, the retries that succeeded, or the 
> messages dropped after reaching the configured retry count.
> To address this, we propose adding metrics for the retry count, the retry 
> success count, and the count of messages dropped after the maximum retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34046] Added metrics for AsyncWaitOperator Retry flow [flink]

2024-01-09 Thread via GitHub


sdineshkumar1985 opened a new pull request, #24056:
URL: https://github.com/apache/flink/pull/24056

   
   
   ## What is the purpose of the change
   
   AsyncWaitOperator supports retries if a retry strategy is set, but there are 
no metrics counting the messages retried, the retries that succeeded, or the 
messages dropped after reaching the configured retry count. This PR adds these 
metric counters.
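   
   For illustration only (the metric names below are hypothetical and not 
necessarily the ones this PR registers), counters of this kind are typically 
created once against the operator's metric group and incremented on the retry 
paths:
   
   ```java
   // Hedged sketch: registering retry counters via Flink's metrics API.
   import org.apache.flink.metrics.Counter;
   import org.apache.flink.metrics.MetricGroup;

   class RetryMetrics {
       final Counter numRetries;
       final Counter numRetrySuccesses;
       final Counter numDroppedAfterMaxRetries;

       RetryMetrics(MetricGroup group) {
           numRetries = group.counter("numRetries");
           numRetrySuccesses = group.counter("numRetrySuccesses");
           numDroppedAfterMaxRetries = group.counter("numDroppedAfterMaxRetries");
       }
   }
   ```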
   
   
   
   ## Brief change log
   
 - AsyncWaitOperator is updated with metric counters for the retry flow
   
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   
   
   This change is already covered by existing tests, such as 
AsyncWaitOperatorTest.
   
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33402] 2 CONTINUED [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24055:
URL: https://github.com/apache/flink/pull/24055#issuecomment-1884066881

   
   ## CI report:
   
   * 81e2c868f6404a630a6b8d21477a165f003a044f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33402] Hybrid Source Concurrency Race Condition Fixes and Related Bugs Results in Data Loss [flink]

2024-01-09 Thread via GitHub


varun1729DD commented on PR #23687:
URL: https://github.com/apache/flink/pull/23687#issuecomment-1884063943

   Continued here: https://github.com/apache/flink/pull/24055


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33233) Null point exception when non-native udf used in join condition

2024-01-09 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804939#comment-17804939
 ] 

luoyuxia commented on FLINK-33233:
--

[~mapohl] Thanks for the reminder. I hadn't been aware of it. 
[~yunfanfight...@foxmail.com] Could you please backport this to 1.17 and 1.18?

> Null point exception when non-native udf used in join condition
> ---
>
> Key: FLINK-33233
> URL: https://issues.apache.org/jira/browse/FLINK-33233
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: yunfan
>Assignee: yunfan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Any non-native UDF used in a hive-parser join condition 
> will cause a NullPointerException.
> It can be reproduced by adding the following test to 
> {code:java}
> org.apache.flink.connectors.hive.HiveDialectQueryITCase{code}
>  
> {code:java}
> // Add the following test to org.apache.flink.connectors.hive.HiveDialectQueryITCase
> @Test
> public void testUdfInJoinCondition() throws Exception {
> List result = CollectionUtil.iteratorToList(tableEnv.executeSql(
> "select foo.y, bar.I from bar join foo on hiveudf(foo.x) = bar.I 
> where bar.I > 1").collect());
> assertThat(result.toString())
> .isEqualTo("[+I[2, 2]]");
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34036) Various HiveDialectQueryITCase tests fail in GitHub Actions workflow with Hadoop 3.1.3 enabled

2024-01-09 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804938#comment-17804938
 ] 

luoyuxia commented on FLINK-34036:
--

[~mapohl] It also looks weird to me. I'll take a deeper look when I'm free. But 
considering that in [https://github.com/apache/flink-connector-hive/pull/5] we 
have moved it to a dedicated repo where the tests pass for Hadoop 3, and that 
we are going to remove Hive from the flink repo, maybe we can ignore it?

> Various HiveDialectQueryITCase tests fail in GitHub Actions workflow with 
> Hadoop 3.1.3 enabled
> --
>
> Key: FLINK-34036
> URL: https://issues.apache.org/jira/browse/FLINK-34036
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hadoop Compatibility, Connectors / Hive
>Affects Versions: 1.18.0, 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> The following {{HiveDialectQueryITCase}} tests fail consistently in the 
> FLINK-27075 GitHub Actions [master nightly 
> workflow|https://github.com/XComp/flink/actions/workflows/nightly-dev.yml] of 
> Flink (and also the [release-1.18 
> workflow|https://github.com/XComp/flink/actions/workflows/nightly-current.yml]):
> * {{testInsertDirectory}}
> * {{testCastTimeStampToDecimal}}
> * {{testNullLiteralAsArgument}}
> {code}
> Error: 03:38:45 03:38:45.661 [ERROR] Tests run: 22, Failures: 1, Errors: 2, 
> Skipped: 0, Time elapsed: 379.0 s <<< FAILURE! -- in 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase
> Error: 03:38:45 03:38:45.662 [ERROR] 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testNullLiteralAsArgument
>  -- Time elapsed: 0.069 s <<< ERROR!
> Jan 09 03:38:45 java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Ljava/lang/Object;Lorg/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector;)Ljava/sql/Timestamp;
> Jan 09 03:38:45   at 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testNullLiteralAsArgument(HiveDialectQueryITCase.java:959)
> Jan 09 03:38:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 09 03:38:45 
> Error: 03:38:45 03:38:45.662 [ERROR] 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testCastTimeStampToDecimal
>  -- Time elapsed: 0.007 s <<< ERROR!
> Jan 09 03:38:45 org.apache.flink.table.api.ValidationException: Table with 
> identifier 'test-catalog.default.t1' does not exist.
> Jan 09 03:38:45   at 
> org.apache.flink.table.catalog.CatalogManager.dropTableInternal(CatalogManager.java:1266)
> Jan 09 03:38:45   at 
> org.apache.flink.table.catalog.CatalogManager.dropTable(CatalogManager.java:1206)
> Jan 09 03:38:45   at 
> org.apache.flink.table.operations.ddl.DropTableOperation.execute(DropTableOperation.java:74)
> Jan 09 03:38:45   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1107)
> Jan 09 03:38:45   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:735)
> Jan 09 03:38:45   at 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testCastTimeStampToDecimal(HiveDialectQueryITCase.java:835)
> Jan 09 03:38:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 09 03:38:45 
> Error: 03:38:45 03:38:45.663 [ERROR] 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testInsertDirectory 
> -- Time elapsed: 7.326 s <<< FAILURE!
> Jan 09 03:38:45 org.opentest4j.AssertionFailedError: 
> Jan 09 03:38:45 
> Jan 09 03:38:45 expected: "A:english=90#math=100#history=85"
> Jan 09 03:38:45  but was: "A:english=90math=100history=85"
> Jan 09 03:38:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Jan 09 03:38:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Jan 09 03:38:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> Jan 09 03:38:45   at 
> org.apache.flink.connectors.hive.HiveDialectQueryITCase.testInsertDirectory(HiveDialectQueryITCase.java:498)
> Jan 09 03:38:45   at java.lang.reflect.Method.invoke(Method.java:498)
> {code}
> Additionally, the {{HiveITCase}} in the e2e test suite is affected:
> {code}
> Error: 05:20:20 05:20:20.949 [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 0.106 s <<< FAILURE! -- in 
> org.apache.flink.tests.hive.HiveITCase
> Error: 05:20:20 05:20:20.949 [ERROR] org.apache.flink.tests.hive.HiveITCase 
> -- Time elapsed: 0.106 s <<< ERROR!
> Jan 07 05:20:20 java.lang.ExceptionInInitializerError
> Jan 07 05:20:20   at 

[jira] [Reopened] (FLINK-33233) Null point exception when non-native udf used in join condition

2024-01-09 Thread luoyuxia (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia reopened FLINK-33233:
--

> Null point exception when non-native udf used in join condition
> ---
>
> Key: FLINK-33233
> URL: https://issues.apache.org/jira/browse/FLINK-33233
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: yunfan
>Assignee: yunfan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Any non-native UDF used in a hive-parser join condition 
> will cause a NullPointerException.
> It can be reproduced by adding the following test to 
> {code:java}
> org.apache.flink.connectors.hive.HiveDialectQueryITCase{code}
>  
> {code:java}
> // Add the following test to org.apache.flink.connectors.hive.HiveDialectQueryITCase
> @Test
> public void testUdfInJoinCondition() throws Exception {
> List result = CollectionUtil.iteratorToList(tableEnv.executeSql(
> "select foo.y, bar.I from bar join foo on hiveudf(foo.x) = bar.I 
> where bar.I > 1").collect());
> assertThat(result.toString())
> .isEqualTo("[+I[2, 2]]");
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-34027] Introduces AsyncScalarFunction as a new UDF type [flink]

2024-01-09 Thread via GitHub


AlanConfluent commented on PR #23975:
URL: https://github.com/apache/flink/pull/23975#issuecomment-1884054834

   I pushed my commit, which extends the ability to resolve types in this 
manner. I haven't yet had a chance to check all of the cases where 
ScalarFunction works or doesn't, but this is pretty powerful now. Take a look. 
@twalthr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33402) Hybrid Source Concurrency Race Condition Fixes and Related Bugs Results in Data Loss

2024-01-09 Thread Varun Narayanan Chakravarthy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Narayanan Chakravarthy updated FLINK-33402:
-
Description: 
Hello Team,
I noticed that there is data loss when using the Hybrid Source. We are reading 
from a series of roughly 100 concrete File Sources, all chained together using 
the Hybrid Source.
The issue stems from a race condition in the Flink Hybrid Source code: the 
Hybrid Source switches to the next source before the current source is 
complete, and the same applies to the Hybrid Source readers. I have also 
shared the patch file that fixes the issue.
From the logs:

*Task Manager logs:* 
2023-10-10 17:46:23.577 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Adding split(s) 
to reader: [FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null] 
2023-10-10 17:46:23.715 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:23.715 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.012 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Finished 
reading split(s) [000154] 
2023-10-10 17:46:24.012 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  
- Finished reading from splits [000154] 
2023-10-10 17:46:24.014 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Reader received 
NoMoreSplits event. 
2023-10-10 17:46:24.014 [Source: parquet-source (1/2)#0|#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - No more splits for 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8
 
2023-10-10 17:46:24.116 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.116 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Switch source 
event: subtask=0 sourceIndex=12 
source=org.apache.flink.connector.kafka.source.KafkaSource@7849da7e 2023-10-10 
17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Closing Source 
Reader. 
2023-10-10 17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  - Shutting down 
split fetcher 0 
2023-10-10 17:46:24.198 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  
- Split fetcher 0 exited. 
2023-10-10 17:46:24.198 [Source: parquet-source (1/2)#0|#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Reader closed: 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8

We identified that data from `s3://REDACTED/part-1-13189.snappy` is missing. 
This split is assigned to the reader with ID 000229, and we can see from 
the logs that it is added after the no-more-splits event and is NOT read.



*Job Manager logs:*
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.f.c.file.src.assigners.LocalityAwareSplitAssigner  - Assigning remote split 
to requesting host '10': Optional[FileSourceSplit: 
s3://REDACTED/part-1-13189.snappy [0, 94451)  hosts=[localhost] ID=000229 
position=null]
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.flink.connector.file.src.impl.StaticFileSplitEnumerator  - Assigned split 
to subtask 0 : FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  
o.apache.flink.runtime.source.coordinator.SourceCoordinator  - Source Source: 
parquet-source received split request from parallel task 1 (#0)
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] DEBUG 
o.a.f.c.base.source.hybrid.HybridSourceSplitEnumerator  - handleSplitRequest 
subtask=1 sourceIndex=11 pendingSplits={}
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  
o.a.flink.connector.file.src.impl.StaticFileSplitEnumerator  - Subtask 1 (on 
host '10.4.168.40') is requesting a file source split
2023-10-10 17:46:23.786 [SourceCoordinator-Source: 

[jira] [Updated] (FLINK-33402) Hybrid Source Concurrency Race Condition Fixes and Related Bugs Results in Data Loss

2024-01-09 Thread Varun Narayanan Chakravarthy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Narayanan Chakravarthy updated FLINK-33402:
-
Description: 
Hello Team,
I noticed that there is data loss when using the Hybrid Source. We are reading 
from a series of roughly 100 concrete File Sources, all chained together using 
the Hybrid Source.
The issue stems from a race condition in the Flink Hybrid Source code: the 
Hybrid Source switches to the next source before the current source is 
complete, and the same applies to the Hybrid Source readers. I have also 
shared the patch file that fixes the issue.
From the logs:

*Task Manager logs:* 
2023-10-10 17:46:23.577 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Adding split(s) 
to reader: [FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null] 
2023-10-10 17:46:23.715 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:23.715 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.012 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Finished 
reading split(s) [000154] 
2023-10-10 17:46:24.012 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  
- Finished reading from splits [000154] 
2023-10-10 17:46:24.014 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Reader received 
NoMoreSplits event. 
2023-10-10 17:46:24.014 [Source: parquet-source (1/2)#0|#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - No more splits for 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8
 
2023-10-10 17:46:24.116 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.116 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to 
Random IO seek policy 
2023-10-10 17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Switch source 
event: subtask=0 sourceIndex=12 
source=org.apache.flink.connector.kafka.source.KafkaSource@7849da7e 2023-10-10 
17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.apache.flink.connector.base.source.reader.SourceReaderBase  - Closing Source 
Reader. 
2023-10-10 17:46:24.116 [Source: parquet-source (1/2)#0|#0] INFO  
o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  - Shutting down 
split fetcher 0 
2023-10-10 17:46:24.198 [Source Data Fetcher for Source: parquet-source 
(1/2)#0|#0] INFO  o.a.flink.connector.base.source.reader.fetcher.SplitFetcher  
- Split fetcher 0 exited. 
2023-10-10 17:46:24.198 [Source: parquet-source (1/2)#0|#0] DEBUG 
o.a.flink.connector.base.source.hybrid.HybridSourceReader  - Reader closed: 
subtask=0 sourceIndex=11 
currentReader=org.apache.flink.connector.file.src.impl.FileSourceReader@59620ef8
We identified that data from `s3://REDACTED/part-1-13189.snappy` is missing. 
This split is assigned to the reader with ID 000229, and we can see from 
the logs that it is added after the no-more-splits event and is NOT read.

*Job Manager logs:*
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.f.c.file.src.assigners.LocalityAwareSplitAssigner  - Assigning remote split 
to requesting host '10': Optional[FileSourceSplit: 
s3://REDACTED/part-1-13189.snappy [0, 94451)  hosts=[localhost] ID=000229 
position=null]
2023-10-10 17:46:23.576 [SourceCoordinator-Source: parquet-source] INFO  
o.a.flink.connector.file.src.impl.StaticFileSplitEnumerator  - Assigned split 
to subtask 0 : FileSourceSplit: s3://REDACTED/part-1-13189.snappy [0, 94451)  
hosts=[localhost] ID=000229 position=null
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  
o.apache.flink.runtime.source.coordinator.SourceCoordinator  - Source Source: 
parquet-source received split request from parallel task 1 (#0)
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] DEBUG 
o.a.f.c.base.source.hybrid.HybridSourceSplitEnumerator  - handleSplitRequest 
subtask=1 sourceIndex=11 pendingSplits={}
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  
o.a.flink.connector.file.src.impl.StaticFileSplitEnumerator  - Subtask 1 (on 
host '10.4.168.40') is requesting a file source split
2023-10-10 17:46:23.786 [SourceCoordinator-Source: parquet-source] INFO  

Re: [PR] Bump follow-redirects from 1.15.1 to 1.15.4 in /flink-runtime-web/web-dashboard [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24054:
URL: https://github.com/apache/flink/pull/24054#issuecomment-1884026649

   
   ## CI report:
   
   * b929d0971445a879e03e1f38c062a0aae5d8905a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34046) Add metrics to AsyncWaitOperator Retry Flow

2024-01-09 Thread Dinesh (Jira)
Dinesh created FLINK-34046:
--

 Summary: Add metrics to AsyncWaitOperator Retry Flow
 Key: FLINK-34046
 URL: https://issues.apache.org/jira/browse/FLINK-34046
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Reporter: Dinesh


AsyncWaitOperator supports retries if a retry strategy is set, but there are no 
metrics counting the messages retried, the retries that succeeded, or the 
messages dropped after reaching the configured retry count.

To address this, we propose adding metrics for the retry count, the retry 
success count, and the count of messages dropped after the maximum retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] Bump follow-redirects from 1.15.1 to 1.15.4 in /flink-runtime-web/web-dashboard [flink]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #24054:
URL: https://github.com/apache/flink/pull/24054

   Bumps 
[follow-redirects](https://github.com/follow-redirects/follow-redirects) from 
1.15.1 to 1.15.4.
   
   Commits
   
   - 6585820 Release version 1.15.4 of the npm package.
 (https://github.com/follow-redirects/follow-redirects/commit/65858205e59f1e23c9bf173348a7a7cbb8ac47f5)
   - 7a6567e Disallow bracketed hostnames.
 (https://github.com/follow-redirects/follow-redirects/commit/7a6567e16dfa9ad18a70bfe91784c28653fbf19d)
   - 05629af Prefer native URL instead of deprecated url.parse.
 (https://github.com/follow-redirects/follow-redirects/commit/05629af696588b90d64e738bc2e809a97a5f92fc)
   - 1cba8e8 Prefer native URL instead of legacy url.resolve.
 (https://github.com/follow-redirects/follow-redirects/commit/1cba8e85fa73f563a439fe460cf028688e4358df)
   - 72bc2a4 Simplify _processResponse error handling.
 (https://github.com/follow-redirects/follow-redirects/commit/72bc2a4229bc18dc9fbd57c60579713e6264cb92)
   - 3d42aec Add bracket tests.
 (https://github.com/follow-redirects/follow-redirects/commit/3d42aecdca39b144a0a2f27ea134b4cf67dd796a)
   - bcbb096 Do not directly set Error properties.
 (https://github.com/follow-redirects/follow-redirects/commit/bcbb096b32686ecad6cd34235358ed6f2217d4f0)
   - 192dbe7 Release version 1.15.3 of the npm package.
 (https://github.com/follow-redirects/follow-redirects/commit/192dbe7ce671ecad813c074bffe3b3f5d3680fee)
   - bd8c81e Fix resource leak on destroy.
 (https://github.com/follow-redirects/follow-redirects/commit/bd8c81e4f32d12f28a35d265f88b1716703687c6)
   - 9c728c3 Split linting and testing.
 (https://github.com/follow-redirects/follow-redirects/commit/9c728c314b06f9595dcd7f245d40731e8a27d79f)
   - Additional commits viewable in the compare view:
 https://github.com/follow-redirects/follow-redirects/compare/v1.15.1...v1.15.4
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects=npm_and_yarn=1.15.1=1.15.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   You can disable automated security fix PRs for this repo from the [Security 
Alerts page](https://github.com/apache/flink/network/alerts).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34027] Introduces AsyncScalarFunction as a new UDF type [flink]

2024-01-09 Thread via GitHub


AlanConfluent commented on PR #23975:
URL: https://github.com/apache/flink/pull/23975#issuecomment-1883987246

   I did notice that if you have some fairly complex generic hierarchy:
   
   ```
   public abstract static class AsyncFuncGeneric<T> extends AsyncFuncBase {

       private static final long serialVersionUID = 3L;

       abstract T[] newT(int param);

       public void eval(CompletableFuture<T[]> future, Integer param) {
           executor.schedule(() -> future.complete(newT(param)), 10,
                   TimeUnit.MILLISECONDS);
       }
   }

   /** Test function. */
   public static class LongAsyncFuncGeneric extends AsyncFuncGeneric<Long> {
       @Override
       Long[] newT(int param) {
           Long[] result = new Long[1];
           result[0] = 10L + param;
           return result;
       }
   }
```

It will fail to resolve the type here. I actually have a stash where I 
have gone much farther in implementing resolve for the type system, so that I 
can resolve not only a TypeVariable but other types as well. I think if we 
really want to handle this sort of case, I can push that stash, but I was 
unsure if it was over the top for this purpose.
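   
   For context, here is a minimal sketch of resolving a TypeVariable against a 
concrete subclass with plain reflection (illustrative only, not the stashed 
implementation; a real resolver would also recurse through deeper hierarchies 
and interfaces):
   
   ```java
   import java.lang.reflect.ParameterizedType;
   import java.lang.reflect.Type;
   import java.lang.reflect.TypeVariable;

   public final class TypeVariableResolver {
       /**
        * Resolves the type argument that 'concrete' supplies for a variable
        * declared on its direct generic superclass, e.g. T of AsyncFuncGeneric
        * resolved via LongAsyncFuncGeneric.class yields Long.
        */
       public static Type resolve(Class<?> concrete, TypeVariable<?> variable) {
           Type superType = concrete.getGenericSuperclass();
           if (superType instanceof ParameterizedType) {
               ParameterizedType pt = (ParameterizedType) superType;
               TypeVariable<?>[] declared =
                       ((Class<?>) pt.getRawType()).getTypeParameters();
               for (int i = 0; i < declared.length; i++) {
                   if (declared[i].equals(variable)) {
                       return pt.getActualTypeArguments()[i];
                   }
               }
           }
           return null; // not resolvable one level up
       }
   }
   ```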


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] Remove incorrect echo in python_ci.yml [flink-connector-shared-utils]

2024-01-09 Thread via GitHub


GOODBOY008 commented on PR #32:
URL: 
https://github.com/apache/flink-connector-shared-utils/pull/32#issuecomment-1883981540

   @snuyanzin Could you review this PR if you are available? Thank you. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs][table formats] Add protobuf format info to overview page [flink]

2024-01-09 Thread via GitHub


sharath1709 commented on PR #24053:
URL: https://github.com/apache/flink/pull/24053#issuecomment-1883966460

   @libenchao Could you please review this PR when you find time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs][table formats] Add protobuf format info to overview page [flink]

2024-01-09 Thread via GitHub


flinkbot commented on PR #24053:
URL: https://github.com/apache/flink/pull/24053#issuecomment-1883962929

   
   ## CI report:
   
   * 2095033617637ffe90e3a885fa5b7e43d7344a82 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-34038) IncrementalGroupAggregateRestoreTest.testRestore fails

2024-01-09 Thread Bonnie Varghese (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bonnie Varghese reassigned FLINK-34038:
---

Assignee: Bonnie Varghese

> IncrementalGroupAggregateRestoreTest.testRestore fails
> --
>
> Key: FLINK-34038
> URL: https://issues.apache.org/jira/browse/FLINK-34038
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Bonnie Varghese
>Priority: Major
>  Labels: test-stability
>
> {{IncrementalGroupAggregateRestoreTest.testRestore}} fails on {{master}}:
> {code}
> Jan 08 18:53:18 18:53:18.406 [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
> Skipped: 1, Time elapsed: 8.706 s <<< FAILURE! -- in 
> org.apache.flink.table.planner.plan.nodes.exec.stream.IncrementalGroupAggregateRestoreTest
> Jan 08 18:53:18 18:53:18.406 [ERROR] 
> org.apache.flink.table.planner.plan.nodes.exec.stream.IncrementalGroupAggregateRestoreTest.testRestore(TableTestProgram,
>  ExecNodeMetadata)[2] -- Time elapsed: 1.368 s <<< FAILURE!
> Jan 08 18:53:18 java.lang.AssertionError: 
> Jan 08 18:53:18 
> Jan 08 18:53:18 Expecting actual:
> Jan 08 18:53:18   ["+I[1, 5, 2, 3]",
> Jan 08 18:53:18 "+I[2, 2, 1, 1]",
> Jan 08 18:53:18 "-U[1, 5, 2, 3]",
> Jan 08 18:53:18 "+U[1, 3, 2, 2]",
> Jan 08 18:53:18 "-U[1, 3, 2, 2]",
> Jan 08 18:53:18 "+U[1, 9, 3, 4]"]
> Jan 08 18:53:18 to contain exactly in any order:
> Jan 08 18:53:18   ["+I[1, 5, 2, 3]", "+I[2, 2, 1, 1]", "-U[1, 5, 2, 3]", 
> "+U[1, 9, 3, 4]"]
> Jan 08 18:53:18 but the following elements were unexpected:
> Jan 08 18:53:18   ["+U[1, 3, 2, 2]", "-U[1, 3, 2, 2]"]
> Jan 08 18:53:18 
> Jan 08 18:53:18   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
> Jan 08 18:53:18   at java.lang.reflect.Method.invoke(Method.java:498)
> [...]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56110=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=10822



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [hotfix][docs] Add protobuf format info to overview page [flink]

2024-01-09 Thread via GitHub


sharath1709 opened a new pull request, #24053:
URL: https://github.com/apache/flink/pull/24053

   
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31481][table] Support enhanced show databases syntax [flink]

2024-01-09 Thread via GitHub


jeyhunkarimov commented on code in PR #23612:
URL: https://github.com/apache/flink/pull/23612#discussion_r1446697164


##
docs/content.zh/docs/dev/table/sql/show.md:
##
@@ -507,10 +507,21 @@ SHOW CURRENT CATALOG
 ## SHOW DATABASES
 
 ```sql
-SHOW DATABASES
+SHOW DATABASES [ ( FROM | IN ) catalog_name] [ [NOT] (LIKE | ILIKE) <sql_like_pattern> ]

Review Comment:
   Agree. Removed
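
   For illustration, a sketch of exercising the enhanced syntax through the Table API; `my_catalog` and the pattern are made-up names:

   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class ShowDatabasesExample {
       public static void main(String[] args) {
           TableEnvironment tEnv =
                   TableEnvironment.create(EnvironmentSettings.inStreamingMode());

           // Plain form, unchanged by this PR.
           tEnv.executeSql("SHOW DATABASES").print();

           // Enhanced forms from FLINK-31481: optional catalog plus LIKE/ILIKE filters.
           tEnv.executeSql("SHOW DATABASES FROM my_catalog").print();
           tEnv.executeSql("SHOW DATABASES IN my_catalog NOT ILIKE 'tmp%'").print();
       }
   }
   ```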



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31481][table] Support enhanced show databases syntax [flink]

2024-01-09 Thread via GitHub


jeyhunkarimov commented on code in PR #23612:
URL: https://github.com/apache/flink/pull/23612#discussion_r1446691730


##
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/ShowDatabasesOperation.java:
##
@@ -20,26 +20,108 @@
 
 import org.apache.flink.annotation.Internal;
 import org.apache.flink.table.api.internal.TableResultInternal;
+import org.apache.flink.table.functions.SqlLikeUtils;
 
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.requireNonNull;
 import static 
org.apache.flink.table.api.internal.TableResultUtils.buildStringArrayResult;
 
 /** Operation to describe a SHOW DATABASES statement. */
 @Internal
 public class ShowDatabasesOperation implements ShowOperation {
 
+private final String preposition;

Review Comment:
   If we are flexible with the summary string, then I completely agree



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32416] initial implementation of DynamicKafkaSource with bound… [flink-connector-kafka]

2024-01-09 Thread via GitHub


mas-chen commented on code in PR #44:
URL: 
https://github.com/apache/flink-connector-kafka/pull/44#discussion_r1446686787


##
flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/KafkaSourceEnumState.java:
##
@@ -42,7 +42,7 @@ public KafkaSourceEnumState(
 this.initialDiscoveryFinished = initialDiscoveryFinished;
 }
 
-KafkaSourceEnumState(
+public KafkaSourceEnumState(

Review Comment:
   This is the only KafkaSource code that I had to modify. As a result of 
moving packages, this needed to be publicly visible (the class is still 
annotated as Internal)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31481][table] Support enhanced show databases syntax [flink]

2024-01-09 Thread via GitHub


jeyhunkarimov commented on code in PR #23612:
URL: https://github.com/apache/flink/pull/23612#discussion_r1446686365


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/SqlShowDatabasesConverter.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.operations.converters;
+
+import org.apache.flink.sql.parser.dql.SqlShowDatabases;
+import org.apache.flink.table.catalog.CatalogManager;
+import org.apache.flink.table.operations.Operation;
+import org.apache.flink.table.operations.ShowDatabasesOperation;
+
+/** A converter for {@link SqlShowDatabases}. */
+public class SqlShowDatabasesConverter implements SqlNodeConverter<SqlShowDatabases> {
+
+@Override
+public Operation convertSqlNode(SqlShowDatabases sqlShowDatabases, 
ConvertContext context) {
+if (sqlShowDatabases.getPreposition() == null) {
+return new ShowDatabasesOperation(
+sqlShowDatabases.getLikeType(),
+sqlShowDatabases.getLikeSqlPattern(),
+sqlShowDatabases.isNotLike());
+} else {
+CatalogManager catalogManager = context.getCatalogManager();
+String[] fullCatalogName = sqlShowDatabases.getCatalog();
+String catalogName =
+fullCatalogName.length == 0
+? catalogManager.getCurrentCatalog()
+: fullCatalogName[0];

Review Comment:
   That is also what we do. If no catalog is provided, summary string outputs 
no catalog name:
   
   ```
   final String sql1 = "SHOW DATABASES";
   assertShowDatabases(sql1, sql1);
   ```
   
   The thing is, if a user provides a catalog name, it does not necessarily 
execute the `ShowDatabaseOperation::execute(Context ctx)` method. For example, 
please see the code snippet below that I used in tests. 
   
   ```
   private void assertShowDatabases(String sql, String expectedSummary) {
   Operation operation = parse(sql);
   assertThat(operation).isInstanceOf(ShowDatabasesOperation.class);
   final ShowDatabasesOperation showDatabasesOperation = 
(ShowDatabasesOperation) operation;
   
assertThat(showDatabasesOperation.asSummaryString()).isEqualTo(expectedSummary);
   }
   ```
   
   
   Moreover, the catalog name provided in the constructor 
(`ShowDatabaseOperation`) is not necessarily the same as the one we obtain in 
`ShowDatabaseOperation::execute(Context ctx)`. That is why we retrieve the 
catalog only if `catalogName` is `null`:
   
   ```
   String cName =
           catalogName == null ? ctx.getCatalogManager().getCurrentCatalog() : catalogName;
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33817][flink-protobuf] Set ReadDefaultValues=False by default in proto3 for performance improvement [flink]

2024-01-09 Thread via GitHub


sharath1709 commented on PR #24035:
URL: https://github.com/apache/flink/pull/24035#issuecomment-1883805059

   @flinkbot run azure
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33611] [flink-protobuf] Support Large Protobuf Schemas [flink]

2024-01-09 Thread via GitHub


sharath1709 commented on PR #23937:
URL: https://github.com/apache/flink/pull/23937#issuecomment-1883804593

   @flinkbot run azure
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28229][streaming-java] Source API alternatives for StreamExecutionEnvironment#fromCollection() methods [flink]

2024-01-09 Thread via GitHub


afedulov commented on PR #22850:
URL: https://github.com/apache/flink/pull/22850#issuecomment-1883801537

   Resolved by https://github.com/apache/flink/pull/23558


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28229][streaming-java] Source API alternatives for StreamExecutionEnvironment#fromCollection() methods [flink]

2024-01-09 Thread via GitHub


afedulov closed pull request #22850: [FLINK-28229][streaming-java] Source API 
alternatives for StreamExecutionEnvironment#fromCollection() methods
URL: https://github.com/apache/flink/pull/22850


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] flink-connector-hive artifactId with scala version [flink-connector-hive]

2024-01-09 Thread via GitHub


snuyanzin merged PR #7:
URL: https://github.com/apache/flink-connector-hive/pull/7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] flink-connector-hive artifactId with scala version [flink-connector-hive]

2024-01-09 Thread via GitHub


boring-cyborg[bot] commented on PR #7:
URL: 
https://github.com/apache/flink-connector-hive/pull/7#issuecomment-1883795018

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34007) Flink Job stuck in suspend state after losing leadership in HA Mode

2024-01-09 Thread Zhenqiu Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenqiu Huang updated FLINK-34007:
--
Attachment: job-manager.log

> Flink Job stuck in suspend state after losing leadership in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Priority: Major
> Attachments: job-manager.log
>
>
> The observation is that Job manager goes to suspend state with a failed 
> container not able to register itself to resource manager after timeout.
> JM Log, see attached
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34007) Flink Job stuck in suspend state after losing leadership in HA Mode

2024-01-09 Thread Zhenqiu Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenqiu Huang updated FLINK-34007:
--
Summary: Flink Job stuck in suspend state after losing leadership in HA 
Mode  (was: Flink Job stuck in suspend state after recovery from failure in HA 
Mode)

> Flink Job stuck in suspend state after losing leadership in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Priority: Major
>
> The observation is that Job manager goes to suspend state with a failed 
> container not able to register itself to resource manager after timeout.
> JM Log, see attached
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] Bump org.eclipse.jetty:jetty-server from 9.3.20.v20170531 to 9.4.51.v20230217 in /flink-sql-connector-hive-3.1.3 [flink-connector-hive]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #13:
URL: https://github.com/apache/flink-connector-hive/pull/13

   Bumps org.eclipse.jetty:jetty-server from 9.3.20.v20170531 to 
9.4.51.v20230217.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.eclipse.jetty:jetty-server=maven=9.3.20.v20170531=9.4.51.v20230217)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   You can disable automated security fix PRs for this repo from the [Security 
Alerts page](https://github.com/apache/flink-connector-hive/network/alerts).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump org.apache.avro:avro from 1.8.2 to 1.11.3 in /flink-sql-connector-hive-3.1.3 [flink-connector-hive]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #12:
URL: https://github.com/apache/flink-connector-hive/pull/12

   Bumps org.apache.avro:avro from 1.8.2 to 1.11.3.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.avro:avro=maven=1.8.2=1.11.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump org.apache.zookeeper:zookeeper from 3.4.9 to 3.7.2 in /flink-sql-connector-hive-3.1.3 [flink-connector-hive]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #9:
URL: https://github.com/apache/flink-connector-hive/pull/9

   Bumps org.apache.zookeeper:zookeeper from 3.4.9 to 3.7.2.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.zookeeper:zookeeper=maven=3.4.9=3.7.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump commons-net:commons-net from 3.6 to 3.9.0 [flink-connector-hive]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #11:
URL: https://github.com/apache/flink-connector-hive/pull/11

   Bumps commons-net:commons-net from 3.6 to 3.9.0.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=commons-net:commons-net=maven=3.6=3.9.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump org.apache.avro:avro from 1.8.2 to 1.11.3 [flink-connector-hive]

2024-01-09 Thread via GitHub


dependabot[bot] opened a new pull request, #10:
URL: https://github.com/apache/flink-connector-hive/pull/10

   Bumps org.apache.avro:avro from 1.8.2 to 1.11.3.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.avro:avro=maven=1.8.2=1.11.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] Run end to end tests only if profile run-end-to-end-tests explicitely activated [flink-connector-hive]

2024-01-09 Thread via GitHub


snuyanzin merged PR #8:
URL: https://github.com/apache/flink-connector-hive/pull/8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-34007) Flink Job stuck in suspend state after recovery from failure in HA Mode

2024-01-09 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804840#comment-17804840
 ] 

Gyula Fora edited comment on FLINK-34007 at 1/9/24 7:46 PM:


From initial investigation, the job manager initially loses the leadership, 
then goes to SUSPENDED status. Shouldn't the job manager exit directly rather 
than go to SUSPENDED status?

 


was (Author: zhenqiuhuang):
From initial investigation, the job manager initially loses the leadership, 
then goes to SUSPENDED status. Shouldn't the job manager exit directly rather 
than go to SUSPENDED status?

2024-01-08 21:44:57,142 INFO  
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner [] - 
JobMasterServiceLeadershipRunner for job 217cee964b2cfdc3115fb74cac0ec550 was 
revoked leadership with leader id 9987190b-35f4-4238-b317-057dc3615e4d. 
Stopping current JobMasterServiceProcess.
2024-01-08 21:45:16,280 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - 
http://172.16.197.136:8081 lost leadership
2024-01-08 21:45:16,280 INFO  
org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
Resource manager service is revoked leadership with session id 
9987190b-35f4-4238-b317-057dc3615e4d.
2024-01-08 21:45:16,281 INFO  
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - 
DefaultDispatcherRunner was revoked the leadership with leader id 
9987190b-35f4-4238-b317-057dc3615e4d. Stopping the DispatcherLeaderProcess.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Stopping SessionDispatcherLeaderProcess.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping 
dispatcher pekko.tcp://flink@172.16.197.136:6123/user/rpc/dispatcher_1.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all 
currently running jobs of dispatcher 
pekko.tcp://flink@172.16.197.136:6123/user/rpc/dispatcher_1.
2024-01-08 21:45:16,282 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Stopping the JobMaster for job 
'amp-ade-fitness-clickstream-projection-uat' (217cee964b2cfdc3115fb74cac0ec550).
2024-01-08 21:45:16,285 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 
217cee964b2cfdc3115fb74cac0ec550 reached terminal state SUSPENDED.
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopping credential renewal
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopped credential renewal
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Closing the slot manager.
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Suspending the slot manager.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] 
- Stopping 
KubernetesLeaderRetrievalDriver{configMapName='acsflink-5e92d541f0cd0ad7352c4dc5463c54df-cluster-config-map'}.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Stopped to watch for 
amp-ae-video-uat/acsflink-5e92d541f0cd0ad7352c4dc5463c54df-cluster-config-map, 
watching id:cc34317a-3299-4cb5-a966-55cb546e8bf9
2024-01-08 21:45:16,287 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
amp-ade-fitness-clickstream-projection-uat (217cee964b2cfdc3115fb74cac0ec550) 
switched from state RUNNING to SUSPENDED.

> Flink Job stuck in suspend state after recovery from failure in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Priority: Major
>
> The observation is that Job manager goes to suspend state with a failed 
> container not able to register itself to resource manager after timeout.
> JM Log, see attached
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34007) Flink Job stuck in suspend state after recovery from failure in HA Mode

2024-01-09 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-34007:
---
Description: 
The observation is that Job manager goes to suspend state with a failed 
container not able to register itself to resource manager after timeout.

JM Log, see attached

 

  was:
The observation is that Job manager goes to suspend state with a failed 
container not able to register itself to resource manager after timeout.

JM Log:

2024-01-04 02:58:39,210 INFO  
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner [] - 
JobMasterServiceLeadershipRunner for job 217cee964b2cfdc3115fb74cac0ec550 was 
revoked leadership with leader id eda6fee6-ce02-4076-9a99-8c43a92629f7. 
Stopping current JobMasterServiceProcess.
2024-01-04 02:58:58,347 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - 
http://172.16.71.11:8081 lost leadership
2024-01-04 02:58:58,347 INFO  
org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
Resource manager service is revoked leadership with session id 
eda6fee6-ce02-4076-9a99-8c43a92629f7.
2024-01-04 02:58:58,348 INFO  
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - 
DefaultDispatcherRunner was revoked the leadership with leader id 
eda6fee6-ce02-4076-9a99-8c43a92629f7. Stopping the DispatcherLeaderProcess.
2024-01-04 02:58:58,348 INFO  
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Stopping SessionDispatcherLeaderProcess.
2024-01-04 02:58:58,349 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping 
dispatcher pekko.tcp://flink@172.16.71.11:6123/user/rpc/dispatcher_1.
2024-01-04 02:58:58,349 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Stopping the JobMaster for job 
'amp-ade-fitness-clickstream-projection-uat' (217cee964b2cfdc3115fb74cac0ec550).
2024-01-04 02:58:58,349 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all 
currently running jobs of dispatcher 
pekko.tcp://flink@172.16.71.11:6123/user/rpc/dispatcher_1.
2024-01-04 02:58:58,351 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 
217cee964b2cfdc3115fb74cac0ec550 reached terminal state SUSPENDED.
2024-01-04 02:58:58,352 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopping credential renewal
2024-01-04 02:58:58,352 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopped credential renewal
2024-01-04 02:58:58,352 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Closing the slot manager.
2024-01-04 02:58:58,351 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
amp-ade-fitness-clickstream-projection-uat (217cee964b2cfdc3115fb74cac0ec550) 
switched from state RUNNING to SUSPENDED.
org.apache.flink.util.FlinkException: AdaptiveScheduler is being stopped.
at 
org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.closeAsync(AdaptiveScheduler.java:474)
 ~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.jobmaster.JobMaster.stopScheduling(JobMaster.java:1093)
 ~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.jobmaster.JobMaster.stopJobExecution(JobMaster.java:1056)
 ~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:454) 
~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:239)
 ~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor$StartedState.lambda$terminate$0(PekkoRpcActor.java:574)
 ~[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
 ~[flink-dist-1.18.1.6-ase.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor$StartedState.terminate(PekkoRpcActor.java:573)
 ~[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleControlMessage(PekkoRpcActor.java:196)
 ~[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
[flink-rpc-akkadb952114-fa83-4aba-b20a-b7e5771ce59c.jar:1.18.1.6-ase]
at 

Re: [PR] [FLINK-33365] include filters with Lookup joins [flink-connector-jdbc]

2024-01-09 Thread via GitHub


davidradl commented on code in PR #79:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/79#discussion_r1446523771


##
flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java:
##
@@ -92,9 +103,22 @@ public JdbcRowDataLookupFunction(
 })
 .toArray(DataType[]::new);
 this.maxRetryTimes = maxRetryTimes;
-this.query =
+
+final String baseSelectStatement =
 options.getDialect()
 .getSelectFromStatement(options.getTableName(), 
fieldNames, keyNames);
+if (conditions == null || conditions.length == 0) {
+this.query = baseSelectStatement;
+if (LOG.isDebugEnabled()) {

Review Comment:
   log4jLogger does check for debug, but other Logger implementations may not. 
I see this method is used extensively in core Flink - I am inclined to leave it 
in; it means the parameters are not formatted when debug is not on. WDYT 
@libenchao 
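
   As a side note, a small sketch of the trade-off being weighed here (SLF4J; the class and method names are invented for illustration):

   ```java
   import java.util.List;

   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   public class LookupQueryLogging {

       private static final Logger LOG = LoggerFactory.getLogger(LookupQueryLogging.class);

       void logQuery(String baseQuery, List<String> conditions) {
           // Parameterized logging: the message is formatted only if DEBUG is
           // enabled, but argument expressions are still evaluated eagerly.
           LOG.debug("Base lookup query: {}", baseQuery);

           // Explicit guard: additionally skips evaluating a potentially expensive
           // argument, here the concatenation of all pushed-down filter conditions.
           if (LOG.isDebugEnabled()) {
               LOG.debug("Lookup query filters: {}", String.join(" AND ", conditions));
           }
       }
   }
   ```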



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33365] include filters with Lookup joins [flink-connector-jdbc]

2024-01-09 Thread via GitHub


davidradl commented on code in PR #79:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/79#discussion_r1446523771


##
flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcRowDataLookupFunction.java:
##
@@ -92,9 +103,22 @@ public JdbcRowDataLookupFunction(
 })
 .toArray(DataType[]::new);
 this.maxRetryTimes = maxRetryTimes;
-this.query =
+
+final String baseSelectStatement =
 options.getDialect()
 .getSelectFromStatement(options.getTableName(), 
fieldNames, keyNames);
+if (conditions == null || conditions.length == 0) {
+this.query = baseSelectStatement;
+if (LOG.isDebugEnabled()) {

Review Comment:
   log4jLogger does check for debug, but other Logger implementations may not. 
I see this method is used extensively in core Flink - I am inclined to leave it 
in; it means the parameters are not formatted when debug is not on. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33969] Implement restore tests for TableSourceScan node [flink]

2024-01-09 Thread via GitHub


bvarghese1 commented on PR #24020:
URL: https://github.com/apache/flink/pull/24020#issuecomment-1883622718

   > From very quick skimming... I looked at `SourceAbilitySpec`, and it looks 
like this test ought to help cover the `JsonSubTypes` there.
   > 
   > I think I see the following readily:
   > 
   > ```
   > ProjectPushDownSpec
   > FilterPushDownSpec
   > LimitPushDownSpec
   > PartitionPushDownSpec
   > ReadingMetadataSpec
   > WatermarkPushDownSpec
   > ```
   > 
   > Is there coverage for the other two?
   > 
   > ```
   > SourceWatermarkSpec
   > AggregatePushDownSpec
   > ```
   
   Added coverage for `SourceWatermarkSpec`.
   `AggregatePushDownSpec` is only supported in BATCH mode - 
https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/config/OptimizerConfigOptions.java#L106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30593][autoscaler] Improve restart time tracking [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


mxm commented on PR #735:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/735#issuecomment-1883609244

   Sure, they should be running now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30593][autoscaler] Improve restart time tracking [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


afedulov commented on PR #735:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/735#issuecomment-1883606900

   @mxm I addressed the comments and rebased, could you please kick off the 
tests again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30593][autoscaler] Improve restart time tracking [flink-kubernetes-operator]

2024-01-09 Thread via GitHub


afedulov commented on code in PR #735:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/735#discussion_r1446480705


##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##
@@ -182,22 +182,22 @@ private void runScalingLogic(Context ctx, 
AutoscalerFlinkMetrics autoscalerMetri
 jobTopology.getVerticesInTopologicalOrder(),
 () -> lastEvaluatedMetrics.get(ctx.getJobKey()));
 
-if (!collectedMetrics.isFullyCollected()) {
-// We have done an upfront evaluation, but we are not ready for 
scaling.
-resetRecommendedParallelism(evaluatedMetrics);
-return;
-}
-
 var scalingHistory = getTrimmedScalingHistory(stateStore, ctx, now);
 // A scaling tracking without an end time gets created whenever a 
scaling decision is
 // applied. Here, when the job transitions to RUNNING, we record the 
time for it.
 if (ctx.getJobStatus() == JobStatus.RUNNING) {
-if (scalingTracking.setEndTimeIfTrackedAndParallelismMatches(
+if 
(scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches(
 now, jobTopology, scalingHistory)) {
 stateStore.storeScalingTracking(ctx, scalingTracking);
 }
 }

Review Comment:
   > It could be part of stateStore.storeScalingTracking(ctx, scalingTracking); even
   
I removed the redundant RUNNING check, as Max recommended, so it looks more 
straightforward now. Pushing this call down into the `storeScalingTracking` 
would make it harder to reason about, since it is key that `runRescaleLogic` is 
only executed when the job is in the RUNNING state and hence the transition is 
considered complete. It also does not seem right to bundle the logic specific 
to this concrete situation into `KubernetesAutoScalerStateStore`, which acts 
more as a simple persistence layer. Hope this is fine by you.


##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##
@@ -182,22 +182,22 @@ private void runScalingLogic(Context ctx, 
AutoscalerFlinkMetrics autoscalerMetri
 jobTopology.getVerticesInTopologicalOrder(),
 () -> lastEvaluatedMetrics.get(ctx.getJobKey()));
 
-if (!collectedMetrics.isFullyCollected()) {
-// We have done an upfront evaluation, but we are not ready for 
scaling.
-resetRecommendedParallelism(evaluatedMetrics);
-return;
-}
-
 var scalingHistory = getTrimmedScalingHistory(stateStore, ctx, now);
 // A scaling tracking without an end time gets created whenever a 
scaling decision is
 // applied. Here, when the job transitions to RUNNING, we record the 
time for it.
 if (ctx.getJobStatus() == JobStatus.RUNNING) {
-if (scalingTracking.setEndTimeIfTrackedAndParallelismMatches(
+if 
(scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches(
 now, jobTopology, scalingHistory)) {
 stateStore.storeScalingTracking(ctx, scalingTracking);
 }
 }

Review Comment:
   Good catch, thanks.



##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##
@@ -182,22 +182,22 @@ private void runScalingLogic(Context ctx, 
AutoscalerFlinkMetrics autoscalerMetri
 jobTopology.getVerticesInTopologicalOrder(),
 () -> lastEvaluatedMetrics.get(ctx.getJobKey()));
 
-if (!collectedMetrics.isFullyCollected()) {
-// We have done an upfront evaluation, but we are not ready for 
scaling.
-resetRecommendedParallelism(evaluatedMetrics);
-return;
-}
-
 var scalingHistory = getTrimmedScalingHistory(stateStore, ctx, now);
 // A scaling tracking without an end time gets created whenever a 
scaling decision is
 // applied. Here, when the job transitions to RUNNING, we record the 
time for it.
 if (ctx.getJobStatus() == JobStatus.RUNNING) {
-if (scalingTracking.setEndTimeIfTrackedAndParallelismMatches(
+if 
(scalingTracking.recordRestartDurationIfTrackedAndParallelismMatches(
 now, jobTopology, scalingHistory)) {
 stateStore.storeScalingTracking(ctx, scalingTracking);
 }
 }

Review Comment:
   > It could be part of stateStore.storeScalingTracking(ctx, scalingTracking); even
   
I removed the redundant RUNNING check, as Max recommended, so it looks more 
straightforward now. Pushing this call down into the `storeScalingTracking` 
would make it harder to reason about, since it is key that `runRescaleLogic` is 
only executed when the job is in the RUNNING state and hence the transition is 
considered complete. It also does not seem right to bundle the logic specific 
to this concrete situation into `KubernetesAutoScalerStateStore`, which acts 
more as a simple persistence layer. Hope this is fine by you.

[jira] [Commented] (FLINK-34007) Flink Job stuck in suspend state after recovery from failure in HA Mode

2024-01-09 Thread Zhenqiu Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804840#comment-17804840
 ] 

Zhenqiu Huang commented on FLINK-34007:
---

From initial investigation, the job manager initially loses the leadership, 
then goes to SUSPENDED status. Shouldn't the job manager exit directly rather 
than go to SUSPENDED status?

2024-01-08 21:44:57,142 INFO  
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner [] - 
JobMasterServiceLeadershipRunner for job 217cee964b2cfdc3115fb74cac0ec550 was 
revoked leadership with leader id 9987190b-35f4-4238-b317-057dc3615e4d. 
Stopping current JobMasterServiceProcess.
2024-01-08 21:45:16,280 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - 
http://172.16.197.136:8081 lost leadership
2024-01-08 21:45:16,280 INFO  
org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
Resource manager service is revoked leadership with session id 
9987190b-35f4-4238-b317-057dc3615e4d.
2024-01-08 21:45:16,281 INFO  
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - 
DefaultDispatcherRunner was revoked the leadership with leader id 
9987190b-35f4-4238-b317-057dc3615e4d. Stopping the DispatcherLeaderProcess.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Stopping SessionDispatcherLeaderProcess.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping 
dispatcher pekko.tcp://flink@172.16.197.136:6123/user/rpc/dispatcher_1.
2024-01-08 21:45:16,282 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopping all 
currently running jobs of dispatcher 
pekko.tcp://flink@172.16.197.136:6123/user/rpc/dispatcher_1.
2024-01-08 21:45:16,282 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Stopping the JobMaster for job 
'amp-ade-fitness-clickstream-projection-uat' (217cee964b2cfdc3115fb74cac0ec550).
2024-01-08 21:45:16,285 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 
217cee964b2cfdc3115fb74cac0ec550 reached terminal state SUSPENDED.
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopping credential renewal
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - 
Stopped credential renewal
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Closing the slot manager.
2024-01-08 21:45:16,286 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Suspending the slot manager.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] 
- Stopping 
KubernetesLeaderRetrievalDriver{configMapName='acsflink-5e92d541f0cd0ad7352c4dc5463c54df-cluster-config-map'}.
2024-01-08 21:45:16,287 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Stopped to watch for 
amp-ae-video-uat/acsflink-5e92d541f0cd0ad7352c4dc5463c54df-cluster-config-map, 
watching id:cc34317a-3299-4cb5-a966-55cb546e8bf9
2024-01-08 21:45:16,287 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job 
amp-ade-fitness-clickstream-projection-uat (217cee964b2cfdc3115fb74cac0ec550) 
switched from state RUNNING to SUSPENDED.

> Flink Job stuck in suspend state after recovery from failure in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Priority: Major
>
> The observation is that Job manager goes to suspend state with a failed 
> container not able to register itself to resource manager after timeout.
> JM Log:
> 2024-01-04 02:58:39,210 INFO  
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner [] - 
> JobMasterServiceLeadershipRunner for job 217cee964b2cfdc3115fb74cac0ec550 was 
> revoked leadership with leader id eda6fee6-ce02-4076-9a99-8c43a92629f7. 
> Stopping current JobMasterServiceProcess.
> 2024-01-04 02:58:58,347 INFO  
> org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - 
> http://172.16.71.11:8081 lost leadership
> 2024-01-04 02:58:58,347 INFO  
> org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - 
> Resource manager service is revoked leadership with session id 
> eda6fee6-ce02-4076-9a99-8c43a92629f7.
> 2024-01-04 02:58:58,348 

[PR] [hotfix] Run end to end tests only if profile run-end-to-end-tests explicitely activated [flink-connector-hive]

2024-01-09 Thread via GitHub


snuyanzin opened a new pull request, #8:
URL: https://github.com/apache/flink-connector-hive/pull/8

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33914][ci] Adds basic Flink CI workflow [flink]

2024-01-09 Thread via GitHub


XComp commented on code in PR #23970:
URL: https://github.com/apache/flink/pull/23970#discussion_r1446358342


##
.github/actions/run_mvn/action.yml:
##
@@ -0,0 +1,42 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+name: "Runs Maven Command"

Review Comment:
   You're right, I used the wrong environment variable in my comment above. The 
example should be `GITHUB_ENV`. The JDK setting is added to `GITHUB_ENV` in the 
`set_java_in_container` custom action. It's not "automagically" added to the 
environment.
   
   I did test runs for bash functions and exporting env variables (see 
[workflow](https://github.com/XComp/github-actions-playground/actions/runs/7463171863/job/20307209534#step:4:5)).
 Neither the function nor the variable is passed down to subsequent steps if 
you don't use `GITHUB_ENV`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


