Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core package api-common [flink]

2023-12-20 Thread via GitHub


flinkbot commented on PR #23960:
URL: https://github.com/apache/flink/pull/23960#issuecomment-1864026986

   
   ## CI report:
   
   * f5a54bc7ae7fd532f00d043ecad6e998de786695 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-29050) [JUnit5 Migration] Module: flink-hadoop-compatibility

2023-12-20 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798856#comment-17798856
 ] 

RocMarshal commented on FLINK-29050:


Based on [https://github.com/apache/flink/pull/20990#discussion_r1292783464]

We'd like to do the following sub-tasks for the current Jira:

- Rename AbstractTestBase, JavaProgramTestBase, and MultipleProgramsTestBase to 
JUnit4

- Use JUnit 5 to rewrite the implementations of the above classes & tag the 
JUnit4 classes as deprecated (see the sketch below)

- Use the JUnit 5 implementation classes to migrate the module 
flink-hadoop-compatibility

- Use the JUnit 5 implementations to adapt the sub-classes of the JUnit4 classes
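
For illustration, a minimal sketch of what the JUnit 5 rewrite of such a test base could look like, using Flink's MiniClusterExtension (the class body and configuration values are assumptions, not the actual PR content):

```java
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.junit5.MiniClusterExtension;

import org.junit.jupiter.api.extension.RegisterExtension;

/** Hypothetical JUnit 5 replacement for the deprecated JUnit4-based test base. */
public abstract class AbstractTestBase {

    // JUnit 5 analogue of the old @ClassRule: one shared mini cluster per test class.
    @RegisterExtension
    protected static final MiniClusterExtension MINI_CLUSTER =
            new MiniClusterExtension(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(4)
                            .build());
}
```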

> [JUnit5 Migration] Module: flink-hadoop-compatibility
> -
>
> Key: FLINK-29050
> URL: https://issues.apache.org/jira/browse/FLINK-29050
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hadoop Compatibility, Tests
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available, stale-assigned, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29050) [JUnit5 Migration] Module: flink-hadoop-compatibility

2023-12-20 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798856#comment-17798856
 ] 

RocMarshal edited comment on FLINK-29050 at 12/20/23 8:07 AM:
--

Based on [https://github.com/apache/flink/pull/20990#discussion_r1292783464]

We'd like to do the following sub-tasks for the current Jira:
 - Rename AbstractTestBase, JavaProgramTestBase, and MultipleProgramsTestBase to 
JUnit4; use JUnit 5 to rewrite the implementations of the above 
classes & tag the JUnit4 classes as deprecated

 - Use the JUnit 5 implementation classes to migrate the module 
flink-hadoop-compatibility

 - Use the JUnit 5 implementations to adapt the sub-classes of the JUnit4 classes


was (Author: rocmarshal):
Based on [https://github.com/apache/flink/pull/20990#discussion_r1292783464]

We'd like to do the following sub-tasks for the current jira.

- Rename  AbstractTestBase, JavaProgramTestBase MultipleProgramsTestBase to 
JUnit4

- Use jUnit5 to re-write the implementations for the above classes & tag 
JUnit4 classes as deprecated 

- Use junit5 implementation classes to migrate the Module: 
flink-hadoop-compatibility

- Use junit5 implementation to make adaption for the sub-classes of JUnit4

> [JUnit5 Migration] Module: flink-hadoop-compatibility
> -
>
> Key: FLINK-29050
> URL: https://issues.apache.org/jira/browse/FLINK-29050
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hadoop Compatibility, Tests
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available, stale-assigned, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29050) [JUnit5 Migration] Module: flink-hadoop-compatibility

2023-12-20 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798856#comment-17798856
 ] 

RocMarshal edited comment on FLINK-29050 at 12/20/23 8:08 AM:
--

Based on [https://github.com/apache/flink/pull/20990#discussion_r1292783464]

We'd like to do the following sub-tasks for the current Jira:
 - Rename AbstractTestBase, JavaProgramTestBase, and MultipleProgramsTestBase to 
JUnit4
 - Use JUnit 5 to rewrite the implementations of the above classes & tag the 
JUnit4 classes as deprecated

 - Use the JUnit 5 implementation classes to migrate the module 
flink-hadoop-compatibility

 - Use the JUnit 5 implementations to adapt the sub-classes of the JUnit4 classes


was (Author: rocmarshal):
Based on [https://github.com/apache/flink/pull/20990#discussion_r1292783464]

We'd like to do the following sub-tasks for the current jira.
 - Rename  AbstractTestBase, JavaProgramTestBase MultipleProgramsTestBase to 
JUnit4,      Use jUnit5 to re-write the implementations for the above 
classes & tag JUnit4 classes as deprecated 

 - Use junit5 implementation classes to migrate the Module: 
flink-hadoop-compatibility

 - Use junit5 implementation to make adaption for the sub-classes of JUnit4

> [JUnit5 Migration] Module: flink-hadoop-compatibility
> -
>
> Key: FLINK-29050
> URL: https://issues.apache.org/jira/browse/FLINK-29050
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hadoop Compatibility, Tests
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available, stale-assigned, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33783) Add options to ignore parsing error in Kafka SQL Connector

2023-12-20 Thread Bartosz Mikulski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798873#comment-17798873
 ] 

Bartosz Mikulski commented on FLINK-33783:
--

Hi [~martijnvisser]!

Sorry for the delayed response. I agree that this topic should be a FLIP at 
some point! Can I create such a FLIP now, or does it require a bigger discussion 
first?

I created this issue first as a stopgap solution. I have already implemented the 
feature in a fork; it can be found here: 
[https://github.com/deep-bi/flink-connector-kafka/releases/tag/deep-v3.0.2]. 
Maybe opening a pull request would be a great first step for starting a wider 
discussion about DLQ (Dead Letter Queue) handling in SQL. What do you think?

Regarding your side note, I agree with you. There should be little to no chance 
of having unparsable events. However, our client's use case is a bit different: 
they want to manage the schemas themselves and disable the auto-register 
feature (see: https://issues.apache.org/jira/browse/FLINK-33045). In this 
scenario, they want to control the schema registry outside of Flink. They 
also use other tools in the pipeline, like Kafka Streams. If events that are not 
registered end up in the topic, they should be ignored. Again, this 
is still more of a hypothetical scenario, but it may happen at some point.

> Add options to ignore parsing error in Kafka SQL Connector
> --
>
> Key: FLINK-33783
> URL: https://issues.apache.org/jira/browse/FLINK-33783
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kafka, Table SQL / API
>Reporter: Bartosz Mikulski
>Priority: Major
>
> h1. Current state
> If an unparsable event enters a Flink Kafka Source in SQL, the whole 
> application restarts. For the JSON format there is a property that allows 
> ignoring unparsable events. Other formats, like `confluent-avro`, don't 
> support that.
> h1. Desired state
> We would like to ignore the parsing exception in the Kafka Source in SQL 
> regardless of the format used. Additionally, a new metric should be 
> introduced that returns the count of unparsable events seen so far.
> In the future there should be Dead Letter Queue handling in SQL Sources 
> similar to Kafka Streams: 
> [https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/].
> For now, the universal ignore with a metric would be enough.
> h1. Implementation
> We already have an implementation for this case in the Flink Kafka Connector 
> and we would like to open a pull request for it. However, we created the 
> issue first, as per this recommendation: 
> [https://flink.apache.org/how-to-contribute/contribute-code/]
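
To make the proposed behaviour concrete, a minimal, hypothetical sketch of the "ignore and count" idea: a wrapper around any DeserializationSchema that drops unparsable records instead of failing the job (class name and metric handling are illustrative, not the fork's actual implementation):

```java
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;

import java.io.IOException;

/** Hypothetical wrapper: swallows parse failures of the wrapped format and counts them. */
public class ParseErrorIgnoringSchema<T> implements DeserializationSchema<T> {

    private final DeserializationSchema<T> inner;
    private transient long unparsableCount; // a real implementation would register a metric

    public ParseErrorIgnoringSchema(DeserializationSchema<T> inner) {
        this.inner = inner;
    }

    @Override
    public void open(InitializationContext context) throws Exception {
        inner.open(context);
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        try {
            return inner.deserialize(message);
        } catch (IOException e) {
            unparsableCount++;
            return null; // null results are skipped by the source
        }
    }

    @Override
    public void deserialize(byte[] message, Collector<T> out) throws IOException {
        try {
            inner.deserialize(message, out);
        } catch (IOException e) {
            unparsableCount++; // drop the record, keep the pipeline running
        }
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return inner.isEndOfStream(nextElement);
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return inner.getProducedType();
    }
}
```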



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core package api-common [flink]

2023-12-20 Thread via GitHub


GOODBOY008 commented on PR #23960:
URL: https://github.com/apache/flink/pull/23960#issuecomment-1864104413

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33783) Add options to ignore parsing error in Kafka SQL Connector

2023-12-20 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798878#comment-17798878
 ] 

Martijn Visser commented on FLINK-33783:


No worries [~bartoszdeepbi]!
We shouldn't open a PR yet; it's better to first have a discussion/FLIP on 
the dev mailing list. You can of course include your PoC for the solution in the 
proposal, so that it would be rather easy to open a PR at a later stage. If 
you want, I can grant you FLIP permissions to create a page, and then you can 
open the discussion as well. 

> Add options to ignore parsing error in Kafka SQL Connector
> --
>
> Key: FLINK-33783
> URL: https://issues.apache.org/jira/browse/FLINK-33783
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kafka, Table SQL / API
>Reporter: Bartosz Mikulski
>Priority: Major
>
> h1. Current state
> If an unparsable event enters a Flink Kafka Source in SQL, the whole 
> application restarts. For the JSON format there is a property that allows 
> ignoring unparsable events. Other formats, like `confluent-avro`, don't 
> support that.
> h1. Desired state
> We would like to ignore the parsing exception in the Kafka Source in SQL 
> regardless of the format used. Additionally, a new metric should be 
> introduced that returns the count of unparsable events seen so far.
> In the future there should be Dead Letter Queue handling in SQL Sources 
> similar to Kafka Streams: 
> [https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/].
> For now, the universal ignore with a metric would be enough.
> h1. Implementation
> We already have an implementation for this case in the Flink Kafka Connector 
> and we would like to open a pull request for it. However, we created the 
> issue first, as per this recommendation: 
> [https://flink.apache.org/how-to-contribute/contribute-code/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [WIP][FLINK-25537] [JUnit5 Migration] Module: flink-core [flink]

2023-12-20 Thread via GitHub


GOODBOY008 closed pull request #19563: [WIP][FLINK-25537]  [JUnit5 Migration] 
Module: flink-core
URL: https://github.com/apache/flink/pull/19563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-25537] [JUnit5 Migration] Module: flink-core package api-common [flink]

2023-12-20 Thread via GitHub


Jiabao-Sun commented on code in PR #23960:
URL: https://github.com/apache/flink/pull/23960#discussion_r1432450266


##
flink-core/src/test/java/org/apache/flink/api/common/accumulators/AverageAccumulatorTest.java:
##
@@ -94,19 +94,19 @@ public void testMergeSuccess() {
 }
 
 avg1.merge(avg2);
-assertEquals(4.5, avg1.getLocalValue(), 0.0);
+assertThat(avg1.getLocalValue()).isCloseTo(4.5, within(0.0));
 }
 
 @Test
-public void testMergeFailed() {
+void testMergeFailed() {
 AverageAccumulator average = new AverageAccumulator();
 Accumulator averageNew = null;
 average.add(1);
 try {
 average.merge(averageNew);
 fail("should fail with an exception");
 } catch (IllegalArgumentException e) {
-assertNotNull(e.getMessage());
+assertThat(e.getMessage()).isNotNull();

Review Comment:
   We can use `assertThatThrownBy` to simplify.
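
   For illustration, a minimal sketch of that simplification, reusing the test's `average`/`averageNew` variables (the `matches` step just preserves the original non-null-message check):

   ```java
   import static org.assertj.core.api.Assertions.assertThatThrownBy;

   // Replaces the try/fail/catch block above in one assertion.
   assertThatThrownBy(() -> average.merge(averageNew))
           .isInstanceOf(IllegalArgumentException.class)
           .matches(e -> e.getMessage() != null, "has a non-null message");
   ```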



##
flink-core/src/test/java/org/apache/flink/api/common/eventtime/WatermarkOutputMultiplexerTest.java:
##
@@ -18,24 +18,23 @@
 
 package org.apache.flink.api.common.eventtime;
 
-import org.junit.Test;
+import org.hamcrest.MatcherAssert;
+import org.junit.jupiter.api.Test;
 
 import java.util.UUID;
 
 import static 
org.apache.flink.api.common.eventtime.WatermarkMatchers.watermark;
+import static org.assertj.core.api.AssertionsForClassTypes.assertThat;

Review Comment:
   ```suggestion
   import static org.assertj.core.api.Assertions.assertThat;
   ```
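
   For context, `AssertionsForClassTypes` is the class-types-only entry point and lacks the iterable/collection `assertThat` overloads, so the canonical `Assertions` import keeps tests on a single entry point. A small sketch (not part of the diff):

   ```java
   import static org.assertj.core.api.Assertions.assertThat;

   // One entry point covers both scalar and collection assertions:
   assertThat(underlyingWatermarkOutput.isIdle()).isFalse();
   assertThat(java.util.Arrays.asList(1, 2, 3)).contains(2);
   ```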



##
flink-core/src/test/java/org/apache/flink/api/common/ExecutionConfigFromConfigurationTest.java:
##
@@ -30,6 +30,7 @@
 import java.util.function.Consumer;
 import java.util.function.Function;
 
+import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
 import static org.hamcrest.CoreMatchers.equalTo;
 import static org.junit.Assert.assertThat;

Review Comment:
   ```suggestion
   import static org.assertj.core.api.Assertions.assertThat;
   ```



##
flink-core/src/test/java/org/apache/flink/api/common/functions/util/RuntimeUDFContextTest.java:
##
@@ -62,13 +60,7 @@ public void testBroadcastVariableNotFound() {
 }
 
 try {
-ctx.getBroadcastVariableWithInitializer(
-"some name",
-new BroadcastVariableInitializer() {
-public Object 
initializeBroadcastVariable(Iterable data) {
-return null;
-}
-});
+ctx.getBroadcastVariableWithInitializer("some name", data -> 
null);

Review Comment:
   `assertThatThrownBy`
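
   A sketch of what that could look like here (exception type assumed from the original test's expectation):

   ```java
   import static org.assertj.core.api.Assertions.assertThatThrownBy;

   // Replaces the try/fail/catch pattern around the call above.
   assertThatThrownBy(() -> ctx.getBroadcastVariableWithInitializer("some name", data -> null))
           .isInstanceOf(IllegalArgumentException.class);
   ```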



##
flink-core/src/test/java/org/apache/flink/api/common/functions/util/RuntimeUDFContextTest.java:
##
@@ -95,36 +87,36 @@ public void testBroadcastVariableSimple() {
 ctx.setBroadcastVariable("name1", Arrays.asList(1, 2, 3, 4));
 ctx.setBroadcastVariable("name2", Arrays.asList(1.0, 2.0, 3.0, 
4.0));
 
-assertTrue(ctx.hasBroadcastVariable("name1"));
-assertTrue(ctx.hasBroadcastVariable("name2"));
+assertThat(ctx.hasBroadcastVariable("name1")).isTrue();
+assertThat(ctx.hasBroadcastVariable("name2")).isTrue();
 
 List list1 = ctx.getBroadcastVariable("name1");
 List list2 = ctx.getBroadcastVariable("name2");
 
-assertEquals(Arrays.asList(1, 2, 3, 4), list1);
-assertEquals(Arrays.asList(1.0, 2.0, 3.0, 4.0), list2);
+assertThat(list1).isEqualTo(Arrays.asList(1, 2, 3, 4));
+assertThat(list2).isEqualTo(Arrays.asList(1.0, 2.0, 3.0, 4.0));

Review Comment:
   ```suggestion
   assertThat(list1).containsExactly(1, 2, 3, 4);
   assertThat(list2).containsExactly(1.0, 2.0, 3.0, 4.0);
   ```



##
flink-core/src/test/java/org/apache/flink/api/common/ExecutionConfigFromConfigurationTest.java:
##
@@ -20,7 +20,7 @@
 
 import org.apache.flink.configuration.Configuration;
 
-import org.junit.Test;
+import org.junit.jupiter.api.Test;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;

Review Comment:
   ```suggestion
   import 
org.apache.flink.testutils.junit.extensions.parameterized.ParameterizedTestExtension;
   import org.apache.flink.testutils.junit.extensions.parameterized.Parameters;
   ```
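
   For reference, a minimal sketch of how Flink's parameterized JUnit 5 extension is wired up (the test class below is illustrative, not the migrated `ExecutionConfigFromConfigurationTest`):

   ```java
   import org.apache.flink.testutils.junit.extensions.parameterized.Parameter;
   import org.apache.flink.testutils.junit.extensions.parameterized.ParameterizedTestExtension;
   import org.apache.flink.testutils.junit.extensions.parameterized.Parameters;

   import org.junit.jupiter.api.TestTemplate;
   import org.junit.jupiter.api.extension.ExtendWith;

   import java.util.Arrays;
   import java.util.Collection;

   import static org.assertj.core.api.Assertions.assertThat;

   @ExtendWith(ParameterizedTestExtension.class)
   class SquareTest {

       @Parameter public int input;

       @Parameters(name = "input = {0}")
       public static Collection<Integer> parameters() {
           return Arrays.asList(1, 2, 3);
       }

       // @TestTemplate replaces @Test when using this extension.
       @TestTemplate
       void squareIsNonNegative() {
           assertThat(input * input).isGreaterThanOrEqualTo(0);
       }
   }
   ```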



##
flink-core/src/test/java/org/apache/flink/api/common/eventtime/WatermarkOutputMultiplexerTest.java:
##
@@ -44,12 +43,12 @@ public void singleImmediateWatermark() {
 
 watermarkOutput.emitWatermark(new Watermark(0));
 
-assertThat(underlyingWatermarkOutput.lastWatermark(), 
is(watermark(0)));
-assertThat(underlyingWatermarkOutput.isIdle(), is(false));
+MatcherAssert.assertThat(underlyingWatermarkOutput.lastWatermark(), 
is(watermark(0)));
+MatcherAssert.assertThat(underlyingWatermarkOutput.isIdle(), is(false));

[jira] [Created] (FLINK-33893) Implement restore tests for PythonCorrelate node

2023-12-20 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33893:
-

 Summary: Implement restore tests for PythonCorrelate node
 Key: FLINK-33893
 URL: https://issues.apache.org/jira/browse/FLINK-33893
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33895) Implement restore tests for PythonGroupWindowAggregate node

2023-12-20 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33895:
-

 Summary: Implement restore tests for PythonGroupWindowAggregate 
node
 Key: FLINK-33895
 URL: https://issues.apache.org/jira/browse/FLINK-33895
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33894) Implement restore tests for PythonGroupAggregate node

2023-12-20 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33894:
-

 Summary: Implement restore tests for PythonGroupAggregate node
 Key: FLINK-33894
 URL: https://issues.apache.org/jira/browse/FLINK-33894
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33896) Implement restore tests for Correlate node

2023-12-20 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33896:
-

 Summary: Implement restore tests for Correlate node
 Key: FLINK-33896
 URL: https://issues.apache.org/jira/browse/FLINK-33896
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33795] Add a new config to forbid autoscale execution in the configured excluded periods [flink-kubernetes-operator]

2023-12-20 Thread via GitHub


flashJd commented on code in PR #728:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/728#discussion_r1432470030


##
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java:
##
@@ -147,7 +149,13 @@ public CollectedMetricHistory updateMetrics(
 if (now.isBefore(windowFullTime)) {
 LOG.info("Metric window not full until {}", 
readable(windowFullTime));
 } else {
-collectedMetrics.setFullyCollected(true);
+if (isExcluded) {
+LOG.info(
+"Autoscaling on halt based on exclusion rule {}",
+conf.get(AutoScalerOptions.EXCLUDED_PERIODS));
+} else {
+collectedMetrics.setFullyCollected(true);
+}

Review Comment:
   @gyfora @mxm you are right, it's better to place the excludedPeriods 
blocking logic in ScalingExecutor. As the meaning of excludedPeriods is very 
similar to that of the config `scaling.enabled`, I combined the logic in 
`ScalingExecutor`. Now the excludedPeriods blocking works well with the 
recommended parallelism.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler [flink]

2023-12-20 Thread via GitHub


KarmaGYZ commented on code in PR #23635:
URL: https://github.com/apache/flink/pull/23635#discussion_r1432481088


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -334,7 +334,9 @@ public void onUnknownDeploymentsOf(
 .createSlotPoolService(
 jid,
 createDeclarativeSlotPoolFactory(
-
jobMasterConfiguration.getConfiguration()));
+
jobMasterConfiguration.getConfiguration()),
+null,

Review Comment:
   IIUC, you plan to modify this parameter in the commit introducing the 
configuration? Sounds good from my side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler [flink]

2023-12-20 Thread via GitHub


KarmaGYZ commented on code in PR #23635:
URL: https://github.com/apache/flink/pull/23635#discussion_r1432482972


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DefaultDeclarativeSlotPool.java:
##
@@ -118,26 +129,47 @@ public DefaultDeclarativeSlotPool(
 this.totalResourceRequirements = ResourceCounter.empty();
 this.fulfilledResourceRequirements = ResourceCounter.empty();
 this.slotToRequirementProfileMappings = new HashMap<>();
+this.componentMainThreadExecutor = 
Preconditions.checkNotNull(componentMainThreadExecutor);
+this.slotRequestMaxInterval = slotRequestMaxInterval;
 }
 
 @Override
 public void increaseResourceRequirementsBy(ResourceCounter increment) {
-if (increment.isEmpty()) {
+updateResourceRequirementsBy(
+increment,
+() -> totalResourceRequirements = 
totalResourceRequirements.add(increment));

Review Comment:
   I lean toward this version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33632] Adding custom flink mutator [flink-kubernetes-operator]

2023-12-20 Thread via GitHub


tagarr commented on PR #733:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/733#issuecomment-1864167599

   @gyfora We had considered adding mutation into the controller loop in the 
same manner as the validator, but we believe that mutation of the CR 
should really only be done on its creation or on user-modified updates. Changes 
within the reconcile loop could cause unknown issues and confuse the logic within 
the controller.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33823] Make PlannerQueryOperation SQL serializable [flink]

2023-12-20 Thread via GitHub


twalthr commented on code in PR #23948:
URL: https://github.com/apache/flink/pull/23948#discussion_r1432489408


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/PlannerQueryOperation.java:
##
@@ -72,6 +77,15 @@ public String asSummaryString() {
 "PlannerNode", Collections.emptyMap(), getChildren(), 
Operation::asSummaryString);
 }
 
+@Override
+public String asSerializableString() {
+try {
+return toSqlString.get();
+} catch (Exception e) {
+throw new TableException("Could not convert Calcite tree to a SQL 
string", e);

Review Comment:
   I would rephrase to `Given plan is not serializable into SQL` or similar, and 
not mention Calcite.



##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/utils/TableTestBase.scala:
##
@@ -1646,6 +1650,28 @@ object TableTestUtil {
 planner.translateToRel(modifyOperation)
   }
 
+  def toQuotedSqlString(relNode: RelNode, tEnv: TableEnvironment): String = {

Review Comment:
   Who is calling this method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33863][checkpoint] Fix restoring compressed operator state [flink]

2023-12-20 Thread via GitHub


fredia merged PR #23938:
URL: https://github.com/apache/flink/pull/23938


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33897) Allow triggering unaligned checkpoint via CLI

2023-12-20 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-33897:
---

 Summary: Allow triggering unaligned checkpoint via CLI
 Key: FLINK-33897
 URL: https://issues.apache.org/jira/browse/FLINK-33897
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client, Runtime / Checkpointing
Reporter: Zakelly Lan


After FLINK-6755, users can trigger a checkpoint through the CLI. However, I 
noticed there would be value in supporting triggering it in an unaligned way, 
since the job may encounter high back-pressure and an aligned checkpoint would 
fail.

I suggest we provide an option '-unaligned' in the CLI to support that.

A similar option would also be useful for the REST API.
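
A hypothetical invocation if the flag is adopted (the `checkpoint` action exists per FLINK-6755; the `-unaligned` option is only the proposal here):

```
./bin/flink checkpoint <jobId> -unaligned
```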



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33897) Allow triggering unaligned checkpoint via CLI

2023-12-20 Thread Zakelly Lan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zakelly Lan reassigned FLINK-33897:
---

Assignee: Zakelly Lan

> Allow triggering unaligned checkpoint via CLI
> -
>
> Key: FLINK-33897
> URL: https://issues.apache.org/jira/browse/FLINK-33897
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Runtime / Checkpointing
>Reporter: Zakelly Lan
>Assignee: Zakelly Lan
>Priority: Major
>
> After FLINK-6755, users can trigger a checkpoint through the CLI. However, I 
> noticed there would be value in supporting triggering it in an unaligned way, 
> since the job may encounter high back-pressure and an aligned checkpoint would 
> fail.
>  
> I suggest we provide an option '-unaligned' in the CLI to support that.
>  
> A similar option would also be useful for the REST API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33898) Allow triggering unaligned checkpoint via REST api

2023-12-20 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-33898:
---

 Summary: Allow triggering unaligned checkpoint via REST api
 Key: FLINK-33898
 URL: https://issues.apache.org/jira/browse/FLINK-33898
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / REST
Reporter: Zakelly Lan
Assignee: Zakelly Lan


See FLINK-33897. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33823] Make PlannerQueryOperation SQL serializable [flink]

2023-12-20 Thread via GitHub


dawidwys commented on code in PR #23948:
URL: https://github.com/apache/flink/pull/23948#discussion_r1432520652


##
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/utils/TableTestBase.scala:
##
@@ -1646,6 +1650,28 @@ object TableTestUtil {
 planner.translateToRel(modifyOperation)
   }
 
+  def toQuotedSqlString(relNode: RelNode, tEnv: TableEnvironment): String = {

Review Comment:
   It's called only in `TableTestBase#addTableWithWatermark`. On second 
thought, I think it's better to throw an exception in this case. 
   
   1. We don't need it here, in this `TestBase`
   2. LogicalWatermarkAssigner cannot be converted back to a `SqlNode` anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler [flink]

2023-12-20 Thread via GitHub


RocMarshal commented on code in PR #23635:
URL: https://github.com/apache/flink/pull/23635#discussion_r1432535390


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -334,7 +334,9 @@ public void onUnknownDeploymentsOf(
 .createSlotPoolService(
 jid,
 createDeclarativeSlotPoolFactory(
-
jobMasterConfiguration.getConfiguration()));
+
jobMasterConfiguration.getConfiguration()),
+null,

Review Comment:
   Got it, thank you for the confirmation~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33795] Add a new config to forbid autoscale execution in the configured excluded periods [flink-kubernetes-operator]

2023-12-20 Thread via GitHub


flashJd commented on PR #728:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/728#issuecomment-1864238511

   @gyfora @mxm by the way, I'd like to discuss the logic of scaling 
effectiveness evaluation with you. 
   
   1. Right now it's controlled by two config options, 
`scaling.effectiveness.detection.enabled` and `scaling.effectiveness.threshold`, 
and we evaluate the effectiveness only under the condition that the last scaling 
was a scale-up, referring only to that last scaling's effectiveness. 
   2. Imagine the following scenario: scale up to 2x parallelism first, then scale 
down to 0.8x, then scale up 2x and down 0.8x again. Effectiveness detection is 
invalid in this scenario: even though scaling up is ineffective, we'll 
continue to scale up.
   3. Maybe we can add a new config option like 
`scaling.effectiveness.history.reference.num` with a default value; then we 
can evaluate based on the last `scaling.effectiveness.history.reference.num` 
scale-up summaries (a hypothetical sketch follows below).
   
   Looking forward to your reply.
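
   A hypothetical sketch of what such an option could look like, following Flink's ConfigOptions pattern (the key comes from point 3 above; the default value is an assumption):

   ```java
   import org.apache.flink.configuration.ConfigOption;
   import org.apache.flink.configuration.ConfigOptions;

   public class ProposedAutoScalerOptions {

       // How many recent scale-up summaries to consult when judging effectiveness.
       public static final ConfigOption<Integer> EFFECTIVENESS_HISTORY_REFERENCE_NUM =
               ConfigOptions.key("job.autoscaler.scaling.effectiveness.history.reference.num")
                       .intType()
                       .defaultValue(3) // assumed default
                       .withDescription(
                               "Number of most recent scale-up summaries to evaluate "
                                       + "when detecting ineffective scale-ups.");
   }
   ```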


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler [flink]

2023-12-20 Thread via GitHub


RocMarshal commented on code in PR #23635:
URL: https://github.com/apache/flink/pull/23635#discussion_r1432535390


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -334,7 +334,9 @@ public void onUnknownDeploymentsOf(
 .createSlotPoolService(
 jid,
 createDeclarativeSlotPoolFactory(
-
jobMasterConfiguration.getConfiguration()));
+
jobMasterConfiguration.getConfiguration()),
+null,

Review Comment:
   > you plan to modify this parameter in the commit introducing the 
configuration? 
   Yes.
   
   
   Thank you for your confirmation~



##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -334,7 +334,9 @@ public void onUnknownDeploymentsOf(
 .createSlotPoolService(
 jid,
 createDeclarativeSlotPoolFactory(
-
jobMasterConfiguration.getConfiguration()));
+
jobMasterConfiguration.getConfiguration()),
+null,

Review Comment:
   > you plan to modify this parameter in the commit introducing the 
configuration? 
   Yes.
   
   
   Thank you for your confirmation~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler [flink]

2023-12-20 Thread via GitHub


RocMarshal commented on code in PR #23635:
URL: https://github.com/apache/flink/pull/23635#discussion_r1432535390


##
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java:
##
@@ -334,7 +334,9 @@ public void onUnknownDeploymentsOf(
 .createSlotPoolService(
 jid,
 createDeclarativeSlotPoolFactory(
-
jobMasterConfiguration.getConfiguration()));
+
jobMasterConfiguration.getConfiguration()),
+null,

Review Comment:
   > you plan to modify this parameter in the commit introducing the 
configuration? 
   
   Yes.
   
   
   Thank you for your confirmation~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: Currently, the Hybrid Shuffle can work with Memory Tier and 
Disk Tier together. Imagine such a scenario, 

> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier 
> together. Imagine such a scenario, 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: Currently, the Hybrid Shuffle can work with Memory Tier and 
Disk Tier together, however, in the following scenario the   (was: Currently, 
the Hybrid Shuffle can work with Memory Tier and Disk Tier together. Image such 
a scenirio, )

> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier 
> together, however, in the following scenario the 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: Currently, the Hybrid Shuffle can work with Memory Tier and 
Disk Tier together, if the memory tier  however, in the following scenario the 
result partition would stop   (was: Currently, the Hybrid Shuffle can work with 
Memory Tier and Disk Tier together, however, in the following scenirio the )

> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier 
> together, if the memory tier  however, in the following scenario the result 
> partition would stop 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33899) Java 17 and 21 support for mongodb connector

2023-12-20 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-33899:
--

 Summary: Java 17 and 21 support for mongodb connector
 Key: FLINK-33899
 URL: https://issues.apache.org/jira/browse/FLINK-33899
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / MongoDB
Affects Versions: mongodb-1.0.2
Reporter: Jiabao Sun
 Fix For: mongodb-1.1.0


After FLINK-33302 is finished, it is now possible to specify the JDK version.
That allows adding JDK 17 and JDK 21 support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33899][connectors/mongodb] Java 17 and 21 support for mongodb connector [flink-connector-mongodb]

2023-12-20 Thread via GitHub


Jiabao-Sun opened a new pull request, #21:
URL: https://github.com/apache/flink-connector-mongodb/pull/21

   After [FLINK-33302](https://issues.apache.org/jira/browse/FLINK-33302) is 
finished, it is now possible to specify the JDK version.
   That allows adding JDK 17 and JDK 21 support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33899) Java 17 and 21 support for mongodb connector

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33899:
---
Labels: pull-request-available  (was: )

> Java 17 and 21 support for mongodb connector
> 
>
> Key: FLINK-33899
> URL: https://issues.apache.org/jira/browse/FLINK-33899
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / MongoDB
>Affects Versions: mongodb-1.0.2
>Reporter: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: mongodb-1.1.0
>
>
> After FLINK-33302 is finished, it is now possible to specify the JDK version.
> That allows adding JDK 17 and JDK 21 support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: 
Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier together, 
however, in the following scenario the result partition would stop working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers. If the 
memory tier used up all 8 buffers and the input channel doesn't consume them 
because of some problem, the disk tier can still work with 1 reserved buffer. 
However, if a redistribution happens now and the pool size is decreased to less 
than 8, then the BufferAccumulator can not request buffers any more, so 

  was:Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier 
together, if the memory tier  however, in the following scenirio the result 
partition would stop 


> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier 
> together, however, in the following scenario the result partition would stop 
> working.
> Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
> 15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers. If 
> the memory tier used up all 8 buffers and the input channel doesn't consume 
> them because of some problem, the disk tier can still work with 1 reserved 
> buffer. However, if a redistribution happens now and the pool size is 
> decreased to less than 8, then the BufferAccumulator can not request buffers 
> any more, so 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: 
Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
together, however, in the following scenario the result partition would stop 
working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
the memory tier uses up all 8 buffers and the input channel doesn't consume 
them because of some problem, the disk tier can still work with 1 reserved 
buffer. However, if a redistribution happens now and the pool size is decreased 
to less than 8, then the BufferAccumulator can not request buffers anymore, and 
thus the result partition stops working as well.

  was:
Currently, the Hybrid Shuffle can work with Memory Tier and Disk Tier together, 
however, in the following scenirio the result partition would stop working.

Suppose we have a shuffle task with 2 sub partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers. If the 
memory tier used up all 8 buffers and the input channel doesn't consume them 
because of some problem, the disk tier can still work with 1 reserved buffer. 
However, if a redistribution happens now and the pool size is decrease to less 
than 8, then the BufferAccumulator can not request buffers any more, so 


> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
> together, however, in the following scenario the result partition would stop 
> working.
> Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
> 15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
> according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
> the memory tier uses up all 8 buffers and the input channel doesn't consume 
> them because of some problem, the disk tier can still work with 1 reserved 
> buffer. However, if a redistribution happens now and the pool size is 
> decreased to less than 8, then the BufferAccumulator can not request buffers 
> anymore, and thus the result partition stops working as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: 
Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
together, however, in the following scenario the result partition would stop 
working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
the memory tier uses up all 8 buffers and the input channel doesn't consume 
them because of some problem, the disk tier can still work with 1 reserved 
buffer. However, if a redistribution happens now and the pool size is decreased 
to less than 8, then the BufferAccumulator can not request buffers anymore, and 
thus the result partition stops working as well.

The purpose is to make the result partition still work with the disk tier and 
write the shuffle data to disk, so that once the input channel is restored, the 
data on the disk can be consumed immediately

  was:
Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
together, however, in the following scenario the result partition would stop 
working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
accroding to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
the memory tier uses up all 8 buffers and the input channel doesn't consume 
them because of some problem, the disk tier can still work with 1 reserved 
buffer. However, if a redistribution happens now and the pool size is decreased 
to less than 8, then the BufferAccumulator can not request buffers anymore, and 
thus the result partition stops working as well.


> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
> together, however, in the following scenario the result partition would stop 
> working.
> Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
> 15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
> according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
> the memory tier uses up all 8 buffers and the input channel doesn't consume 
> them because of some problem, the disk tier can still work with 1 reserved 
> buffer. However, if a redistribution happens now and the pool size is 
> decreased to less than 8, then the BufferAccumulator can not request buffers 
> anymore, and thus the result partition stops working as well.
> The purpose is to make the result partition still work with the disk tier and 
> write the shuffle data to disk, so that once the input channel is restored, 
> the data on the disk can be consumed immediately



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33879) Hybrid Shuffle may hang during redistribution

2023-12-20 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-33879:
--
Description: 
Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
together, however, in the following scenario the result partition would stop 
working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
the memory tier uses up all 8 buffers and the input channel doesn't consume 
them because of some problem, the disk tier can still work with 1 reserved 
buffer. However, if a redistribution happens now and the pool size is decreased 
to less than 8, then the BufferAccumulator can not request buffers anymore, and 
thus the result partition stops working as well.

The purpose is to make the result partition still work with the disk tier and 
write the shuffle data to disk so that once the input channel is ready, the 
data on the disk can be consumed immediately
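
A quick sanity check of the buffer arithmetic above (an editorial sketch; values and the helper name are taken from the description):

```java
public class BufferMathSketch {
    public static void main(String[] args) {
        int poolSize = 15;        // LocalBufferPool capacity in the scenario
        int numSubpartitions = 2; // shuffle task with 2 sub-partitions

        // Budget the memory tier may occupy, mirroring the quoted formula from
        // TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers:
        int maxNonReclaimable = poolSize - (2 * (numSubpartitions + 1) + 1);
        System.out.println(maxNonReclaimable); // prints 8, i.e. 15 - 7
    }
}
```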

  was:
Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
together, however, in the following scenario the result partition would stop 
working.

Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
accroding to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
the memory tier uses up all 8 buffers and the input channel doesn't consume 
them because of some problem, the disk tier can still work with 1 reserved 
buffer. However, if a redistribution happens now and the pool size is decreased 
to less than 8, then the BufferAccumulator can not request buffers anymore, and 
thus the result partition stops working as well.

The purpose is to make the result partition can still work with disk tier and 
write the shuffle data to disk, so that once the input channel is restored, the 
data on the disk can be consumed immediately


> Hybrid Shuffle may hang during redistribution
> -
>
> Key: FLINK-33879
> URL: https://issues.apache.org/jira/browse/FLINK-33879
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, the Hybrid Shuffle can work with the memory tier and disk tier 
> together, however, in the following scenario the result partition would stop 
> working.
> Suppose we have a shuffle task with 2 sub-partitions. The LocalBufferPool has 
> 15 buffers, the memory tier can use at most 15-(2*(2+1)+1) = 8 buffers 
> according to `TieredStorageMemoryManagerImpl#getMaxNonReclaimableBuffers`. If 
> the memory tier uses up all 8 buffers and the input channel doesn't consume 
> them because of some problem, the disk tier can still work with 1 reserved 
> buffer. However, if a redistribution happens now and the pool size is 
> decreased to less than 8, then the BufferAccumulator can not request buffers 
> anymore, and thus the result partition stops working as well.
> The purpose is to make the result partition still work with the disk tier and 
> write the shuffle data to disk so that once the input channel is ready, the 
> data on the disk can be consumed immediately



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31472) AsyncSinkWriterThrottlingTest failed with Illegal mailbox thread

2023-12-20 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798923#comment-17798923
 ] 

Sergey Nuyanzin commented on FLINK-31472:
-

Thanks
Merged as 
[6f6ab1ea3527d96d57739eb4202bccd3c73c6458|https://github.com/apache/flink/commit/6f6ab1ea3527d96d57739eb4202bccd3c73c6458]

> AsyncSinkWriterThrottlingTest failed with Illegal mailbox thread
> 
>
> Key: FLINK-31472
> URL: https://issues.apache.org/jira/browse/FLINK-31472
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.16.1, 1.19.0
>Reporter: Ran Tao
>Assignee: Ahmed Hamdy
>Priority: Major
>  Labels: pull-request-available, test-stability
>
> When running mvn clean test, this case failed occasionally.
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.955 
> s <<< FAILURE! - in 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest
> [ERROR] 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize
>   Time elapsed: 0.492 s  <<< ERROR!
> java.lang.IllegalStateException: Illegal thread detected. This method must be 
> called from inside the mailbox thread!
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.checkIsMailboxThread(TaskMailboxImpl.java:262)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:137)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:84)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.flush(AsyncSinkWriter.java:367)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.lambda$registerCallback$3(AsyncSinkWriter.java:315)
>         at 
> org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService$CallbackTask.onProcessingTime(TestProcessingTimeService.java:199)
>         at 
> org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService.setCurrentTime(TestProcessingTimeService.java:76)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize(AsyncSinkWriterThrottlingTest.java:64)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>         at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>         at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>         at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>         at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>         at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>         at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>         at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
>         at 
> org.junit.platform.launcher.core.EngineExecuti

[jira] [Resolved] (FLINK-31472) AsyncSinkWriterThrottlingTest failed with Illegal mailbox thread

2023-12-20 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin resolved FLINK-31472.
-
Fix Version/s: 1.19.0
   Resolution: Fixed

> AsyncSinkWriterThrottlingTest failed with Illegal mailbox thread
> 
>
> Key: FLINK-31472
> URL: https://issues.apache.org/jira/browse/FLINK-31472
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.16.1, 1.19.0
>Reporter: Ran Tao
>Assignee: Ahmed Hamdy
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> When running mvn clean test, this case failed occasionally.
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.955 
> s <<< FAILURE! - in 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest
> [ERROR] 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize
>   Time elapsed: 0.492 s  <<< ERROR!
> java.lang.IllegalStateException: Illegal thread detected. This method must be 
> called from inside the mailbox thread!
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.checkIsMailboxThread(TaskMailboxImpl.java:262)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:137)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:84)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.flush(AsyncSinkWriter.java:367)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.lambda$registerCallback$3(AsyncSinkWriter.java:315)
>         at 
> org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService$CallbackTask.onProcessingTime(TestProcessingTimeService.java:199)
>         at 
> org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService.setCurrentTime(TestProcessingTimeService.java:76)
>         at 
> org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize(AsyncSinkWriterThrottlingTest.java:64)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>         at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>         at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>         at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>         at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>         at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>         at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>         at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
>         at 
> org.junit.platform.launcher.core.Engin
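
For readers skimming the trace: the error comes from a plain same-thread guard in 
the task mailbox, and the test trips it because 
TestProcessingTimeService.setCurrentTime runs the flush callback on the test 
thread rather than the mailbox thread. A minimal sketch of such a guard 
(illustrative names only, not Flink's exact implementation):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/** Hedged sketch: a mailbox that may only be drained by its owning thread. */
public final class GuardedMailbox {

    private final Thread mailboxThread;
    private final Queue<Runnable> mails = new ArrayDeque<>();

    public GuardedMailbox(Thread mailboxThread) {
        this.mailboxThread = mailboxThread;
    }

    /** Mirrors the check that throws in TaskMailboxImpl above. */
    private void checkIsMailboxThread() {
        if (Thread.currentThread() != mailboxThread) {
            throw new IllegalStateException(
                    "Illegal thread detected. This method must be called "
                            + "from inside the mailbox thread!");
        }
    }

    /** Dequeues the next mail; legal only on the mailbox thread. */
    public Runnable take() {
        checkIsMailboxThread();
        return mails.poll();
    }
}
{code}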

Re: [PR] [FLINK-31472][Connectors/Base] Update AsyncSink Throttling test to use concurrent mailbox [flink]

2023-12-20 Thread via GitHub


snuyanzin merged PR #23946:
URL: https://github.com/apache/flink/pull/23946


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31472][Connectors/Base] Update AsyncSink Throttling test to use concurrent mailbox [flink]

2023-12-20 Thread via GitHub


snuyanzin commented on PR #23946:
URL: https://github.com/apache/flink/pull/23946#issuecomment-1864303733

   @vahmed-hamdy would you mind creating backports?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33414) MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot fails due to unexpected TimeoutException

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798926#comment-17798926
 ] 

Matthias Pohl commented on FLINK-33414:
---

https://github.com/XComp/flink/actions/runs/7235798862/job/19713778857#step:12:9017

> MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot fails due to 
> unexpected TimeoutException
> ---
>
> Key: FLINK-33414
> URL: https://issues.apache.org/jira/browse/FLINK-33414
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, test-stability
>
> We see this test instability in [this 
> build|https://github.com/XComp/flink/actions/runs/6695266358/job/18192039035#step:12:9253].
> {code:java}
> Error: 17:04:52 17:04:52.042 [ERROR] Failures: 
> Error: 17:04:52 17:04:52.042 [ERROR]   
> MiniClusterITCase.testHandleStreamingJobsWhenNotEnoughSlot:120 
> Oct 30 17:04:52 Expecting a throwable with root cause being an instance 
> of:
> Oct 30 17:04:52   
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException
> Oct 30 17:04:52 but was an instance of:
> Oct 30 17:04:52   java.util.concurrent.TimeoutException: Timeout has 
> occurred: 100 ms
> Oct 30 17:04:52   at 
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
> Oct 30 17:04:52   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> Oct 30 17:04:52   at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> Oct 30 17:04:52   ...(27 remaining lines not displayed - this can be 
> changed with Assertions.setMaxStackTraceElementsDisplayed) {code}
> The same error occurred in the [finegrained_resourcemanager stage of this 
> build|https://github.com/XComp/flink/actions/runs/6468655160/job/17563927249#step:11:26516]
>  (as reported in FLINK-33245).
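
For context, the failing check is a standard AssertJ root-cause assertion; a 
hedged sketch of the pattern (plain AssertJ; miniCluster and jobGraph stand in 
for the test fixtures, this is not the test's literal code):

{code:java}
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException;

// The job is expected to fail because no slot can be allocated; in the
// flaky runs the slot-request bulk checker times out first, so the root
// cause is a TimeoutException instead.
assertThatThrownBy(() -> miniCluster.executeJobBlocking(jobGraph))
        .hasRootCauseInstanceOf(NoResourceAvailableException.class);
{code}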



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798927#comment-17798927
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://github.com/XComp/flink/actions/runs/7235798862/job/19713769287#step:12:8846

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> May 25 00:56:38 
> {code}
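
For context, the test drives a separate program into a metaspace OOM and asserts 
that its error output is empty. The predicate being exercised boils down to a 
message check; an illustrative sketch (not Flink's exact code):

{code:java}
// Hedged sketch of the predicate under test: treat an OutOfMemoryError
// whose message mentions "Metaspace" as a metaspace OOM.
static boolean isMetaspaceOutOfMemoryError(Throwable t) {
    return t instanceof OutOfMemoryError
            && t.getMessage() != null
            && t.getMessage().contains("Metaspace");
}
{code}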



--
This message wa

[jira] [Commented] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798928#comment-17798928
 ] 

Matthias Pohl commented on FLINK-29114:
---

https://github.com/XComp/flink/actions/runs/7253843919/job/19761640929#step:12:11644

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with 
> result mismatch 
> --
>
> Key: FLINK-29114
> URL: https://issues.apache.org/jira/browse/FLINK-29114
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Tests
>Affects Versions: 1.15.0
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> It can be reproduced locally by repeating the tests; usually about 100 
> iterations are enough to produce several failures.
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 1.664 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR] 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
>   Time elapsed: 0.108 s  <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:120)
>     at org.junit.Assert.assertEquals(Assert.java:146)
>     at 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>     at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>     at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>     at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)

[jira] [Updated] (FLINK-29114) TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with result mismatch

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-29114:
--
Affects Version/s: 1.19.0

> TableSourceITCase#testTableHintWithLogicalTableScanReuse sometimes fails with 
> result mismatch 
> --
>
> Key: FLINK-29114
> URL: https://issues.apache.org/jira/browse/FLINK-29114
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Tests
>Affects Versions: 1.15.0, 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> It can be reproduced locally by repeating the tests; usually about 100 
> iterations are enough to produce several failures.
> {noformat}
> [ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 1.664 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase
> [ERROR] 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse
>   Time elapsed: 0.108 s  <<< FAILURE!
> java.lang.AssertionError: expected: 3,2,Hello world, 3,2,Hello world, 3,2,Hello world)> but was: 2,2,Hello, 2,2,Hello, 3,2,Hello world, 3,2,Hello world)>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:120)
>     at org.junit.Assert.assertEquals(Assert.java:146)
>     at 
> org.apache.flink.table.planner.runtime.batch.sql.TableSourceITCase.testTableHintWithLogicalTableScanReuse(TableSourceITCase.scala:428)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>     at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
>     at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>     at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
>     at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestr

[jira] [Commented] (FLINK-22765) ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798929#comment-17798929
 ] 

Matthias Pohl commented on FLINK-22765:
---

https://github.com/XComp/flink/actions/runs/7256554715/job/19769225928#step:12:8656

> ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError is unstable
> 
>
> Key: FLINK-22765
> URL: https://issues.apache.org/jira/browse/FLINK-22765
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.13.5, 1.15.0, 1.17.2, 1.19.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.14.0, 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18292&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56
> {code}
> May 25 00:56:38 java.lang.AssertionError: 
> May 25 00:56:38 
> May 25 00:56:38 Expected: is ""
> May 25 00:56:38  but: was "The system is out of resources.\nConsult the 
> following stack trace for details."
> May 25 00:56:38   at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:956)
> May 25 00:56:38   at org.junit.Assert.assertThat(Assert.java:923)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.run(ExceptionUtilsITCase.java:94)
> May 25 00:56:38   at 
> org.apache.flink.runtime.util.ExceptionUtilsITCase.testIsMetaspaceOutOfMemoryError(ExceptionUtilsITCase.java:70)
> May 25 00:56:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 25 00:56:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 25 00:56:38   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 25 00:56:38   at java.lang.reflect.Method.invoke(Method.java:498)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 25 00:56:38   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 25 00:56:38   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 25 00:56:38   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 25 00:56:38   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 25 00:56:38   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 25 00:56:38   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 25 00:56:38   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 25 00:56:38   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 25 00:56:38   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 25 00:56:38   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 25 00:56:38   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> May 25 00:56:38 
> {code}



--
This message wa

[jira] [Reopened] (FLINK-33641) JUnit5 fails to delete a directory on AZP for various table-planner tests

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reopened FLINK-33641:
---

We still experience the issue. The following GitHub Actions workflow included 
the fix above:

[https://github.com/XComp/flink/actions/runs/7267597869/job/19802079134#step:12:11646]

The workflow is based on 1cf27ff26332e5b3bf2fba4f749dac1805af72de which has 
b2b8323ccd931be85dbaa6542552c26e6a29a105 included:
{code:java}
$ git log 1cf27ff26332e5b3bf2fba4f749dac1805af72de | grep 
b2b8323ccd931be85dbaa6542552c26e6a29a105
commit b2b8323ccd931be85dbaa6542552c26e6a29a105{code}

> JUnit5 fails to delete a directory on AZP for various table-planner tests
> -
>
> Key: FLINK-33641
> URL: https://issues.apache.org/jira/browse/FLINK-33641
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Jiabao Sun
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> this build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54856&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11289
> fails with 
> {noformat}
> Nov 24 02:21:53   Suppressed: java.nio.file.DirectoryNotEmptyException: 
> /tmp/junit1727687356898183357/junit4798298549994985259/1ac07a5866d81240870d5a2982531508
> Nov 24 02:21:53   at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
> Nov 24 02:21:53   at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> Nov 24 02:21:53   at java.nio.file.Files.delete(Files.java:1126)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.deleteAndContinue(TempDirectory.java:293)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:288)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2688)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2742)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.deleteAllFilesAndDirectories(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.close(TempDirectory.java:249)
> Nov 24 02:21:53   ... 92 more
> {noformat}
> not sure, however; this might be related to the recent JUnit4 => JUnit5 upgrade
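
A generic mitigation for this failure mode (a file created concurrently with 
the tear-down makes the first recursive delete fail) is to retry the delete. 
A hedged sketch, not the fix referenced in the comments above:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Retry a recursive delete: a concurrently written file can make a single
// pass fail with DirectoryNotEmptyException, as in the trace above.
static void deleteDirectoryWithRetries(Path dir, int attempts) throws IOException {
    for (int i = 0; i < attempts && Files.exists(dir); i++) {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                    .forEach(p -> {
                        try {
                            Files.deleteIfExists(p);
                        } catch (IOException ignored) {
                            // left for the next attempt
                        }
                    });
        }
    }
    if (Files.exists(dir)) {
        throw new IOException(
                "Could not delete " + dir + " after " + attempts + " attempts");
    }
}
{code}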



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798933#comment-17798933
 ] 

Matthias Pohl commented on FLINK-26644:
---

https://github.com/XComp/flink/actions/runs/7244405295/job/19733042425#step:12:26151

> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 18:50:24 archiv

[jira] [Created] (FLINK-33900) Multiple failures in WindowRankITCase due to NoResourceAvailableException

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33900:
-

 Summary: Multiple failures in WindowRankITCase due to 
NoResourceAvailableException
 Key: FLINK-33900
 URL: https://issues.apache.org/jira/browse/FLINK-33900
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Matthias Pohl


[https://github.com/XComp/flink/actions/runs/7244405295/job/19733011527#step:12:14989]

There are multiple tests in {{WindowRankITCase}} that supposedly fail due to a 
{{NoResourceAvailableException}}:
{code:java}
[...]
Error: 09:19:33 09:19:32.966 [ERROR] 
WindowRankITCase.testTumbleWindowTVFWithOffset  Time elapsed: 300.072 s  <<< 
FAILURE!
Dec 18 09:19:33 org.opentest4j.MultipleFailuresError: 
Dec 18 09:19:33 Multiple Failures (2 failures)
Dec 18 09:19:33    org.apache.flink.runtime.client.JobExecutionException: 
Job execution failed.
Dec 18 09:19:33    java.lang.AssertionError: 
Dec 18 09:19:33    at 
org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:200)
Dec 18 09:19:33    at 
org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:248)
Dec 18 09:19:33    at 
org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:214)
Dec 18 09:19:33    at 
org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:88)
Dec 18 09:19:33    at 
org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87)
Dec 18 09:19:33    at 
org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225)
Dec 18 09:19:33    at 
org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72)
Dec 18 09:19:33    at 
org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222)
Dec 18 09:19:33    at 
org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372)
Dec 18 09:19:33    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Dec 18 09:19:33    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Dec 18 09:19:33    at org.junit.runners.Suite.runChild(Suite.java:128)
Dec 18 09:19:33    at org.junit.runners.Suite.runChild(Suite.java:27)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Dec 18 09:19:33    at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Dec 18 09:19:33    at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Dec 18 09:19:33    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Dec 18 09:19:33    at 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Dec 18 09:19:33    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
Dec 18 09:19:33    at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
Dec 18 09:19:33    at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
Dec 18 09:19:33    at 
org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
Dec 18 09:19:33    at 
org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
Dec 18 09:19:33    at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
Dec 18 09:19:33    at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
Dec 18 09:19:33    at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
Dec 18 09:19:33    at 
org.junit.platform.launcher.core.Eng

[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798938#comment-17798938
 ] 

Matthias Pohl commented on FLINK-26644:
---

https://github.com/XComp/flink/actions/runs/7258143773/job/19773334710#step:12:26613

> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 18:50:24 archiv

[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798940#comment-17798940
 ] 

Matthias Pohl commented on FLINK-28440:
---

https://github.com/XComp/flink/actions/runs/7271736522/job/19813058211#step:12:8117

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.19.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033

[jira] [Comment Edited] (FLINK-33641) JUnit5 fails to delete a directory on AZP for various table-planner tests

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798931#comment-17798931
 ] 

Matthias Pohl edited comment on FLINK-33641 at 12/20/23 11:50 AM:
--

We still experience the issue. The following GitHub Actions workflows included 
the fix above:

[https://github.com/XComp/flink/actions/runs/7267597869/job/19802079134#step:12:11646]

[https://github.com/XComp/flink/actions/runs/7246692424/job/19739487910#step:12:11634]

 

The workflow is based on 1cf27ff26332e5b3bf2fba4f749dac1805af72de which has 
b2b8323ccd931be85dbaa6542552c26e6a29a105 included:
{code:java}
$ git log 1cf27ff26332e5b3bf2fba4f749dac1805af72de | grep 
b2b8323ccd931be85dbaa6542552c26e6a29a105
commit b2b8323ccd931be85dbaa6542552c26e6a29a105{code}


was (Author: mapohl):
We still experience the issue. The following GitHub Actions workflow included 
the fix above:

[https://github.com/XComp/flink/actions/runs/7267597869/job/19802079134#step:12:11646]

The workflow is based on 1cf27ff26332e5b3bf2fba4f749dac1805af72de which has 
b2b8323ccd931be85dbaa6542552c26e6a29a105 included:
{code:java}
$ git log 1cf27ff26332e5b3bf2fba4f749dac1805af72de | grep 
b2b8323ccd931be85dbaa6542552c26e6a29a105
commit b2b8323ccd931be85dbaa6542552c26e6a29a105{code}

> JUnit5 fails to delete a directory on AZP for various table-planner tests
> -
>
> Key: FLINK-33641
> URL: https://issues.apache.org/jira/browse/FLINK-33641
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Jiabao Sun
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> this build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54856&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11289
> fails with 
> {noformat}
> Nov 24 02:21:53   Suppressed: java.nio.file.DirectoryNotEmptyException: 
> /tmp/junit1727687356898183357/junit4798298549994985259/1ac07a5866d81240870d5a2982531508
> Nov 24 02:21:53   at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
> Nov 24 02:21:53   at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> Nov 24 02:21:53   at java.nio.file.Files.delete(Files.java:1126)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.deleteAndContinue(TempDirectory.java:293)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:288)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2688)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2742)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.deleteAllFilesAndDirectories(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.close(TempDirectory.java:249)
> Nov 24 02:21:53   ... 92 more
> {noformat}
> not sure, however; this might be related to the recent JUnit4 => JUnit5 upgrade



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798941#comment-17798941
 ] 

Matthias Pohl commented on FLINK-28440:
---

https://github.com/XComp/flink/actions/runs/7260428960/job/19779937286#step:12:8108

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.19.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033

[jira] [Created] (FLINK-33901) Trial Period: GitHub Actions

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33901:
-

 Summary: Trial Period: GitHub Actions
 Key: FLINK-33901
 URL: https://issues.apache.org/jira/browse/FLINK-33901
 Project: Flink
  Issue Type: New Feature
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


In contrast to FLINK-27075 (which collects issues found while preparing 
[FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure]), 
this issue collects all the subtasks that are necessary to initiate the trial 
phase for GitHub Actions, as discussed in 
[FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33902:
-

 Summary: Switch to OpenSSL legacy algorithms
 Key: FLINK-33902
 URL: https://issues.apache.org/jira/browse/FLINK-33902
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Affects Versions: 1.19.0
Reporter: Matthias Pohl


In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
newer version of OpenSSL installed which caused errors in the SSL tests:
{code:java}
Certificate was added to keystore
Certificate was added to keystore
Certificate reply was installed in keystore
Error outputting keys and certificates
40F767F1D97F:error:0308010C:digital envelope 
routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
 default library context, Algorithm (RC2-40-CBC : 0), Properties ()
Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
in 3.0.0+. We might need to check whether that works for older OpenSSL versions 
(in Azure CI).
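
For illustration, a minimal sketch of the workaround (the file names and the 
password are placeholders, not the ones used by the SSL test scripts):
{code:bash}
# RC2-40-CBC lives in OpenSSL 3.0's legacy provider; without -legacy,
# reading such a PKCS#12 keystore fails with the "unsupported" error above
openssl pkcs12 -legacy -in keystore.p12 -nodes -passin pass:changeit -out key_and_cert.pem
{code}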



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33550) "DataSet allround end-to-end test" failed due to certificate error

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-33550.
---
Resolution: Later

This issue is superseded by FLINK-33902.

> "DataSet allround end-to-end test" failed due to certificate error
> --
>
> Key: FLINK-33550
> URL: https://issues.apache.org/jira/browse/FLINK-33550
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> https://github.com/XComp/flink/actions/runs/6865535375/job/18670359423
> {code}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798958#comment-17798958
 ] 

Sergey Nuyanzin commented on FLINK-33902:
-

Should there be a follow-up task to convert the certificates to ones supporting 
modern algorithms?

> Switch to OpenSSL legacy algorithms
> ---
>
> Key: FLINK-33902
> URL: https://issues.apache.org/jira/browse/FLINK-33902
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
> newer version of OpenSSL installed which caused errors in the SSL tests:
> {code:java}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
> The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
> in 3.0.0+. We might need to check whether that works for older OpenSSL 
> versions (in Azure CI).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33903) Reenable tests that edit file permissions

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33903:
-

 Summary: Reenable tests that edit file permissions
 Key: FLINK-33903
 URL: https://issues.apache.org/jira/browse/FLINK-33903
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


In GitHub Actions, file permissions seem to work differently from how they work 
in Azure CI. In both cases, we run the test suite as root. But on GHA runners, 
file permission changes won't have any effect because tests started as the root 
user can access the files in any case.

This issue is about enabling the tests again, ideally by rewriting them 
(because we shouldn't rely on OS features in the tests). Alternatively, we 
could find a way to make those tests work in GHA as well.
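
The underlying behavior can be reproduced on any Linux box (a minimal sketch, 
assuming a root shell; the probe file is arbitrary):
{code:bash}
# root bypasses discretionary permission checks, so a test that makes a file
# unreadable and then expects the read to fail will see the read succeed:
touch /tmp/permission-probe
chmod 000 /tmp/permission-probe
cat /tmp/permission-probe && echo "read succeeded despite mode 000"
rm -f /tmp/permission-probe
{code}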



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33904) Add zip as a package to GitHub Actions runners

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33904:
-

 Summary: Add zip  as a package to GitHub Actions runners
 Key: FLINK-33904
 URL: https://issues.apache.org/jira/browse/FLINK-33904
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


FLINK-33253 shows that {{test_pyflink.sh}} fails in GHA because it doesn't find 
{{{}zip{}}}. We should add this as a dependency in the e2e test.
{code:java}
/root/flink/flink-end-to-end-tests/test-scripts/test_pyflink.sh: line 107: zip: 
command not found {code}
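
A minimal sketch of such a guard (assuming a Debian-based runner image and root 
privileges; the exact placement in the script is an assumption):
{code:bash}
# install zip only if it is missing, e.g. near the top of test_pyflink.sh
if ! command -v zip >/dev/null 2>&1; then
  apt-get update && apt-get install -y zip
fi
{code}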



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33905) FLIP-382: Unify the Provision of Diverse Metadata for Context-like APIs

2023-12-20 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33905:
---

 Summary: FLIP-382: Unify the Provision of Diverse Metadata for 
Context-like APIs
 Key: FLINK-33905
 URL: https://issues.apache.org/jira/browse/FLINK-33905
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.19.0
Reporter: Wencong Liu


This ticket is proposed for 
[FLIP-382|https://cwiki.apache.org/confluence/display/FLINK/FLIP-382%3A+Unify+the+Provision+of+Diverse+Metadata+for+Context-like+APIs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33906) tools/azure-pipelines/debug_files_utils.sh should support GHA output as well

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33906:
-

 Summary: tools/azure-pipelines/debug_files_utils.sh should support 
GHA output as well
 Key: FLINK-33906
 URL: https://issues.apache.org/jira/browse/FLINK-33906
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


{{tools/azure-pipelines/debug_files_utils.sh}} sets variables to reference the 
debug output. This is backend-specific and only supports Azure CI right now. We 
should add support for GHA.
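
A hedged sketch of what backend-aware output handling could look like 
({{GITHUB_OUTPUT}} and the {{##vso}} logging command are the native mechanisms 
of the two backends; the {{DEBUG_FILES_NAME}} variable is a placeholder):
{code:bash}
# write the debug-artifact reference in the format of the active CI backend
if [ -n "${GITHUB_OUTPUT:-}" ]; then
  # GitHub Actions: step outputs are appended to the file in $GITHUB_OUTPUT
  echo "debug-files-name=${DEBUG_FILES_NAME}" >> "${GITHUB_OUTPUT}"
else
  # Azure Pipelines: variables are set via a logging command on stdout
  echo "##vso[task.setvariable variable=DEBUG_FILES_NAME]${DEBUG_FILES_NAME}"
fi
{code}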



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33907) Makes copying test jars being done earlier

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33907:
--
Description: 
We experienced an issue in GHA which is due to how test resources are 
pre-computed in GHA:
{code:java}
This fixes the following error when compiling flink-clients:
Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1] {code}
We need to move this goal to an earlier phase.

  was:
We experienced an issue in GHA which is due to the fact how test resources are 
pre-computed in GHA:
{code:java}
This fixes the following error when compiling flink-clients:
Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1] {code}
We need to move this goal to a later phase.


> Makes copying test jars being done earlier
> --
>
> Key: FLINK-33907
> URL: https://issues.apache.org/jira/browse/FLINK-33907
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> We experienced an issue in GHA which is due to how test resources are 
> pre-computed in GHA:
> {code:java}
> This fixes the following error when compiling flink-clients:
> Error: 2.054 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
> (copy-dependencies) on project flink-clients: Artifact has not been packaged 
> yet. When used on reactor artifact, copy should be executed after packaging: 
> see MDEP-187. -> [Help 1] {code}
> We need to move this goal to an earlier phase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33907) Makes copying test jars being done earlier

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33907:
-

 Summary: Makes copying test jars being done earlier
 Key: FLINK-33907
 URL: https://issues.apache.org/jira/browse/FLINK-33907
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


We experienced an issue in GHA which is due to how test resources are 
pre-computed in GHA:
{code:java}
This fixes the following error when compiling flink-clients:
Error: 2.054 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
(copy-dependencies) on project flink-clients: Artifact has not been packaged 
yet. When used on reactor artifact, copy should be executed after packaging: 
see MDEP-187. -> [Help 1] {code}
We need to move this goal to a later phase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33908) Custom Action: Move files within the Docker image to the root folder to match the user

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33908:
-

 Summary: Custom Action: Move files within the Docker image to the 
root folder to match the user
 Key: FLINK-33908
 URL: https://issues.apache.org/jira/browse/FLINK-33908
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


The way the CI template is set up (right now) is to work in the root user's home 
folder. For this, we're copying the checkout into /root. This copying is done in 
multiple places, which makes it a candidate for a custom action.
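
A sketch of the step that such a custom action would encapsulate (the target 
path follows the description above; {{GITHUB_WORKSPACE}} is GHA's standard 
checkout location):
{code:bash}
# mirror the checkout into root's home folder and continue working there
cp -r "${GITHUB_WORKSPACE}" /root/flink
cd /root/flink
{code}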



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33909) Custom Action: Select the right branch and commit hash

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33909:
-

 Summary: Custom Action: Select the right branch and commit hash
 Key: FLINK-33909
 URL: https://issues.apache.org/jira/browse/FLINK-33909
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


For nightly builds, we want to select the release branches dynamically (rather 
than using the automatic selection through the GHA schedule), because the GHA 
feature for branch selection seems to be rather limited right now, e.g.:
 * Don't run a branch that hasn't had any changes in the past 1 day (or any 
other time period)
 * Run only the most-recent release branches and ignore older release branches 
(similar to what we're doing in 
[flink-ci/git-repo-sync:sync_repo.sh|https://github.com/flink-ci/git-repo-sync/blob/master/sync_repo.sh#L28]
 right now)

A custom action that selects the branch and commit hash enables us to override 
this setting.
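
A rough sketch of the dynamic selection (assuming the same "two most recent 
release branches" policy as sync_repo.sh; the exact commands are an assumption):
{code:bash}
# select master plus the two most recent release branches, version-sorted
release_branches=$(git ls-remote --heads https://github.com/apache/flink.git 'release-*' \
  | sed 's|.*refs/heads/||' | sort -V | tail -n 2)
echo master ${release_branches}
{code}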



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33910) Custom Action: Enable Java version in Flink's CI Docker image

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33910:
-

 Summary: Custom Action: Enable Java version in Flink's CI Docker 
image
 Key: FLINK-33910
 URL: https://issues.apache.org/jira/browse/FLINK-33910
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


Flink's CI Docker image comes with multiple Java versions which can be enabled 
through environment variables. We should have a custom action that sets these 
variables properly.
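
A minimal sketch of what such an action would do (the {{JAVA_HOME_17_X64}} 
variable name is an assumption; the image's actual switch may differ):
{code:bash}
# point the build at the requested JDK shipped with the CI image
export JAVA_HOME="${JAVA_HOME_17_X64}"
export PATH="${JAVA_HOME}/bin:${PATH}"
java -version
{code}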



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [Docs]remove no necessary text [flink]

2023-12-20 Thread via GitHub


zzzk1 closed pull request #23940: [Docs]remove no necessary text
URL: https://github.com/apache/flink/pull/23940


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33911) Custom Action: Select workflow configuration

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33911:
-

 Summary: Custom Action: Select workflow configuration
 Key: FLINK-33911
 URL: https://issues.apache.org/jira/browse/FLINK-33911
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


During experiments, we noticed that the GHA UI isn't capable of handling an 
arbitrary number of nested workflow compositions. If we get into the 3rd level 
of composite workflows, the job name is cut off in the left menu, which makes 
navigating the jobs harder (because you have duplicates of the same job, e.g. 
Compile, belonging to different job profiles).

As a workaround, we came up with Flink CI workflow profiles to configure the CI 
template YAML that is used in every job. A profile configuration can be 
specified through a JSON file that lives in the {{.github/workflow}} folder. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33912) Template: Add CI template for pre-compile steps

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33912:
-

 Summary: Template: Add CI template for pre-compile steps
 Key: FLINK-33912
 URL: https://issues.apache.org/jira/browse/FLINK-33912
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


We want to have a template that triggers all checks that do not require 
compilation. Those quick checks (e.g. code format) can run without waiting for 
the compilation step to succeed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33913) Template: Add CI template for running Flink's test suite

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33913:
-

 Summary: Template: Add CI template for running Flink's test suite
 Key: FLINK-33913
 URL: https://issues.apache.org/jira/browse/FLINK-33913
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


We want to have a template that runs the entire Flink test suite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33914) Workflow: Add basic CI that will run with the default configuration

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33914:
-

 Summary: Workflow: Add basic CI that will run with the default 
configuration
 Key: FLINK-33914
 URL: https://issues.apache.org/jira/browse/FLINK-33914
 Project: Flink
  Issue Type: Sub-task
Reporter: Matthias Pohl


Runs the Flink CI template with the default configuration (Java 8) and can be 
triggered on each push to the branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33915) Workflow: Add nightly build for the dev version (currently called "master")

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33915:
-

 Summary: Workflow: Add nightly build for the dev version 
(currently called "master")
 Key: FLINK-33915
 URL: https://issues.apache.org/jira/browse/FLINK-33915
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


The nightly builds run on master and the two most-recently released versions of 
Flink as those are the supported versions. This logic is currently captured in 
[flink-ci/git-repo-sync:sync_repo.sh|https://github.com/flink-ci/git-repo-sync/blob/master/sync_repo.sh#L28].

In 
[FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure]
 we decided to go ahead and provide nightly builds for {{master}} and 
{{{}release-1.18{}}}. This issue is about providing the nightly workflow for 
{{master}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33901) Trial Period: GitHub Actions

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-33901:
-

Assignee: Matthias Pohl

> Trial Period: GitHub Actions
> 
>
> Key: FLINK-33901
> URL: https://issues.apache.org/jira/browse/FLINK-33901
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> This issue collects all the subtasks that are necessary to initiate the trial 
> phase for GitHub Actions (as discussed in 
> [FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure]).
> It is kept separate from FLINK-27075, which is used for issues that were 
> collected while preparing FLIP-396.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33916) Workflow: Add nightly build for release-1.18

2023-12-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33916:
-

 Summary: Workflow: Add nightly build for release-1.18
 Key: FLINK-33916
 URL: https://issues.apache.org/jira/browse/FLINK-33916
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.19.0
Reporter: Matthias Pohl


Add nightly workflow for {{{}release-1.18{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798990#comment-17798990
 ] 

Matthias Pohl edited comment on FLINK-33902 at 12/20/23 12:59 PM:
--

We could do it. But there's no real value in my opinion. We use this in a test 
case (there's no relevance functional-wise, as far as I understand). Therefore, 
it's also not really a security issue.

But of course, it would be the right thing to do from a theoretical standpoint.


was (Author: mapohl):
We could do it. But there's no real value in my opinion. We use this in a test 
case (there's no relevance functional-wise, as far as I understand). Therefore, 
it's also not really a security issue.

> Switch to OpenSSL legacy algorithms
> ---
>
> Key: FLINK-33902
> URL: https://issues.apache.org/jira/browse/FLINK-33902
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
> newer version of OpenSSL installed which caused errors in the SSL tests:
> {code:java}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
> The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
> in 3.0.0+. We might need to check whether that works for older OpenSSL 
> versions (in Azure CI).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798990#comment-17798990
 ] 

Matthias Pohl commented on FLINK-33902:
---

We could do it. But there's no real value in my opinion. We use this in a test 
case (there's no relevance functional-wise, as far as I understand). Therefore, 
it's also not really a security issue.

> Switch to OpenSSL legacy algorithms
> ---
>
> Key: FLINK-33902
> URL: https://issues.apache.org/jira/browse/FLINK-33902
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
> newer version of OpenSSL installed which caused errors in the SSL tests:
> {code:java}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
> The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
> in 3.0.0+. We might need to check whether that works for older OpenSSL 
> versions (in Azure CI).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33861] Implement restore tests for WindowRank node [flink]

2023-12-20 Thread via GitHub


dawidwys merged PR #23956:
URL: https://github.com/apache/flink/pull/23956


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33861) Implement restore tests for WindowRank node

2023-12-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz closed FLINK-33861.

Fix Version/s: 1.19.0
   Resolution: Implemented

Implemented in aa5766e257a8b40e15d08eafa1e005837694772b

> Implement restore tests for WindowRank node
> ---
>
> Key: FLINK-33861
> URL: https://issues.apache.org/jira/browse/FLINK-33861
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Bonnie Varghese
>Assignee: Bonnie Varghese
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33860] Implement restore tests for WindowTableFunction node [flink]

2023-12-20 Thread via GitHub


dawidwys commented on PR #23936:
URL: https://github.com/apache/flink/pull/23936#issuecomment-1864442955

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33902][ci] Adds -legacy to openssl command [flink]

2023-12-20 Thread via GitHub


XComp opened a new pull request, #23961:
URL: https://github.com/apache/flink/pull/23961

   ## What is the purpose of the change
   
   See context of this change in FLINK-33902.
   
   ## Brief change log
   
   * Utilizes the `-legacy` parameter to support OpenSSL 3.0.0+
   
   ## Verifying this change
   
   Azure CI should still pass without errors. The GitHub Actions workflow part 
was tested in the context of FLINK-33550.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-33902:
-

Assignee: Matthias Pohl

> Switch to OpenSSL legacy algorithms
> ---
>
> Key: FLINK-33902
> URL: https://issues.apache.org/jira/browse/FLINK-33902
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
> newer version of OpenSSL installed which caused errors in the SSL tests:
> {code:java}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
> The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
> in 3.0.0+. We might need to check whether that works for older OpenSSL 
> versions (in Azure CI).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33902) Switch to OpenSSL legacy algorithms

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33902:
---
Labels: github-actions pull-request-available  (was: github-actions)

> Switch to OpenSSL legacy algorithms
> ---
>
> Key: FLINK-33902
> URL: https://issues.apache.org/jira/browse/FLINK-33902
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> In FLINK-33550 we discovered that the GHA runners provided by GitHub have a 
> newer version of OpenSSL installed which caused errors in the SSL tests:
> {code:java}
> Certificate was added to keystore
> Certificate was added to keystore
> Certificate reply was installed in keystore
> Error outputting keys and certificates
> 40F767F1D97F:error:0308010C:digital envelope 
> routines:inner_evp_generic_fetch:unsupported:../crypto/evp/evp_fetch.c:349:Global
>  default library context, Algorithm (RC2-40-CBC : 0), Properties ()
> Nov 14 15:39:21 [FAIL] Test script contains errors. {code}
> The workaround is to enable legacy algorithms using the {{-legacy}} parameter 
> in 3.0.0+. We might need to check whether that works for older OpenSSL 
> versions (in Azure CI).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-31650][metrics][rest] Remove transient metrics for subtasks in terminal state [flink]

2023-12-20 Thread via GitHub


X-czh commented on PR #23447:
URL: https://github.com/apache/flink/pull/23447#issuecomment-1864445063

   Sure, no problem


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33902][ci] Adds -legacy to openssl command [flink]

2023-12-20 Thread via GitHub


flinkbot commented on PR #23961:
URL: https://github.com/apache/flink/pull/23961#issuecomment-1864453035

   
   ## CI report:
   
   * 074424258c912f5bda9b7085db81eb3eeebdbec5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-27082][ci] Disables tests that relies on disabling file permissions [flink]

2023-12-20 Thread via GitHub


XComp opened a new pull request, #23962:
URL: https://github.com/apache/flink/pull/23962

   Based on the following PR(s):
   * https://github.com/apache/flink/pull/23961
   
   ## What is the purpose of the change
   
   See FLINK-27082 and FLINK-33903 for context
   
   ## Brief change log
   
   * Disables/Ignores tests that rely on setting file permissions to verify 
that error handling is triggered when the permissions are not granted
   
   ## Verifying this change
   
   Tests are disabled. No issue is expected.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-27082) Disable tests relying on non-writable directories

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-27082:
-

Assignee: Matthias Pohl  (was: Chesnay Schepler)

> Disable tests relying on non-writable directories
> -
>
> Key: FLINK-27082
> URL: https://issues.apache.org/jira/browse/FLINK-27082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> We have a number of tests that rely on {{File#setWritable}} to produce errors.
> These currently fail on GHA because we're running the tests as root, who can 
> delete files anyway.
> Switching to a non-root user is a follow-up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27082) Disable tests relying on non-writable directories

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-27082:
---
Labels: github-actions pull-request-available test-stability  (was: 
github-actions test-stability)

> Disable tests relying on non-writable directories
> -
>
> Key: FLINK-27082
> URL: https://issues.apache.org/jira/browse/FLINK-27082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: github-actions, pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> We have a number of tests that rely on {{File#setWritable}} to produce errors.
> These currently fail on GHA because we're running the tests as root, who can 
> delete files anyway.
> Switching to a non-root user is a follow-up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27082) Disable tests relying on non-writable directories

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-27082:
--
Parent Issue: FLINK-33901  (was: FLINK-27075)

> Disable tests relying on non-writable directories
> -
>
> Key: FLINK-27082
> URL: https://issues.apache.org/jira/browse/FLINK-27082
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> We have a number of tests that rely on {{File#setWritable}} to produce errors.
> These currently fail on GHA because we're running the tests as root, who can 
> delete files anyway.
> Switching to a non-root user is a follow-up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-27082][ci] Disables tests that relies on disabling file permissions [flink]

2023-12-20 Thread via GitHub


flinkbot commented on PR #23962:
URL: https://github.com/apache/flink/pull/23962#issuecomment-1864463518

   
   ## CI report:
   
   * c06b5080ace2436c9840da6f69725ac78371ace2 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33904][test] Installs missing zip package [flink]

2023-12-20 Thread via GitHub


XComp opened a new pull request, #23963:
URL: https://github.com/apache/flink/pull/23963

   Based on the following PR(s):
   * https://github.com/apache/flink/pull/23961
   * https://github.com/apache/flink/pull/23962
   
   ## What is the purpose of the change
   
   See FLINK-33904 for further context.
   
   ## Brief change log
   
   * Adds an install statement to `test_pyflink.sh` because that script 
requires the zip package to be available.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33904) Add zip as a package to GitHub Actions runners

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33904:
---
Labels: github-actions pull-request-available  (was: github-actions)

> Add zip  as a package to GitHub Actions runners
> ---
>
> Key: FLINK-33904
> URL: https://issues.apache.org/jira/browse/FLINK-33904
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> FLINK-33253 shows that {{test_pyflink.sh}} fails in GHA because it doesn't 
> find {{{}zip{}}}. We should add this as a dependency in the e2e test.
> {code:java}
> /root/flink/flink-end-to-end-tests/test-scripts/test_pyflink.sh: line 107: 
> zip: command not found {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33904) Add zip as a package to GitHub Actions runners

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-33904:
-

Assignee: Matthias Pohl

> Add zip  as a package to GitHub Actions runners
> ---
>
> Key: FLINK-33904
> URL: https://issues.apache.org/jira/browse/FLINK-33904
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> FLINK-33253 shows that {{test_pyflink.sh}} fails in GHA because it doesn't 
> find {{{}zip{}}}. We should add this as a dependency in the e2e test.
> {code:java}
> /root/flink/flink-end-to-end-tests/test-scripts/test_pyflink.sh: line 107: 
> zip: command not found {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33906) tools/azure-pipelines/debug_files_utils.sh should support GHA output as well

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-33906:
-

Assignee: Matthias Pohl

> tools/azure-pipelines/debug_files_utils.sh should support GHA output as well
> 
>
> Key: FLINK-33906
> URL: https://issues.apache.org/jira/browse/FLINK-33906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> {{tools/azure-pipelines/debug_files_utils.sh}} sets variables to reference 
> the debug output. This is backend-specific and only supports Azure CI right 
> now. We should add support for GHA.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33881) [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull

2023-12-20 Thread Jinzhong Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinzhong Li updated FLINK-33881:

Description: 
In some scenarios, 'TtlListState#getUnexpiredOrNull -> 
elementSerializer.copy(ttlValue)' consumes a lot of CPU resources.

!image-2023-12-19-21-25-21-446.png|width=529,height=119!

I found that for TtlListState#getUnexpiredOrNull, if none of the elements have 
expired, it still needs to copy all the elements and update the whole list/map 
in TtlIncrementalCleanup#runCleanup();

!image-2023-12-19-21-26-43-518.png|width=505,height=266!

I think we could optimize TtlListState#getUnexpiredOrNull by:
1) find the first expired element's index in the list;
2) if none is found, return the original list;
3) if one is found, construct the unexpired list (put the preceding elements 
into it), and go through the subsequent elements, adding the unexpired elements 
to the list.
{code:java}
public List<TtlValue<T>> getUnexpiredOrNull(@Nonnull List<TtlValue<T>> ttlValues) {
    //...
    int firstExpireIndex = -1;
    for (int i = 0; i < ttlValues.size(); i++) {
        if (TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
            firstExpireIndex = i;
            break;
        }
    }
    if (firstExpireIndex == -1) {
        return ttlValues;  // return the original ttlValues
    }
    List<TtlValue<T>> unexpired = new ArrayList<>(ttlValues.size());
    for (int i = 0; i < ttlValues.size(); i++) {
        if (i < firstExpireIndex) {
            // unexpired.add(ttlValues.get(i));
            unexpired.add(elementSerializer.copy(ttlValues.get(i)));
        }
        if (i > firstExpireIndex) {
            if (!TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
                // unexpired.add(ttlValues.get(i));
                unexpired.add(elementSerializer.copy(ttlValues.get(i)));
            }
        }
    }
    // ...
} {code}
*In this way, the extra iteration overhead is actually very small, but the 
benefit when there are no expired elements is significant.*

  was:
In some scenarios, 'TtlListState#getUnexpiredOrNull -> 
elementSerializer.copy(ttlValue)'  consumes a lot of cpu resources.

!image-2023-12-19-21-25-21-446.png|width=529,height=119!

I found that for TtlListState#getUnexpiredOrNull, if none of the elements have 
expired, it still needs to copy all the elements and update the whole list/map 
in TtlIncrementalCleanup#runCleanup();

!image-2023-12-19-21-26-43-518.png|width=505,height=266!

I think we could optimize TtlListState#getUnexpiredOrNull by:
1)find the first expired element index in the list;
2)If not found, return to the original list;
3)If found, then constrct the unexpire list (puts the previous elements into 
the list), and go through the subsequent elements, adding expired elements into 
the list.
{code:java}
public List<TtlValue<T>> getUnexpiredOrNull(@Nonnull List<TtlValue<T>> ttlValues) {
    //...
    int firstExpireIndex = -1;
    for (int i = 0; i < ttlValues.size(); i++) {
        if (TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
            firstExpireIndex = i;
            break;
        }
    }
    if (firstExpireIndex == -1) {
        return ttlValues;  // return the original ttlValues
    }
    List<TtlValue<T>> unexpired = new ArrayList<>(ttlValues.size());
    for (int i = 0; i < ttlValues.size(); i++) {
        if (i < firstExpireIndex) {
            unexpired.add(ttlValues.get(i));
        }
        if (i > firstExpireIndex) {
            if (!TtlUtils.expired(ttlValues.get(i), ttl, currentTimestamp)) {
                unexpired.add(ttlValues.get(i));
            }
        }
    }
    // ...
} {code}
*In this way, the extra iteration overhead is actually very very small, but the 
benefit when there are no expired elements is significant.*


> [TtlListState]Avoid copy and update value in TtlListState#getUnexpiredOrNull
> 
>
> Key: FLINK-33881
> URL: https://issues.apache.org/jira/browse/FLINK-33881
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Jinzhong Li
>Priority: Minor
> Attachments: image-2023-12-19-21-25-21-446.png, 
> image-2023-12-19-21-26-43-518.png
>
>
> In some scenarios, 'TtlListState#getUnexpiredOrNull -> 
> elementSerializer.copy(ttlValue)' consumes a lot of CPU resources.
> !image-2023-12-19-21-25-21-446.png|width=529,height=119!
> I found that for TtlListState#getUnexpiredOrNull, if none of the elements 
> have expired, it still needs to copy all the elements and update the whole 
> list/map in TtlIncrementalCleanup#runCleanup();
> !image-2023-12-19-21-26-43-518.png|width=505,height=266!
> I think we could optimize TtlListState#getUnexpiredOrNull by:
> 1) find the first expired element's index in the list;
> 2) if none is found, return the original list;
> 3) if one is found, construct the unexpired list (put the preceding elements 
> into it), and go through the subsequent elements, adding the unexpired 
> elements to the list.

Re: [PR] [FLINK-33904][test] Installs missing zip package [flink]

2023-12-20 Thread via GitHub


flinkbot commented on PR #23963:
URL: https://github.com/apache/flink/pull/23963#issuecomment-1864473918

   
   ## CI report:
   
   * 906c9d52ad8504d135966a499d05cd230316a5a1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33906) tools/azure-pipelines/debug_files_utils.sh should support GHA output as well

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33906:
---
Labels: github-actions pull-request-available  (was: github-actions)

> tools/azure-pipelines/debug_files_utils.sh should support GHA output as well
> 
>
> Key: FLINK-33906
> URL: https://issues.apache.org/jira/browse/FLINK-33906
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> {{tools/azure-pipelines/debug_files_utils.sh}} sets variables to reference 
> the debug output. This is backend-specific and only supports Azure CI right 
> now. We should add support for GHA.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33906][ci] tools/azure-pipelines/debug_files_utils.sh should support GHA output as well [flink]

2023-12-20 Thread via GitHub


XComp opened a new pull request, #23964:
URL: https://github.com/apache/flink/pull/23964

   Based on the following PR(s):
   * https://github.com/apache/flink/pull/23961
   * https://github.com/apache/flink/pull/23962
   * https://github.com/apache/flink/pull/23963
   
   ## What is the purpose of the change
   
   See FLINK-33906 for further context.
   
   ## Brief change log
   
   * Adds support for GHA environment variables
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33907) Makes copying test jars being done earlier

2023-12-20 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-33907:
-

Assignee: Matthias Pohl

> Makes copying test jars being done earlier
> --
>
> Key: FLINK-33907
> URL: https://issues.apache.org/jira/browse/FLINK-33907
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> We experienced an issue in GHA which is due to how test resources are 
> pre-computed in GHA:
> {code:java}
> This fixes the following error when compiling flink-clients:
> Error: 2.054 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.2.0:copy-dependencies 
> (copy-dependencies) on project flink-clients: Artifact has not been packaged 
> yet. When used on reactor artifact, copy should be executed after packaging: 
> see MDEP-187. -> [Help 1] {code}
> We need to move this goal to an earlier phase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33641) JUnit5 fails to delete a directory on AZP for various table-planner tests

2023-12-20 Thread Jiabao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799011#comment-17799011
 ] 

Jiabao Sun commented on FLINK-33641:


Hi [~mapohl], please take a look at this PR 
[https://github.com/apache/flink/pull/23917]

In the previous fix, I think we shouldn't reuse the tempFolder to create a new 
temporary folder.
Although we clean up the newly created directory during the "after" method, it 
is still possible for the checkpoint thread to continue writing, resulting in 
the failure of JUnit5 TemporaryFolder cleanup.
 
https://github.com/apache/flink/pull/23917#discussion_r1428188735

I think this problem was fixed by 23917, so could you help include the commit 
ac88acfbb1b4ebf7336e9a20e0b6d0b0fe32be51 and try it again?

> JUnit5 fails to delete a directory on AZP for various table-planner tests
> -
>
> Key: FLINK-33641
> URL: https://issues.apache.org/jira/browse/FLINK-33641
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Assignee: Jiabao Sun
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> this build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54856&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11289
> fails with 
> {noformat}
> Nov 24 02:21:53   Suppressed: java.nio.file.DirectoryNotEmptyException: 
> /tmp/junit1727687356898183357/junit4798298549994985259/1ac07a5866d81240870d5a2982531508
> Nov 24 02:21:53   at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
> Nov 24 02:21:53   at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> Nov 24 02:21:53   at java.nio.file.Files.delete(Files.java:1126)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.deleteAndContinue(TempDirectory.java:293)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:288)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath$1.postVisitDirectory(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2688)
> Nov 24 02:21:53   at 
> java.nio.file.Files.walkFileTree(Files.java:2742)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.deleteAllFilesAndDirectories(TempDirectory.java:264)
> Nov 24 02:21:53   at 
> org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.close(TempDirectory.java:249)
> Nov 24 02:21:53   ... 92 more
> {noformat}
> not sure, however, whether this might be related to the recent JUnit4 => 
> JUnit5 upgrade



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33560][Connectors/AWS] Externalize AWS Python connectors from Flink to AWS project [flink-connector-aws]

2023-12-20 Thread via GitHub


hlteoh37 commented on code in PR #121:
URL: 
https://github.com/apache/flink-connector-aws/pull/121#discussion_r1432721109


##
.github/workflows/nightly.yml:
##
@@ -31,4 +31,12 @@ jobs:
      flink_version: ${{ matrix.flink }}
      flink_url: https://s3.amazonaws.com/flink-nightly/flink-${{ matrix.flink }}-bin-scala_2.12.tgz
      cache_flink_binary: false
-    secrets: inherit
\ No newline at end of file
+    secrets: inherit
+
+  python_test:
+    strategy:
+      matrix:

Review Comment:
   Wonder if we should consider using a similar matrix for python versions



##
flink-connector-aws/flink-python-connector-aws/pom.xml:
##
@@ -0,0 +1,223 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-aws-parent</artifactId>
+        <version>4.3-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>flink-python-connector-aws</artifactId>
+    <name>Flink : Connectors : AWS : Python</name>
+
+    <packaging>pom</packaging>
+
+    <properties>
+false
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-kinesis</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-aws-kinesis-firehose</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-runtime</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-streaming-java</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-connector-test-utils</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-antrun-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>clean</id>
+                        <phase>clean</phase>
+                        <goals>
+                            <goal>run</goal>
+                        </goals>
+                    </execution>

Review Comment:
   This doesn't seem right for kinesis



##
.gitignore:
##
@@ -36,4 +36,20 @@ tools/flink
 tools/flink-*
 tools/releasing/release
 tools/japicmp-output
-*/.idea/
\ No newline at end of file
+*/.idea/
+
+# Generated file, do not store in git

Review Comment:
   Should we remove this comment if we plan to store this?



##
flink-connector-aws/flink-python-connector-aws/pom.xml:
##
@@ -0,0 +1,223 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-aws-parent</artifactId>
+        <version>4.3-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>flink-python-connector-aws</artifactId>
+    <name>Flink : Connectors : AWS : Python</name>
+
+    <packaging>pom</packaging>
+
+    <properties>
+false
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-kinesis</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-aws-kinesis-firehose</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-runtime</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-streaming-java</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-connector-test-utils</artifactId>
+            <version>${flink.version}</version>
+        </dependency>

Review Comment:
   How come we need this at `compile` level?



##
flink-connector-aws/flink-python-connector-aws/pom.xml:
##
@@ -0,0 +1,223 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-aws-parent</artifactId>
+        <version>4.3-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>flink-python-connector-aws</artifactId>
+    <name>Flink : Connectors : AWS : Python</name>
+
+    <packaging>pom</packaging>
+
+    <properties>
+false
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-kinesis</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-sql-connector-aws-kinesis-firehose</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-runtime</artifactId>
+            <version>${flink.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-streaming-java</artifactId>
+            <version>${flink.version}</version>
+        </dependency>

Review Comment:
   Why do we need these at `compile` level?



##
flink-connector-aws/flink-python-connector-aws/pom.xml:
##
@@ -0,0 +1,223 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.
