[GitHub] [flink] TanYuxin-tyx commented on pull request #22443: [FLINK-31878][connectors] Fix the wrong name of PauseOrResumeSplitsTask#toString
TanYuxin-tyx commented on PR #22443: URL: https://github.com/apache/flink/pull/22443#issuecomment-1518498947 @reswqa @MartijnVisser @liuyongvs Thanks for reviewing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] tzulitai closed pull request #632: Add Flink Kafka Connector v3.0.0
tzulitai closed pull request #632: Add Flink Kafka Connector v3.0.0 URL: https://github.com/apache/flink-web/pull/632
[GitHub] [flink-web] tzulitai closed pull request #606: Flink Kafka Connector 3.0.0
tzulitai closed pull request #606: Flink Kafka Connector 3.0.0 URL: https://github.com/apache/flink-web/pull/606
[GitHub] [flink] VinayLondhe14 commented on pull request #18730: [FLINK-25495] The deployment of application-mode support client attach
VinayLondhe14 commented on PR #18730: URL: https://github.com/apache/flink/pull/18730#issuecomment-1518395452 Is there a particular reason this PR has not been merged?
[GitHub] [flink] flinkbot commented on pull request #22451: FLINK-31880: Fixed broken date test. It was failing when run in CST time zone.
flinkbot commented on PR #22451: URL: https://github.com/apache/flink/pull/22451#issuecomment-1518370390 ## CI report: * 639c2246b369dfa275e193207fd2cb7e74bb7870 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-31880) Bad Test in OrcColumnarRowSplitReaderTest
[ https://issues.apache.org/jira/browse/FLINK-31880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31880: --- Labels: pull-request-available (was: ) > Bad Test in OrcColumnarRowSplitReaderTest > - > > Key: FLINK-31880 > URL: https://issues.apache.org/jira/browse/FLINK-31880 > Project: Flink > Issue Type: Bug > Components: Connectors / ORC, Formats (JSON, Avro, Parquet, ORC, > SequenceFile) >Reporter: Kurt Ostfeld >Priority: Minor > Labels: pull-request-available > > This is a development issue with what looks like a buggy unit test. > > I tried to build Flink with a clean copy of the repository and I get: > > ``` > [INFO] Results: > [INFO] > [ERROR] Failures: > [ERROR] OrcColumnarRowSplitReaderTest.testReadFileWithTypes:365 > expected: "1969-12-31" > but was: "1970-01-01" > [INFO] > [ERROR] Tests run: 26, Failures: 1, Errors: 0, Skipped: 0 > ``` > > I see the test is testing Date data types with `new Date(562423)`, which is 9 > minutes and 22 seconds after the epoch time. That is 1970-01-01 in UTC, but > when I run it on my laptop in the CST time zone, I get `Wed Dec 31 18:09:22 CST > 1969`. > > I have a simple pull request ready which fixes this issue by using the Java 8 > LocalDate API instead, which avoids time zones entirely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] kurtostfeld opened a new pull request, #22451: FLINK-31880: Fixed broken date test. It was failing when run in CST time zone.
kurtostfeld opened a new pull request, #22451: URL: https://github.com/apache/flink/pull/22451 ## What is the purpose of the change Fix the broken unit test so that unit tests pass. ## Brief change log Very simple self-explanatory single file fix. ## Verifying this change verified. ## Documentation None required.
[jira] [Created] (FLINK-31880) Bad Test in OrcColumnarRowSplitReaderTest
Kurt Ostfeld created FLINK-31880: Summary: Bad Test in OrcColumnarRowSplitReaderTest Key: FLINK-31880 URL: https://issues.apache.org/jira/browse/FLINK-31880 Project: Flink Issue Type: Bug Components: Connectors / ORC, Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: Kurt Ostfeld This is a development issue with what looks like a buggy unit test. I tried to build Flink with a clean copy of the repository and I get: ``` [INFO] Results: [INFO] [ERROR] Failures: [ERROR] OrcColumnarRowSplitReaderTest.testReadFileWithTypes:365 expected: "1969-12-31" but was: "1970-01-01" [INFO] [ERROR] Tests run: 26, Failures: 1, Errors: 0, Skipped: 0 ``` The test exercises the Date data type with `new Date(562423)`, which is 9 minutes and 22 seconds after the epoch. That is 1970-01-01 in UTC, but when I run it on my laptop in the CST time zone I get `Wed Dec 31 18:09:22 CST 1969`. I have a simple pull request ready which fixes this issue by using the Java 8 LocalDate API instead, which avoids time zones entirely.
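The time-zone dependence described in this report can be reproduced with plain java.time (a hedged sketch of the failure mode and of the zone-independent style the fix reportedly uses, not the actual test code):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DateTestSketch {
    public static void main(String[] args) {
        // new Date(562423) wraps 562423 ms after the epoch:
        // 1970-01-01T00:09:22.423Z. Its calendar date depends on the JVM's
        // default time zone, which is why the test result varies by machine.
        Instant instant = Instant.ofEpochMilli(562423);
        LocalDate utc = instant.atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate cst = instant.atZone(ZoneId.of("America/Chicago")).toLocalDate();
        System.out.println(utc); // 1970-01-01
        System.out.println(cst); // 1969-12-31 (UTC-6 puts the instant on the previous day)

        // LocalDate carries no zone, so the same code yields the same date
        // on every machine.
        LocalDate stable = LocalDate.ofEpochDay(0);
        System.out.println(stable); // 1970-01-01
    }
}
```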
[jira] [Commented] (FLINK-31828) List field in a POJO data stream results in table program compilation failure
[ https://issues.apache.org/jira/browse/FLINK-31828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715160#comment-17715160 ] Vladimir Matveev commented on FLINK-31828: -- Hi [~aitozi], thank you for the quick response! Yes, I had the EOF exception when trying to print the table with RAW columns. Great to know that it will be fixed :) > List field in a POJO data stream results in table program compilation failure > - > > Key: FLINK-31828 > URL: https://issues.apache.org/jira/browse/FLINK-31828 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime >Affects Versions: 1.16.1 > Environment: Java 11 > Flink 1.16.1 >Reporter: Vladimir Matveev >Priority: Major > Attachments: MainPojo.java, generated-code.txt, stacktrace.txt > > > Suppose I have a POJO class like this: > {code:java} > public class Example { > private String key; > private List> values; > // getters, setters, equals+hashCode omitted > } > {code} > When a DataStream with this class is converted to a table, and some > operations are performed on it, it results in an exception which explicitly > says that I should file a ticket: > {noformat} > Caused by: org.apache.flink.api.common.InvalidProgramException: Table program > cannot be compiled. This is a bug. Please file an issue. > {noformat} > Please find the example Java code and the full stack trace attached. > From the exception and generated code it seems that Flink is upset with the > list field being treated as an array - but I cannot have an array type there > in the real code. > Also note that if I _don't_ specify the schema explicitly, it then maps the > {{values}} field to a `RAW('java.util.List', '...')` type, which also does > not work correctly and fails the job in case of even simplest operations like > printing.
[GitHub] [flink] XComp commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
XComp commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1174080585 ## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java: ## @@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler { */ void onGrantLeadership(UUID newLeaderSessionId); +/** + * Called by specific {@link LeaderElectionDriver} when the leadership is granted. + * + * This method will trigger the grant event processing in a separate thread. + * + * @param newLeaderSessionId the valid leader session id + */ +CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId); Review Comment: > We would need to get a reference to the owner of `DefaultLeaderElectionService`, wouldn't we? Ok, I was not thinking straight when writing the previous comment. We have the `LeaderContender`'s reference. 🤦 I'm gonna give it some thought.
[jira] [Comment Edited] (FLINK-31848) And Operator has side effect when operands have udf
[ https://issues.apache.org/jira/browse/FLINK-31848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715115#comment-17715115 ] Shuiqiang Chen edited comment on FLINK-31848 at 4/21/23 6:32 PM: - [~martijnvisser][~zju_zsx][~jark] Thanks for your reply. After running several test cases based on release-1.15 (which I previously analyzed) and the latest 1.18-SNAPSHOT branch, it seems that the left.resultTerm in the left.code has always been assigned with a boolean value and will never cause a syntax error by calling `if (!${left.resultTerm})`. Therefore I agree with [~zju_zsx] that the logic can be safely simplified. While with `if (!${left.nullTerm} && !${left.resultTerm})`, it would be a little bit more intuitive that ${left.nullTerm} indicates whether left result is UNKNOWN or not in three value logic. was (Author: csq): [~martijnvisser][~zju_zsx][~jark] Thanks for your reply. After running several test cases based on release-1.15 (which I previously analyzed) and the latest 1.18-SNAPSHOT branch, it seems that the left.resultTerm in the left.code has always been assigned with a boolean value and will never cause a syntax error by calling `if (!${left.resultTerm})`. Therefore I agree with [~zju_zsx] that the logic can be safely simplified. While with `if (!${left.nullTerm} && !${left.resultTerm})`, it would be a little bit more intuitive that ${left.nullTerm} indicates whether left result is UNKNOWN or not in three value logic. 
> And Operator has side effect when operands have udf > --- > > Key: FLINK-31848 > URL: https://issues.apache.org/jira/browse/FLINK-31848 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.13.2 >Reporter: zju_zsx >Priority: Major > Attachments: image-2023-04-19-14-54-46-458.png > > > > {code:java} > CREATE TABLE kafka_source ( > `content` varchar, > `testid` bigint, > `extra` int > ); > CREATE TABLE console_sink ( > `content` varchar, > `testid` bigint > ) > with ( > 'connector' = 'print' > ); > insert into console_sink > select > content,testid+1 > from kafka_source where testid is not null and testid > 0 and my_udf(testid) > != 0; {code} > my_udf has a constraint that testid should not be null, but the `testid is > not null and testid > 0` guard does not take effect. > > In ScalarOperatorGens.generateAnd > !image-2023-04-19-14-54-46-458.png! > if left.nullTerm is true, the right code will still be executed. > It seems that > {code:java} > if (!${left.nullTerm} && !${left.resultTerm}) {code} > can be safely replaced with > {code:java} > if (!${left.resultTerm}){code} > ?
[jira] [Created] (FLINK-31879) org.apache.avro.util.Utf8 cannot be serialized with avro when used in state
Feroze Daud created FLINK-31879: --- Summary: org.apache.avro.util.Utf8 cannot be serialized with avro when used in state Key: FLINK-31879 URL: https://issues.apache.org/jira/browse/FLINK-31879 Project: Flink Issue Type: Bug Components: API / Type Serialization System Reporter: Feroze Daud Scenario: Write a Flink app that reads avro messages from a kafka topic. The avro pojos are generated with _org.apache.avro.util.Utf8_ type instead of _java.lang.String_. When this happens, Flink logs an error message as follows: {noformat} Class class org.apache.avro.util.Utf8 cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance. {noformat} This is problematic because `Utf8` is designed to be a fast serialized/deserialized type for Avro. But since it does not inherit from SpecificRecordBase, it appears to be handled by the Kryo serializer.
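Two workarounds are commonly used for this pattern (both are my assumptions, not stated in this report): configure avro-maven-plugin with `<stringType>String</stringType>` so the generated POJOs use `java.lang.String`, or normalize `CharSequence` fields to `String` at the ingestion boundary. A minimal, dependency-free sketch of the latter (the helper name is invented for illustration; `StringBuilder` stands in for `Utf8`, which also implements `CharSequence`):

```java
public class Utf8Normalizer {
    // org.apache.avro.util.Utf8 implements CharSequence but is mutable and
    // not a valid POJO field type, so Flink falls back to GenericType/Kryo.
    // Converting to immutable java.lang.String avoids that fallback.
    public static String toJavaString(CharSequence maybeUtf8) {
        return maybeUtf8 == null ? null : maybeUtf8.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaString(new StringBuilder("hello")));
    }
}
```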
[jira] [Commented] (FLINK-31848) And Operator has side effect when operands have udf
[ https://issues.apache.org/jira/browse/FLINK-31848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715115#comment-17715115 ] Shuiqiang Chen commented on FLINK-31848: [~martijnvisser][~zju_zsx][~jark] Thanks for your reply. After running several test cases based on release-1.15 (which I previously analyzed) and the latest 1.18-SNAPSHOT branch, it seems that the left.resultTerm in the left.code has always been assigned a boolean value and will never cause a syntax error by calling if (!${left.resultTerm}). Therefore I agree with [~zju_zsx] that the logic can be safely simplified. > And Operator has side effect when operands have udf > --- > > Key: FLINK-31848 > URL: https://issues.apache.org/jira/browse/FLINK-31848 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.13.2 >Reporter: zju_zsx >Priority: Major > Attachments: image-2023-04-19-14-54-46-458.png > > > > {code:java} > CREATE TABLE kafka_source ( > `content` varchar, > `testid` bigint, > `extra` int > ); > CREATE TABLE console_sink ( > `content` varchar, > `testid` bigint > ) > with ( > 'connector' = 'print' > ); > insert into console_sink > select > content,testid+1 > from kafka_source where testid is not null and testid > 0 and my_udf(testid) > != 0; {code} > my_udf has a constraint that testid should not be null, but the `testid is > not null and testid > 0` guard does not take effect. > > In ScalarOperatorGens.generateAnd > !image-2023-04-19-14-54-46-458.png! > if left.nullTerm is true, the right code will still be executed. > It seems that > {code:java} > if (!${left.nullTerm} && !${left.resultTerm}) {code} > can be safely replaced with > {code:java} > if (!${left.resultTerm}){code} > ?
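For context on why the generated guard consults both terms, here is a self-contained sketch of SQL three-valued AND semantics (my illustration, not Flink's actual generated code): the right operand must still be evaluated when the left is NULL, because NULL AND FALSE = FALSE; it may only be skipped when the left is definitely FALSE. That is exactly the behavior that surprises a UDF with a not-null precondition.

```java
import java.util.function.Supplier;

public class ThreeValuedAnd {
    // SQL three-valued AND: FALSE if either side is definitely FALSE,
    // otherwise NULL if either side is NULL, otherwise TRUE. The Supplier
    // models lazily evaluated right-hand code (e.g. a UDF call).
    public static Boolean and(Boolean left, Supplier<Boolean> right) {
        if (left != null && !left) {
            return false; // short-circuit: right (and any UDF in it) never runs
        }
        Boolean r = right.get(); // runs even when left is NULL
        if (r != null && !r) {
            return false; // x AND FALSE = FALSE even when x is NULL
        }
        return (left == null || r == null) ? null : true;
    }

    public static void main(String[] args) {
        System.out.println(and(null, () -> false)); // false
        System.out.println(and(null, () -> true));  // null (UNKNOWN)
    }
}
```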
[GitHub] [flink] reswqa commented on a diff in pull request #22424: [FLINK-31842][runtime] Calculate the utilization of the task executor only when it is using
reswqa commented on code in PR #22424: URL: https://github.com/apache/flink/pull/22424#discussion_r1174029242 ## flink-runtime/src/test/java/org/apache/flink/runtime/clusterframework/types/LocationPreferenceSlotSelectionStrategyTest.java: ## @@ -134,39 +139,57 @@ private Matcher withSlotInfo(SlotInf resourceProfile, Collections.singletonList(nonLocalTm)); Optional match = runMatching(slotProfile); -Assert.assertTrue(match.isPresent()); -final SlotSelectionStrategy.SlotInfoAndLocality slotInfoAndLocality = match.get(); -assertThat(candidates, hasItem(withSlotInfo(slotInfoAndLocality.getSlotInfo(; -assertThat(slotInfoAndLocality, hasLocality(Locality.NON_LOCAL)); +assertThat(match) +.hasValueSatisfying( +slotInfoAndLocality -> { +assertThat(slotInfoAndLocality.getLocality()) +.isEqualTo(Locality.NON_LOCAL); +assertThat(candidates) +.anySatisfy( +slotInfoAndResources -> + assertThat(slotInfoAndResources.getSlotInfo()) +.isEqualTo( + slotInfoAndLocality + .getSlotInfo())); +}); } @Test -public void matchPreferredLocation() { +void matchPreferredLocation() { SlotProfile slotProfile = SlotProfileTestingUtils.preferredLocality( biggerResourceProfile, Collections.singletonList(tml2)); Optional match = runMatching(slotProfile); -Assert.assertEquals(slotInfo2, match.get().getSlotInfo()); +assertThat(match) +.hasValueSatisfying( +slotInfoAndLocality -> + assertThat(slotInfoAndLocality.getSlotInfo()).isEqualTo(slotInfo2)); Review Comment: The same logic appears in these two test class many times, I'd prefer extracts a specific method to handle this assertion. 
## flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotInfoWithUtilization.java: ## @@ -18,26 +18,37 @@ package org.apache.flink.runtime.jobmaster.slotpool; +import org.apache.flink.annotation.VisibleForTesting; import org.apache.flink.runtime.clusterframework.types.AllocationID; +import org.apache.flink.runtime.clusterframework.types.ResourceID; import org.apache.flink.runtime.clusterframework.types.ResourceProfile; import org.apache.flink.runtime.jobmaster.SlotInfo; import org.apache.flink.runtime.taskmanager.TaskManagerLocation; +import java.util.function.Function; + /** * Container for {@link SlotInfo} and the task executors utilization (freeSlots / * totalOfferedSlots). */ public final class SlotInfoWithUtilization implements SlotInfo { private final SlotInfo slotInfoDelegate; -private final double taskExecutorUtilization; +private final Function taskExecutorUtilizationLookup; -private SlotInfoWithUtilization(SlotInfo slotInfo, double taskExecutorUtilization) { +private SlotInfoWithUtilization( +SlotInfo slotInfo, Function taskExecutorUtilizationLookup) { this.slotInfoDelegate = slotInfo; -this.taskExecutorUtilization = taskExecutorUtilization; +this.taskExecutorUtilizationLookup = taskExecutorUtilizationLookup; } +@VisibleForTesting double getTaskExecutorUtilization() { -return taskExecutorUtilization; +return taskExecutorUtilizationLookup.apply( +slotInfoDelegate.getTaskManagerLocation().getResourceID()); +} Review Comment: It seems that this method has still not been removed. 
Have you just forgotten it or do you have other concerns 🤔 ## flink-runtime/src/test/java/org/apache/flink/runtime/clusterframework/types/LocationPreferenceSlotSelectionStrategyTest.java: ## @@ -63,67 +54,81 @@ public void testPhysicalSlotResourceProfileRespected() { Collections.emptySet()); Optional match = runMatching(slotProfile); -Assert.assertTrue( -match.get() -.getSlotInfo() -.getResourceProfile() - .isMatching(slotProfile.getPhysicalSlotResourceProfile())); +assertThat(match) +.hasValueSatisfying( +slotInfoAndLocality -> +assertThat( +slotInfoAndLocality +.getSlotInfo() +
[jira] [Updated] (FLINK-31873) Add setMaxParallelism to the DataStreamSink Class
[ https://issues.apache.org/jira/browse/FLINK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Xiao updated FLINK-31873: -- Description: When turning on Flink reactive mode, it is suggested to convert all {{setParallelism}} calls to {{setMaxParallelism}} from [elastic scaling docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. With the current implementation of the {{DataStreamSink}} class, only the {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} function of the {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} class is exposed - {{Transformation}} also has the {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} function which is not exposed. This means for any sink in the Flink pipeline, we cannot set a max parallelism. was: When turning on Flink reactive mode, it is suggested to convert all {{setParallelism}} calls to {{setMaxParallelism from }}[elastic scaling docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. 
With the current implementation of the {{DataStreamSink}} class, only the {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} function of the {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} class is exposed - {{Transformation}} also has the {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} function which is not exposed. This means for any sink in the Flink pipeline, we cannot set a max parallelism. > Add setMaxParallelism to the DataStreamSink Class > - > > Key: FLINK-31873 > URL: https://issues.apache.org/jira/browse/FLINK-31873 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Eric Xiao >Priority: Major > Attachments: Screenshot 2023-04-20 at 4.33.14 PM.png > > > When turning on Flink reactive mode, it is suggested to convert all > {{setParallelism}} calls to {{setMaxParallelism}} from [elastic scaling > docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. > With the current implementation of the {{DataStreamSink}} class, only the > {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} > function of the > {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} > class is exposed - {{Transformation}} also has the > {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} > function which is not exposed. 
> > This means for any sink in the Flink pipeline, we cannot set a max > parallelism.
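The reported gap can be modeled with a small self-contained sketch (class and field names here are simplified stand-ins, not Flink's real signatures): `Transformation` has both setters, but the sink wrapper only forwards one of them.

```java
public class SinkParallelismSketch {
    // Simplified stand-in for org.apache.flink.api.dag.Transformation,
    // which exposes both setParallelism and setMaxParallelism.
    public static class TransformationModel {
        public int parallelism = -1;
        public int maxParallelism = -1;
        public void setParallelism(int p) { parallelism = p; }
        public void setMaxParallelism(int p) { maxParallelism = p; }
    }

    // Simplified stand-in for DataStreamSink: today it only forwards
    // setParallelism, so reactive-mode users cannot cap a sink's
    // parallelism through the public API.
    public static class DataStreamSinkModel {
        public final TransformationModel transformation = new TransformationModel();

        public DataStreamSinkModel setParallelism(int p) {
            transformation.setParallelism(p);
            return this;
        }

        // The requested addition: the missing delegate.
        public DataStreamSinkModel setMaxParallelism(int p) {
            transformation.setMaxParallelism(p);
            return this;
        }
    }

    public static void main(String[] args) {
        DataStreamSinkModel sink = new DataStreamSinkModel().setMaxParallelism(4);
        System.out.println(sink.transformation.maxParallelism); // 4
    }
}
```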
[jira] [Commented] (FLINK-31828) List field in a POJO data stream results in table program compilation failure
[ https://issues.apache.org/jira/browse/FLINK-31828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715111#comment-17715111 ] Aitozi commented on FLINK-31828: Hi [~netvl] Thanks for this detailed bug report. I have reproduced your problem. And I also spend some time to dig the way to use RAW type to declare the List type in your case. I found that there's a bug in the cast rule (using the wrong serializer), so it will fail with EOF exception as you mentioned (hope it's the same error with you). I will prepare a PR to solve this bug > List field in a POJO data stream results in table program compilation failure > - > > Key: FLINK-31828 > URL: https://issues.apache.org/jira/browse/FLINK-31828 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime >Affects Versions: 1.16.1 > Environment: Java 11 > Flink 1.16.1 >Reporter: Vladimir Matveev >Priority: Major > Attachments: MainPojo.java, generated-code.txt, stacktrace.txt > > > Suppose I have a POJO class like this: > {code:java} > public class Example { > private String key; > private List> values; > // getters, setters, equals+hashCode omitted > } > {code} > When a DataStream with this class is converted to a table, and some > operations are performed on it, it results in an exception which explicitly > says that I should file a ticket: > {noformat} > Caused by: org.apache.flink.api.common.InvalidProgramException: Table program > cannot be compiled. This is a bug. Please file an issue. > {noformat} > Please find the example Java code and the full stack trace attached. > From the exception and generated code it seems that Flink is upset with the > list field being treated as an array - but I cannot have an array type there > in the real code. 
> Also note that if I _don't_ specify the schema explicitly, it then maps the > {{values}} field to a `RAW('java.util.List', '...')` type, which also does > not work correctly and fails the job on even the simplest operations, like > printing.
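A minimal reconstruction of the failing pattern, for readers who don't want to open the attachment. Everything here is hypothetical (the element type of `values` is illustrative; the actual reproducer is in the attached MainPojo.java) — the POJO itself is legal Java, and the failure only appears once Flink's table runtime generates code that treats the `List` field as an array:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the POJO shape described in the report.
// The real reproducer (MainPojo.java) is attached to the ticket.
class Example {
    private String key;
    private List<String> values; // illustrative element type only

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public List<String> getValues() { return values; }
    public void setValues(List<String> values) { this.values = values; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Example)) return false;
        Example that = (Example) o;
        return Objects.equals(key, that.key) && Objects.equals(values, that.values);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, values);
    }
}
```

As plain Java this class is unremarkable; the exception is only thrown after a DataStream of such objects is converted to a table and the generated table program is compiled.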
[jira] [Updated] (FLINK-31878) Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
[ https://issues.apache.org/jira/browse/FLINK-31878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weijie Guo updated FLINK-31878: --- Priority: Minor (was: Major) > Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher > > > Key: FLINK-31878 > URL: https://issues.apache.org/jira/browse/FLINK-31878 > Project: Flink > Issue Type: Bug > Components: Connectors / Common > Affects Versions: 1.18.0 > Reporter: Yuxin Tan > Assignee: Yuxin Tan > Priority: Minor > Labels: pull-request-available > Fix For: 1.18.0 > > > The name returned by PauseOrResumeSplitsTask#toString is not right. Users will be > confused when calling the toString method of the class, so we should fix > it.
[jira] [Closed] (FLINK-31878) Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
[ https://issues.apache.org/jira/browse/FLINK-31878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weijie Guo closed FLINK-31878. -- Fix Version/s: 1.18.0 Resolution: Fixed master(1.18) via 0104427dc9e38e898ba3865b499cc515004041c9.
[GitHub] [flink] reswqa merged pull request #22443: [FLINK-31878][connectors] Fix the wrong name of PauseOrResumeSplitsTask#toString
reswqa merged PR #22443: URL: https://github.com/apache/flink/pull/22443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
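The defect behind this thread is the classic copied `toString` that reports another class's name. The sketch below is hypothetical (the real one-line change is in PR #22443 above), but it shows the shape of the bug and the fix:

```java
// Hypothetical sketch of the defect's shape; not the actual Flink source.
class PauseOrResumeSplitsTask {
    @Override
    public String toString() {
        // Before the fix, the literal here named a different task class
        // (a copy-paste leftover), which made fetcher logs misleading.
        // The fix makes the literal match the class:
        return "PauseOrResumeSplitsTask{}";
    }
}
```

A `toString` that hard-codes its class name drifts like this easily; using `getClass().getSimpleName()` avoids the whole class of bug.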
[jira] [Closed] (FLINK-31398) Don't wrap with TemporaryClassLoaderContext in OperationExecutor
[ https://issues.apache.org/jira/browse/FLINK-31398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weijie Guo closed FLINK-31398. -- Fix Version/s: 1.17.1 Resolution: Fixed > Don't wrap with TemporaryClassLoaderContext in OperationExecutor > > > Key: FLINK-31398 > URL: https://issues.apache.org/jira/browse/FLINK-31398 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive, Table SQL / Client > Reporter: luoyuxia > Assignee: Weijie Guo > Priority: Major > Labels: pull-request-available > Fix For: 1.18.0, 1.17.1 > > > Currently, the method OperationExecutor#executeStatement in the SQL client wraps > execution with > `sessionContext.getSessionState().resourceManager.getUserClassLoader()`. > Actually, it's not necessary. What's worse, > it causes the exception 'Trying to access closed classloader. Please > check if you store xxx' after quitting the SQL client. > The reason is that `ShutdownHookManager` registers a hook that runs on JVM > shutdown. That hook creates a `Configuration`, which accesses > `Thread.currentThread().getContextClassLoader()`; this is the > FlinkUserClassLoader, which has already been closed by that point, so > it triggers the 'Trying to access closed classloader' exception.
[jira] [Comment Edited] (FLINK-31398) Don't wrap with TemporaryClassLoaderContext in OperationExecutor
[ https://issues.apache.org/jira/browse/FLINK-31398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714809#comment-17714809 ] Weijie Guo edited comment on FLINK-31398 at 4/21/23 5:17 PM: - master(1.18) via 921267bcd06c586d4fad28bbf37b4532593c9e3c. release-1.17 via d40d4dd26ac544307583477b6930e7af50330935. was (Author: weijie guo): master(1.18) via 921267bcd06c586d4fad28bbf37b4532593c9e3c.
[GitHub] [flink] reswqa merged pull request #22444: [BP-1.17][FLINK-31398][sql-gateway] OperationExecutor is no longer set context classloader when executing statement.
reswqa merged PR #22444: URL: https://github.com/apache/flink/pull/22444
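For context on the wrapping being removed: `TemporaryClassLoaderContext` swaps the thread's context classloader for a scope and restores it on close. The stand-alone sketch below mimics that pattern (it is not Flink's implementation); the ticket's point is that running `executeStatement` inside such a scope let a JVM shutdown hook capture the user classloader and observe it after it had been closed.

```java
// Stand-alone sketch of the temporary context-classloader pattern.
final class TemporaryClassLoaderContext implements AutoCloseable {
    private final Thread thread;
    private final ClassLoader original;

    private TemporaryClassLoaderContext(Thread thread, ClassLoader original) {
        this.thread = thread;
        this.original = original;
    }

    /** Swap the current thread's context classloader; restore it on close(). */
    static TemporaryClassLoaderContext of(ClassLoader temporary) {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        t.setContextClassLoader(temporary);
        return new TemporaryClassLoaderContext(t, original);
    }

    @Override
    public void close() {
        thread.setContextClassLoader(original);
    }
}
```

Any code inside the scope that reads `Thread.currentThread().getContextClassLoader()` and keeps the reference for later use (e.g. from a shutdown hook) retains the loader past its lifetime, which is exactly the 'closed classloader' failure described above.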
[jira] [Commented] (FLINK-29692) Support early/late fires for Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-29692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715100#comment-17715100 ] Charles Tan commented on FLINK-29692: - re [~jark]: Thanks for the response. The example query you provided would yield the correct results, but my understanding is that with a window TVF, the state associated with a window is cleared once the window expires. With the example you provided, is it true that the state will just accumulate over time? Also, the query-based approach breaks down if the use case is adjusted slightly: for example, if instead of 1-hour tumbling windows they were 2-hour windows, or if instead of tumbling windows we had hopping windows of length 1 hour sliding every 30 minutes. > Support early/late fires for Windowing TVFs > --- > > Key: FLINK-29692 > URL: https://issues.apache.org/jira/browse/FLINK-29692 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner > Affects Versions: 1.15.3 > Reporter: Canope Nerda > Priority: Major > > I have cases where I need to 1) output data as soon as possible and 2) handle > late arriving data to achieve eventual correctness. In the logic, I need to > do window deduplication, which is based on Windowing TVFs, and according to the > source code, early/late fires are not yet supported in Windowing TVFs. > Actually 1) contradicts 2). Without early/late fires, we had to > compromise: either live with fresh but incorrect data, or tolerate excess > latency for correctness.
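The window-shape objection in the comment can be made concrete with the assignment arithmetic alone: a record belongs to exactly one tumbling window, but to size/slide overlapping hopping windows, so a rewrite built around joining a 1-hour tumbling window with itself stops working once windows overlap. A stand-alone sketch of the arithmetic (plain Java, not Flink's window assigners):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of window-assignment arithmetic; timestamps and sizes share one unit.
final class WindowAssignment {
    /** Start of the single tumbling window containing the timestamp. */
    static long tumblingWindowStart(long ts, long size) {
        return ts - (ts % size);
    }

    /** Starts of all hopping windows of the given size/slide containing the timestamp. */
    static List<Long> hoppingWindowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slide);
        // Walk backwards over every slide-aligned start whose window still covers ts.
        for (long s = lastStart; s > ts - size; s -= slide) {
            starts.add(s);
        }
        return starts;
    }
}
```

With size 60 and slide 30 (minutes), a record at minute 90 falls into the windows starting at 60 and 90, i.e. two windows per record — which is why per-window state and firing behavior cannot be emulated by a single tumbling-window query.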
[jira] [Comment Edited] (FLINK-24998) SIGSEGV in Kryo / C2 CompilerThread on Java 17
[ https://issues.apache.org/jira/browse/FLINK-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715086#comment-17715086 ] Chesnay Schepler edited comment on FLINK-24998 at 4/21/23 4:42 PM: --- [~schoeneu] That is a good find. I _think_ I used JDK 17.0.1 during my tests, which _may_ fit the ticket? I will look into this again next week. was (Author: zentol): [~schoeneu] That is a good find. I _think_ I used JDK 17.0.1 during my tests, which would fit the ticket? I will look into this again next week. > SIGSEGV in Kryo / C2 CompilerThread on Java 17 > -- > > Key: FLINK-24998 > URL: https://issues.apache.org/jira/browse/FLINK-24998 > Project: Flink > Issue Type: Sub-task > Components: API / Type Serialization System > Reporter: Chesnay Schepler > Assignee: Chesnay Schepler > Priority: Blocker > Attachments: -__w-1-s-flink-tests-target-hs_err_pid470059.log > > > While running our tests on CI with Java 17 they failed infrequently with a > SIGSEGV error. > All occurrences were related to Kryo and happened in the C2 CompilerThread. > {code:java} > Current thread (0x7f1394165c00): JavaThread "C2 CompilerThread0" daemon > [_thread_in_native, id=470077, stack(0x7f1374361000,0x7f1374462000)] > Current CompileTask: > C2: 14251 6333 4 com.esotericsoftware.kryo.io.Input::readString > (127 bytes) > {code} > The full error is attached to the ticket. -I can also provide the core dump > if needed.-
[jira] [Commented] (FLINK-24998) SIGSEGV in Kryo / C2 CompilerThread on Java 17
[ https://issues.apache.org/jira/browse/FLINK-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715086#comment-17715086 ] Chesnay Schepler commented on FLINK-24998: -- [~schoeneu] That is a good find. I _think_ I used JDK 17.0.1 during my tests, which would fit the ticket? I will look into this again next week.
[jira] [Assigned] (FLINK-24998) SIGSEGV in Kryo / C2 CompilerThread on Java 17
[ https://issues.apache.org/jira/browse/FLINK-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler reassigned FLINK-24998: Assignee: Chesnay Schepler
[jira] [Commented] (FLINK-31827) Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed
[ https://issues.apache.org/jira/browse/FLINK-31827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715078#comment-17715078 ] Gyula Fora commented on FLINK-31827: The first fix on the operator side is merged for this: ff4bbd2a7bbed4ba0b1443d53731c883a230b6d4 This covers most cases without additional Flink metrics. Will follow up with the optional Flink metrics. > Incorrect estimation of the target data rate of a vertex when only a subset > of its upstream vertex's output is consumed > --- > > Key: FLINK-31827 > URL: https://issues.apache.org/jira/browse/FLINK-31827 > Project: Flink > Issue Type: Bug > Components: Autoscaler > Reporter: Zhanghao Chen > Assignee: Gyula Fora > Priority: Major > Labels: pull-request-available > Attachments: image-2023-04-17-23-37-35-280.png > > > Currently, the target data rate of a vertex = SUM(target data rate * > input/output ratio) over all of its upstream vertices. This assumes that all > output records of an upstream vertex are consumed by the downstream vertex. > However, this does not always hold. Consider the following job plan generated > by a Flink SQL job. The middle vertex contains multiple chained Calc(select > xx) operators, each connecting to a separate downstream sink task. As a > result, each sink task only consumes a sub-portion of the middle vertex's > output. > To fix it, we need operator-level edge info to infer the upstream-downstream > relationship, as well as operator-level output metrics. The metrics part is > easy, but AFAIK there's no way to get the operator-level edge info from the > Flink REST API yet. > !image-2023-04-17-23-37-35-280.png!
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #571: [FLINK-31827] Compute output ratio per edge
gyfora merged PR #571: URL: https://github.com/apache/flink-kubernetes-operator/pull/571
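The over-estimation can be illustrated with toy numbers (all values hypothetical, not operator code). Under the old per-vertex model, every downstream sink was assumed to consume the upstream vertex's full output; with per-edge output ratios, each sink is only charged for its own edge's share:

```java
// Toy illustration of per-vertex vs per-edge target-rate estimation.
final class TargetRateEstimate {
    // Upstream vertex processes 1000 rec/s; its chained Calcs emit on three edges.
    static final double UPSTREAM_RATE = 1000.0;
    // Hypothetical per-edge output ratios (records out on edge / records in).
    static final double[] EDGE_RATIOS = {0.5, 0.3, 0.2};

    /** Old per-vertex estimate: every sink is assumed to see the vertex's full output. */
    static double perVertexEstimate() {
        double totalRatio = 0;
        for (double r : EDGE_RATIOS) {
            totalRatio += r;
        }
        // Charged to EVERY downstream sink, so each sink is overcounted.
        return UPSTREAM_RATE * totalRatio;
    }

    /** Per-edge estimate (the merged fix): each sink only sees its own edge's share. */
    static double perEdgeEstimate(int edge) {
        return UPSTREAM_RATE * EDGE_RATIOS[edge];
    }
}
```

With these numbers the per-vertex model assigns each sink the full 1000 rec/s, while the per-edge model assigns 500, 300, and 200 rec/s respectively — the difference that was causing wrong scaling targets.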
[GitHub] [flink] XComp commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
XComp commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173943821

## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java:

@@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler {
     */
    void onGrantLeadership(UUID newLeaderSessionId);

+    /**
+     * Called by specific {@link LeaderElectionDriver} when the leadership is granted.
+     *
+     * This method will trigger the grant event processing in a separate thread.
+     *
+     * @param newLeaderSessionId the valid leader session id
+     */
+    CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId);

Review Comment: How would we determine that we're running in the main thread? We would need to get a reference to the owner of `DefaultLeaderElectionService`, wouldn't we? That way we would be able to check whether the call runs in the owner's main thread. But that would mean closer coupling between the owner and the `DefaultLeaderElectionService`. Or do you have something else in mind that I'm missing? :thinking: My understanding is that we have the following requirements:
* `onGrantLeadership`, `onRevokeLeadership` and `onLeaderInformationChanged` should be called in a single thread to ensure sequential execution.
* The methods shouldn't be called in the main thread of the `LeaderElectionService`'s owner.

The k8s and ZK implementations do this implicitly through their event thread handling. The `DefaultMultipleComponentLeaderElectionService` doesn't do that (anymore), but that's an implementation detail of the service. That's why I thought that it's also the `DefaultMultipleComponentLeaderElectionService`'s responsibility to call the event processing functions asynchronously. WDYT?
[GitHub] [flink] XComp commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
XComp commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173929303

## flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionServiceTest.java:

@@ -130,22 +137,101 @@ — the hunk replaces the old testHasLeadership with event-triggered variants:

    @Test
    void testHasLeadershipWithLeadershipButNoGrantEventProcessed() throws Exception {
        new Context() {
            {
                runTestWithManuallyTriggeredEvents(
                        executorService -> {
                            final UUID expectedSessionID = UUID.randomUUID();
                            testingLeaderElectionDriver.isLeader(expectedSessionID);
                            assertThat(leaderElectionService.hasLeadership(expectedSessionID)).isFalse();
                            assertThat(leaderElectionService.hasLeadership(UUID.randomUUID())).isFalse();
                        });
            }
        };
    }

    @Test
    void testHasLeadershipWithLeadershipAndGrantEventProcessed() throws Exception {
        new Context() {
            {
                runTestWithManuallyTriggeredEvents(
                        executorService -> {
                            final UUID expectedSessionID = UUID.randomUUID();
                            testingLeaderElectionDriver.isLeader(expectedSessionID);
                            executorService.trigger();
                            assertThat(leaderElectionService.hasLeadership(expectedSessionID)).isTrue();
                            assertThat(leaderElectionService.hasLeadership(UUID.randomUUID())).isFalse();
                        });
            }
        };
    }

    @Test
    void testHasLeadershipWithLeadershipLostButNoRevokeEventProcessed() throws Exception {
        new Context() {
            {
                runTestWithManuallyTriggeredEvents(
                        executorService -> {
                            final UUID expectedSessionID = UUID.randomUUID();
                            testingLeaderElectionDriver.isLeader(expectedSessionID);
                            executorService.trigger();
                            testingLeaderElectionDriver.notLeader();
                            assertThat(leaderElectionService.hasLeadership(expectedSessionID)).isFalse();
                        });
            }
        };
    }

Review Comment: I was puzzled about that one as well at the start, but it makes sense: the HA backend (i.e. the driver) holds the ground truth of leadership. If the HA backend indicates the leadership loss, no operation relying on `hasLeadership(UUID)` should be executed, because another leader could already have picked up in the meantime. The `onRevokeLeadership` call that follows is only used for cleaning up the component's leadership state. Does that make sense? I could add a description to that assert to avoid others wondering the same when reading the test code.
[jira] [Updated] (FLINK-31866) Autoscaler metric trimming reduces the number of metric observations on recovery
[ https://issues.apache.org/jira/browse/FLINK-31866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31866: --- Labels: pull-request-available (was: ) > Autoscaler metric trimming reduces the number of metric observations on > recovery > > > Key: FLINK-31866 > URL: https://issues.apache.org/jira/browse/FLINK-31866 > Project: Flink > Issue Type: Bug > Components: Autoscaler, Kubernetes Operator > Reporter: Maximilian Michels > Assignee: Maximilian Michels > Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.5.0 > > > The autoscaler uses a ConfigMap to store past metric observations, which is > used to re-initialize the autoscaler state in case of failures or upgrades. > Whenever trimming of the ConfigMap occurs, we need to make sure we also > update the timestamp for the start of the metric collection, so any removed > observations can be compensated for by collecting new ones. If we don't do > this, the metric window will effectively shrink due to removing observations. > This can lead to triggering scaling decisions when the operator gets > redeployed, due to the removed items.
[GitHub] [flink-kubernetes-operator] mxm opened a new pull request, #573: [FLINK-31866] Start metric window with timestamp of first observation
mxm opened a new pull request, #573: URL: https://github.com/apache/flink-kubernetes-operator/pull/573 The autoscaler uses a ConfigMap to store past metric observations, which is used to re-initialize the autoscaler state in case of failures or upgrades. Whenever trimming of the ConfigMap occurs, we need to make sure we also update the timestamp for the start of the metric collection, so any removed observations can be compensated for by collecting new ones. If we don't do this, the metric window will effectively shrink due to removing observations. This can lead to triggering scaling decisions when the operator gets redeployed, due to the removed items. The solution we are opting for here is to treat the first metric observation timestamp as the start of the metric collection.
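The fix can be sketched with a toy observation store (hypothetical names; the real logic lives in the operator's autoscaler module): after trimming, the effective start of the metric window is the timestamp of the first remaining observation, so removing old observations extends the remaining collection time instead of silently shrinking the window.

```java
import java.time.Instant;
import java.util.TreeMap;

// Toy metric store illustrating "window start = first remaining observation".
final class MetricWindow {
    private final TreeMap<Instant, Double> observations = new TreeMap<>();

    void add(Instant ts, double value) {
        observations.put(ts, value);
    }

    /** Drop observations strictly older than the retention cutoff. */
    void trim(Instant cutoff) {
        observations.headMap(cutoff).clear();
    }

    /**
     * Window start is derived from the first remaining observation (the fix),
     * rather than from a fixed recorded start time, so trimming cannot
     * silently shrink the effective metric window.
     */
    Instant windowStart() {
        return observations.isEmpty() ? null : observations.firstKey();
    }

    int size() {
        return observations.size();
    }
}
```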
[GitHub] [flink] fredia commented on a diff in pull request #22257: [FLINK-31593][tests] Upgraded migration test data
fredia commented on code in PR #22257: URL: https://github.com/apache/flink/pull/22257#discussion_r1173259129 ## flink-tests/src/test/resources/new-stateful-broadcast-udf-migration-itcase-flink1.17-rocksdb-checkpoint/0193581b-bd07-4b9d-b5d3-a6c986164fa4: ## @@ -0,0 +1,362 @@ +# This is a RocksDB option file. +# +# For detailed file format spec, please refer to the example file +# in examples/rocksdb_option_file_example.ini +# Review Comment: @XComp Yes, the RocksDB config file is usually generated when creating checkpoints. RocksDB checkpoint/savepoint-native support was added in https://issues.apache.org/jira/browse/FLINK-26146, but some RocksDB configuration does not take effect, so for some tests no RocksDB config file is generated, see https://issues.apache.org/jira/browse/FLINK-26176. The following are the existing checkpoint/native-savepoint files related to RocksDB; we can see that there were RocksDB config files in the past.
- [ ] new-stateful-broadcast-udf-migration-itcase-flink1.15-rocksdb-checkpoint
- [ ] new-stateful-broadcast-udf-migration-itcase-flink1.16-rocksdb-checkpoint
- [x] new-stateful-udf-migration-itcase-flink1.15-rocksdb-checkpoint
- [x] [new-stateful-udf-migration-itcase-flink1.16-rocksdb-checkpoint](https://github.com/apache/flink/blob/master/flink-tests/src/test/resources/new-stateful-udf-migration-itcase-flink1.16-rocksdb-checkpoint/0e40bd9a-a3b3-4d71-9d32-dfaacba11244)
- [ ] stateful-scala-udf-migration-itcase-flink1.15-rocksdb-checkpoint
- [ ] stateful-scala-udf-migration-itcase-flink1.16-rocksdb-checkpoint
- [ ] stateful-scala-with-broadcast-udf-migration-itcase-flink1.15-rocksdb-checkpoint
- [x] stateful-scala-with-broadcast-udf-migration-itcase-flink1.16-rocksdb-checkpoint
- [x] type-serializer-snapshot-migration-itcase-flink1.16-rocksdb-checkpoint
- [ ] new-stateful-broadcast-udf-migration-itcase-flink1.15-rocksdb-savepoint-native
- [x] new-stateful-broadcast-udf-migration-itcase-flink1.16-rocksdb-savepoint-native
- [ ] new-stateful-udf-migration-itcase-flink1.15-rocksdb-savepoint-native
- [x] new-stateful-udf-migration-itcase-flink1.16-rocksdb-savepoint-native
- [ ] stateful-scala-udf-migration-itcase-flink1.15-rocksdb-savepoint-native
- [x] stateful-scala-udf-migration-itcase-flink1.16-rocksdb-savepoint-native
- [ ] stateful-scala-with-broadcast-udf-migration-itcase-flink1.15-rocksdb-savepoint-native
- [x] stateful-scala-with-broadcast-udf-migration-itcase-flink1.16-rocksdb-savepoint-native
- [x] type-serializer-snapshot-migration-itcase-flink1.16-rocksdb-savepoint-native
[GitHub] [flink] liuyongvs commented on pull request #22450: [DOC]: Remove duplicate sentence in docker deployment documentation
liuyongvs commented on PR #22450: URL: https://github.com/apache/flink/pull/22450#issuecomment-1517949017 LGTM +1
[jira] [Updated] (FLINK-31866) Autoscaler metric trimming reduces the number of metric observations on recovery
[ https://issues.apache.org/jira/browse/FLINK-31866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-31866: --- Summary: Autoscaler metric trimming reduces the number of metric observations on recovery (was: Autoscaler metric trimming reduces the numbet of metric observations on recovery)
[jira] [Commented] (FLINK-24998) SIGSEGV in Kryo / C2 CompilerThread on Java 17
[ https://issues.apache.org/jira/browse/FLINK-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715038#comment-17715038 ] Urs Schoenenberger commented on FLINK-24998: [~chesnay] Is this issue still reproducible with newer builds of Java 17? I'm asking because it looks suspiciously like JDK bug [https://bugs.openjdk.org/browse/JDK-8277529]. The comments in there say "It turned out that we are updating the control input of a data node directly to an ArrayCopy node when splitting a load through a region [...] This is illegal [...]", which sounds like it might apply to Kryo's Input::readString. This makes me a little hopeful that current builds may have resolved this, thereby decoupling the need to update Kryo from the Java 17 enabling efforts.
[jira] [Updated] (FLINK-31866) Autoscaler metric trimming reduces the numbet of metric observations on recovery
[ https://issues.apache.org/jira/browse/FLINK-31866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-31866: --- Summary: Autoscaler metric trimming reduces the numbet of metric observations on recovery (was: Autoscaler metric trimming reduces the numbet of metric observations)
[jira] [Updated] (FLINK-31873) Add setMaxParallelism to the DataStreamSink Class
[ https://issues.apache.org/jira/browse/FLINK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Xiao updated FLINK-31873: -- Description: When turning on Flink reactive mode, it is suggested to convert all {{setParallelism}} calls to {{setMaxParallelism}} per the [elastic scaling docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. With the current implementation of the {{DataStreamSink}} class, only the {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} function of the {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} class is exposed - {{Transformation}} also has the {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} function which is not exposed. This means for any sink in the Flink pipeline, we cannot set a max parallelism. was: When turning on Flink reactive mode, it is suggested to convert all {{setParallelism}} calls to {{setMaxParallelism}} per the [elastic scaling docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration].
With the current implementation of the {{DataStreamSink}} class, only the {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} function of the {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} class is exposed - {{Transformation}} also has the {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} function which is not exposed. > Add setMaxParallelism to the DataStreamSink Class > - > > Key: FLINK-31873 > URL: https://issues.apache.org/jira/browse/FLINK-31873 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Eric Xiao >Priority: Major > Attachments: Screenshot 2023-04-20 at 4.33.14 PM.png > > > When turning on Flink reactive mode, it is suggested to convert all > {{setParallelism}} calls to {{setMaxParallelism}} per the [elastic scaling > docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. > With the current implementation of the {{DataStreamSink}} class, only the > {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} > function of the > {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} > class is exposed - {{Transformation}} also has the > {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} > function which is not exposed. > > This means for any sink in the Flink pipeline, we cannot set a max > parallelism.
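The gap is easy to see in a minimal sketch. The classes below only mimic the delegation pattern of the real `DataStreamSink`/`Transformation` pair (they are hypothetical stand-ins, not Flink code); the point is that `setMaxParallelism` exists one layer down but has no public pass-through:

```java
// Simplified, hypothetical stand-ins for the Flink classes discussed above.
class Transformation {
    private int parallelism = 1;
    private int maxParallelism = -1;

    void setParallelism(int parallelism) {
        this.parallelism = parallelism;
    }

    // Present on Transformation, but not reachable through DataStreamSink today.
    void setMaxParallelism(int maxParallelism) {
        this.maxParallelism = maxParallelism;
    }

    int getMaxParallelism() {
        return maxParallelism;
    }
}

class DataStreamSink {
    private final Transformation transformation = new Transformation();

    DataStreamSink setParallelism(int parallelism) {
        transformation.setParallelism(parallelism);
        return this;
    }

    // The pass-through the ticket asks for, mirroring setParallelism.
    DataStreamSink setMaxParallelism(int maxParallelism) {
        transformation.setMaxParallelism(maxParallelism);
        return this;
    }

    Transformation getTransformation() {
        return transformation;
    }
}
```

With reactive mode, such a pass-through would let pipeline authors bound sink scaling the same way they already can for other operators.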
[GitHub] [flink] zentol commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
zentol commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173821559 ## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java: ## @@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler { */ void onGrantLeadership(UUID newLeaderSessionId); +/** + * Called by specific {@link LeaderElectionDriver} when the leadership is granted. + * + * This method will trigger the grant event processing in a separate thread. + * + * @param newLeaderSessionId the valid leader session id + */ +CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId); Review Comment: Why can't it be an implementation detail of the `DefaultLeaderElectionService` _now_? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-31873) Add setMaxParallelism to the DataStreamSink Class
[ https://issues.apache.org/jira/browse/FLINK-31873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715028#comment-17715028 ] Eric Xiao commented on FLINK-31873: --- Thanks [~martijnvisser] and [~luoyuxia], I will start with opening up a thread in the Dev mailing list before making a FLIP :). > Add setMaxParallelism to the DataStreamSink Class > - > > Key: FLINK-31873 > URL: https://issues.apache.org/jira/browse/FLINK-31873 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Eric Xiao >Priority: Major > Attachments: Screenshot 2023-04-20 at 4.33.14 PM.png > > > When turning on Flink reactive mode, it is suggested to convert all > {{setParallelism}} calls to {{setMaxParallelism}} per the [elastic scaling > docs|https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration]. > With the current implementation of the {{DataStreamSink}} class, only the > {{[setParallelism|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java#L172-L181]}} > function of the > {{[Transformation|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285]}} > class is exposed - {{Transformation}} also has the > {{[setMaxParallelism|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L277-L285]}} > function which is not exposed.
[GitHub] [flink] zentol commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
zentol commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173807106 ## flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionServiceTest.java: ## @@ -130,22 +137,101 @@ void testLeaderInformationChangedAndShouldBeCorrected() throws Exception { } @Test -void testHasLeadership() throws Exception { +void testHasLeadershipWithLeadershipButNoGrantEventProcessed() throws Exception { new Context() { { -runTest( -() -> { -testingLeaderElectionDriver.isLeader(); -final UUID currentLeaderSessionId = -leaderElectionService.getLeaderSessionID(); -assertThat(currentLeaderSessionId).isNotNull(); - assertThat(leaderElectionService.hasLeadership(currentLeaderSessionId)) +runTestWithManuallyTriggeredEvents( +executorService -> { +final UUID expectedSessionID = UUID.randomUUID(); + + testingLeaderElectionDriver.isLeader(expectedSessionID); + + assertThat(leaderElectionService.hasLeadership(expectedSessionID)) +.isFalse(); + assertThat(leaderElectionService.hasLeadership(UUID.randomUUID())) +.isFalse(); +}); +} +}; +} + +@Test +void testHasLeadershipWithLeadershipAndGrantEventProcessed() throws Exception { +new Context() { +{ +runTestWithManuallyTriggeredEvents( +executorService -> { +final UUID expectedSessionID = UUID.randomUUID(); + + testingLeaderElectionDriver.isLeader(expectedSessionID); +executorService.trigger(); + + assertThat(leaderElectionService.hasLeadership(expectedSessionID)) .isTrue(); assertThat(leaderElectionService.hasLeadership(UUID.randomUUID())) .isFalse(); +}); +} +}; +} + +@Test +void testHasLeadershipWithLeadershipLostButNoRevokeEventProcessed() throws Exception { +new Context() { +{ +runTestWithManuallyTriggeredEvents( +executorService -> { +final UUID expectedSessionID = UUID.randomUUID(); + + testingLeaderElectionDriver.isLeader(expectedSessionID); +executorService.trigger(); +testingLeaderElectionDriver.notLeader(); + + 
assertThat(leaderElectionService.hasLeadership(expectedSessionID)) +.isFalse(); Review Comment: This seems strange, shouldn't it be `true`?
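The behavior these tests pin down — leadership only becomes observable once the queued grant event has actually run — can be sketched with a toy manually-triggered executor. The names below are hypothetical simplifications of the classes under review, not the actual Flink implementation:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.UUID;

// Toy executor: queued tasks run only when trigger() is called, as in the tests above.
class ManuallyTriggeredExecutor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();

    void execute(Runnable task) {
        tasks.add(task);
    }

    void trigger() {
        Runnable task = tasks.poll();
        if (task != null) {
            task.run();
        }
    }
}

// Hypothetical sketch of the async grant path: the driver callback only enqueues
// the event; the leadership state changes when the queued task is processed.
class ToyLeaderElectionService {
    private final ManuallyTriggeredExecutor executor;
    private UUID leaderSessionId; // null = leadership not (yet) confirmed

    ToyLeaderElectionService(ManuallyTriggeredExecutor executor) {
        this.executor = executor;
    }

    void onGrantLeadershipAsync(UUID newLeaderSessionId) {
        executor.execute(() -> leaderSessionId = newLeaderSessionId);
    }

    boolean hasLeadership(UUID sessionId) {
        return sessionId.equals(leaderSessionId);
    }
}
```

In this model, `hasLeadership` returns false between the driver's grant call and the processing of the queued event, which is exactly the window the "no grant event processed" test exercises.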
[GitHub] [flink] XComp commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
XComp commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173806764 ## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java: ## @@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler { */ void onGrantLeadership(UUID newLeaderSessionId); +/** + * Called by specific {@link LeaderElectionDriver} when the leadership is granted. + * + * This method will trigger the grant event processing in a separate thread. + * + * @param newLeaderSessionId the valid leader session id + */ +CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId); Review Comment: Considering that, adding deprecation annotations here would make sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
XComp commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173805207 ## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java: ## @@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler { */ void onGrantLeadership(UUID newLeaderSessionId); +/** + * Called by specific {@link LeaderElectionDriver} when the leadership is granted. + * + * This method will trigger the grant event processing in a separate thread. + * + * @param newLeaderSessionId the valid leader session id + */ +CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId); Review Comment: In the long run, we could remove the async calls again from the interface. The driver implementations only need the synchronous methods because their event handling lives in a separate thread already. The asynchronous execution would become an implementation detail of `DefaultLeaderElectionService` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on a diff in pull request #22422: [FLINK-31838][runtime] Moves thread handling for leader event listener calls from DefaultMultipleComponentLeaderElectionService into
zentol commented on code in PR #22422: URL: https://github.com/apache/flink/pull/22422#discussion_r1173801508 ## flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/LeaderElectionEventHandler.java: ## @@ -39,12 +40,29 @@ public interface LeaderElectionEventHandler { */ void onGrantLeadership(UUID newLeaderSessionId); +/** + * Called by specific {@link LeaderElectionDriver} when the leadership is granted. + * + * This method will trigger the grant event processing in a separate thread. + * + * @param newLeaderSessionId the valid leader session id + */ +CompletableFuture onGrantLeadershipAsync(UUID newLeaderSessionId); Review Comment: What's the long-term plan here; will we keep both variants and everything has to support both? Or will we drop the sync variants once we remove legacy stuff? Can we enforce somehow that only a specific variant is called for a given implementation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] snuyanzin commented on a diff in pull request #22257: [FLINK-31593][tests] Upgraded migration test data
snuyanzin commented on code in PR #22257: URL: https://github.com/apache/flink/pull/22257#discussion_r117377 ## flink-tests/src/test/resources/new-stateful-broadcast-udf-migration-itcase-flink1.17-rocksdb-checkpoint/0193581b-bd07-4b9d-b5d3-a6c986164fa4: ## @@ -0,0 +1,362 @@ +# This is a RocksDB option file. +# +# For detailed file format spec, please refer to the example file +# in examples/rocksdb_option_file_example.ini +# Review Comment: thanks for clarification @fredia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on a diff in pull request #22257: [FLINK-31593][tests] Upgraded migration test data
XComp commented on code in PR #22257: URL: https://github.com/apache/flink/pull/22257#discussion_r1173766452 ## flink-tests/src/test/resources/new-stateful-broadcast-udf-migration-itcase-flink1.17-rocksdb-checkpoint/0193581b-bd07-4b9d-b5d3-a6c986164fa4: ## @@ -0,0 +1,362 @@ +# This is a RocksDB option file. +# +# For detailed file format spec, please refer to the example file +# in examples/rocksdb_option_file_example.ini +# Review Comment: Thanks for clarification. @snuyanzin can we finalize the PR in that case?
[jira] [Resolved] (FLINK-30651) Move utility methods to CatalogTest and remove CatalogTestUtils class
[ https://issues.apache.org/jira/browse/FLINK-30651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samrat Deb resolved FLINK-30651. Resolution: Invalid > Move utility methods to CatalogTest and remove CatalogTestUtils class > -- > > Key: FLINK-30651 > URL: https://issues.apache.org/jira/browse/FLINK-30651 > Project: Flink > Issue Type: Technical Debt > Components: Table SQL / API, Tests >Reporter: Samrat Deb >Assignee: Samrat Deb >Priority: Minor > Labels: pull-request-available > Fix For: 1.18.0 > > > [CatalogTestUtils|https://github.com/apache/flink/blame/master/flink-table/flink-table-common/src/test/java/org/apache/flink/table/catalog/CatalogTestUtil.java#L43] > class contains static utility functions. These functions/methods can be > moved to the CatalogTest class to make the code-flow easier to understand.
[GitHub] [flink] flinkbot commented on pull request #22450: [DOC]: Remove duplicate sentence in docker deployment documentation
flinkbot commented on PR #22450: URL: https://github.com/apache/flink/pull/22450#issuecomment-1517826614 ## CI report: * e7a3e7051dd0a041fc4e5ab2e68238c183b59a34 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink-kubernetes-operator] yangjf2019 commented on pull request #567: [FLINK-31815] Fixing the container vulnerability by upgrade the SnakeYaml Maven dependency
yangjf2019 commented on PR #567: URL: https://github.com/apache/flink-kubernetes-operator/pull/567#issuecomment-1517823027 I found that the exception is generated when I commit `e2e-tests/data/flinkdep-cr.yaml` and run `default/flink-example-statemachine`. Do we need to upgrade the Flink version first? https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/data/flinkdep-cr.yaml#L50 https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/data/multi-sessionjob.yaml#L131-L145 https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/data/sessionjob-cr.yaml#L79 https://github.com/apache/flink-kubernetes-operator/actions/runs/4750576429/jobs/8439813076 https://user-images.githubusercontent.com/54518670/233643419-140478bc-b92d-4d2d-b7e8-9c3225423de6.png
[GitHub] [flink] flinkbot commented on pull request #22449: [FLINK-31752] Fix SourceOperator numRecordsOut duplicate bug
flinkbot commented on PR #22449: URL: https://github.com/apache/flink/pull/22449#issuecomment-1517819883 ## CI report: * f7e156ab4e74d487e3047bc8ac86182c69d82d15 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Resolved] (FLINK-31723) DispatcherTest#testCancellationDuringInitialization is unstable
[ https://issues.apache.org/jira/browse/FLINK-31723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Morávek resolved FLINK-31723. --- Fix Version/s: 1.18.0 Resolution: Fixed > DispatcherTest#testCancellationDuringInitialization is unstable > --- > > Key: FLINK-31723 > URL: https://issues.apache.org/jira/browse/FLINK-31723 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Assignee: David Morávek >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=47889&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=6761 > {noformat} > Apr 04 02:26:26 [ERROR] > org.apache.flink.runtime.dispatcher.DispatcherTest.testCancellationDuringInitialization > Time elapsed: 0.033 s <<< FAILURE! > Apr 04 02:26:26 java.lang.AssertionError: > Apr 04 02:26:26 > Apr 04 02:26:26 Expected: is > Apr 04 02:26:26 but: was > Apr 04 02:26:26 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > Apr 04 02:26:26 at org.junit.Assert.assertThat(Assert.java:964) > Apr 04 02:26:26 at org.junit.Assert.assertThat(Assert.java:930) > Apr 04 02:26:26 at > org.apache.flink.runtime.dispatcher.DispatcherTest.testCancellationDuringInitialization(DispatcherTest.java:389) > [...] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] charlesdong1991 opened a new pull request, #22450: [DOC]: Remove duplicate sentence in docker deployment documentation
charlesdong1991 opened a new pull request, #22450: URL: https://github.com/apache/flink/pull/22450 Remove seemingly duplicate sentence in docker deployment page
[jira] [Commented] (FLINK-31723) DispatcherTest#testCancellationDuringInitialization is unstable
[ https://issues.apache.org/jira/browse/FLINK-31723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714997#comment-17714997 ] David Morávek commented on FLINK-31723: --- master: 378d3ca0d4b487b1ebc9354e9ebe8952cc3a9d11 52bf14b0ba949e048c78862be2ed8ebfb58c780e > DispatcherTest#testCancellationDuringInitialization is unstable > --- > > Key: FLINK-31723 > URL: https://issues.apache.org/jira/browse/FLINK-31723 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Assignee: David Morávek >Priority: Critical > Labels: pull-request-available, test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=47889&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=6761 > {noformat} > Apr 04 02:26:26 [ERROR] > org.apache.flink.runtime.dispatcher.DispatcherTest.testCancellationDuringInitialization > Time elapsed: 0.033 s <<< FAILURE! > Apr 04 02:26:26 java.lang.AssertionError: > Apr 04 02:26:26 > Apr 04 02:26:26 Expected: is > Apr 04 02:26:26 but: was > Apr 04 02:26:26 at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > Apr 04 02:26:26 at org.junit.Assert.assertThat(Assert.java:964) > Apr 04 02:26:26 at org.junit.Assert.assertThat(Assert.java:930) > Apr 04 02:26:26 at > org.apache.flink.runtime.dispatcher.DispatcherTest.testCancellationDuringInitialization(DispatcherTest.java:389) > [...] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] dmvk merged pull request #22435: [FLINK-31723] Fix DispatcherTest#testCancellationDuringInitialization to not make assumptions about an underlying scheduler implementation.
dmvk merged PR #22435: URL: https://github.com/apache/flink/pull/22435
[jira] [Updated] (FLINK-31752) SourceOperatorStreamTask increments numRecordsOut twice
[ https://issues.apache.org/jira/browse/FLINK-31752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31752: --- Labels: pull-request-available (was: ) > SourceOperatorStreamTask increments numRecordsOut twice > --- > > Key: FLINK-31752 > URL: https://issues.apache.org/jira/browse/FLINK-31752 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.17.0 >Reporter: Weihua Hu >Assignee: Yunfeng Zhou >Priority: Major > Labels: pull-request-available > Attachments: image-2023-04-07-15-51-44-304.png > > > The counter of numRecordsOut was introduced into ChainingOutput to reduce the > function call stack depth in > https://issues.apache.org/jira/browse/FLINK-30536 > But SourceOperatorStreamTask.AsyncDataOutputToOutput increments the counter > of numRecordsOut too. This results in the source operator's numRecordsOut being > doubled. > We should delete the numRecordsOut.inc in > SourceOperatorStreamTask.AsyncDataOutputToOutput. > [~xtsong][~lindong] Could you please take a look at this.
[GitHub] [flink] yunfengzhou-hub opened a new pull request, #22449: [FLINK-31752] Fix SourceOperator numRecordsOut duplicate bug
yunfengzhou-hub opened a new pull request, #22449: URL: https://github.com/apache/flink/pull/22449 ## What is the purpose of the change This pull request fixes the bug that the metric numRecordsOut is increased twice in `SourceOperatorStreamTask` and in `ChainingOutput`, which was introduced in #21579. ## Brief change log - Removes the process to increase numRecordsOut in `SourceOperatorStreamTask`. - Adds integration test class `SourceMetricsITCase` to verify the correctness of this metric on SourceOperator. This test class is introduced according to `SinkMetricsITCase`. ## Verifying this change The correctness of the changes in this pull request is covered by the newly introduced test class `SourceMetricsITCase`. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): yes - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
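The shape of the bug this PR fixes is straightforward to reproduce in miniature: two output layers share one counter and both increment it. The classes below are toy stand-ins (a plain `AtomicLong` instead of Flink's metric `Counter`, hypothetical names), not the actual task code:

```java
import java.util.concurrent.atomic.AtomicLong;

interface Output {
    void emit(Object record);
}

// After FLINK-30536, the chained output increments numRecordsOut...
class ToyChainingOutput implements Output {
    private final AtomicLong numRecordsOut;

    ToyChainingOutput(AtomicLong numRecordsOut) {
        this.numRecordsOut = numRecordsOut;
    }

    @Override
    public void emit(Object record) {
        numRecordsOut.incrementAndGet();
    }
}

// ...but the source task's output wrapper still increments it as well,
// so every record emitted by the source is counted twice.
class ToyAsyncDataOutput implements Output {
    private final AtomicLong numRecordsOut;
    private final Output chained;

    ToyAsyncDataOutput(AtomicLong numRecordsOut, Output chained) {
        this.numRecordsOut = numRecordsOut;
        this.chained = chained;
    }

    @Override
    public void emit(Object record) {
        numRecordsOut.incrementAndGet(); // the duplicate increment the PR removes
        chained.emit(record);
    }
}
```

Deleting the increment in the wrapper (as the PR does) leaves exactly one count per emitted record.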
[jira] [Commented] (FLINK-31848) And Operator has side effect when operands have udf
[ https://issues.apache.org/jira/browse/FLINK-31848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714995#comment-17714995 ] Jark Wu commented on FLINK-31848: - [~csq] do you have a simple case to reproduce the wrong result (and show the result)? And did you test it on the latest version? > And Operator has side effect when operands have udf > --- > > Key: FLINK-31848 > URL: https://issues.apache.org/jira/browse/FLINK-31848 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.13.2 >Reporter: zju_zsx >Priority: Major > Attachments: image-2023-04-19-14-54-46-458.png > > > > {code:java} > CREATE TABLE kafka_source ( > `content` varchar, > `testid` bigint, > `extra` int > ); > CREATE TABLE console_sink ( > `content` varchar, > `testid` bigint > ) > with ( > 'connector' = 'print' > ); > insert into console_sink > select > content,testid+1 > from kafka_source where testid is not null and testid > 0 and my_udf(testid) > != 0; {code} > my_udf has a constraint that the testid should not be null, but the testid is > not null and testid > 0 guard does not take effect. > > In ScalarOperatorGens.generateAnd > !image-2023-04-19-14-54-46-458.png! > if left.nullTerm is true, the right operand's code will be executed. > it seems that > {code:java} > if (!${left.nullTerm} && !${left.resultTerm}) {code} > can be safely replaced with > {code:java} > if (!${left.resultTerm}){code} > ?
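Why the right-hand UDF runs even though `testid is not null` already failed comes down to SQL's three-valued logic: `FALSE AND x` can short-circuit, but `NULL AND x` cannot, because `NULL AND FALSE` is `FALSE` while `NULL AND TRUE` is `NULL`. The sketch below (a hypothetical helper, with Java's `Boolean` null standing in for SQL `NULL`) mirrors that evaluation order. In a `WHERE` clause, where `NULL` and `FALSE` both reject the row, also short-circuiting on `NULL` — the reporter's proposal — would not change which rows pass, only whether the side-effecting right operand runs.

```java
import java.util.function.Supplier;

class ThreeValuedAnd {
    // SQL three-valued AND. The right operand is a Supplier so we can observe
    // whether it gets evaluated (i.e., whether a UDF there would run).
    static Boolean sqlAnd(Boolean left, Supplier<Boolean> right) {
        if (left != null && !left) {
            return false; // definite FALSE: the only case that skips the right side
        }
        Boolean r = right.get(); // evaluated for TRUE *and* for NULL left operands
        if (left == null) {
            // NULL AND FALSE = FALSE; NULL AND TRUE = NULL; NULL AND NULL = NULL
            return (r != null && !r) ? Boolean.FALSE : null;
        }
        return r; // left is TRUE: TRUE AND x = x
    }
}
```

The `NULL AND FALSE = FALSE` case is the reason the generated code only guards with `!left.nullTerm && !left.resultTerm`: a NULL left operand alone is not enough to decide the result without looking at the right side.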
[jira] [Closed] (FLINK-31869) test_multi_sessionjob.sh gets stuck very frequently
[ https://issues.apache.org/jira/browse/FLINK-31869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-31869. -- Fix Version/s: kubernetes-operator-1.5.0 Resolution: Fixed Merged to main: 1f54ffa484c359c4b81a409f27092dbfba82157b There are still frequent test failures but at least the CI doesn't get stuck for hours > test_multi_sessionjob.sh gets stuck very frequently > --- > > Key: FLINK-31869 > URL: https://issues.apache.org/jira/browse/FLINK-31869 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > Labels: pull-request-available > Fix For: kubernetes-operator-1.5.0 > > > The test_multi_sessionjob.sh gets stuck almost all the time on recent builds.
[jira] [Updated] (FLINK-31874) Support truncate table statement in batch mode
[ https://issues.apache.org/jira/browse/FLINK-31874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-31874: Fix Version/s: 1.18.0 > Support truncate table statement in batch mode > -- > > Key: FLINK-31874 > URL: https://issues.apache.org/jira/browse/FLINK-31874 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: luoyuxia >Priority: Major > Fix For: 1.18.0 > > > Described in [FLIP-302: Support TRUNCATE TABLE statement in batch > mode|https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement+in+batch+mode] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29692) Support early/late fires for Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-29692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714985#comment-17714985 ] Jark Wu commented on FLINK-29692: - Hi [~charles-tan], thank you for sharing your use case. I'm just curious whether it is possible to support your use case by using Group Aggregate instead of Window Aggregate? For example: {code} SELECT user, COUNT(*) as cnt FROM withdrawal GROUP BY user, DATE_FORMAT(withdrawal_timestamp, "yyyy-MM-dd HH:00") -- trim into hour HAVING cnt >= 3 {code} IIUC, this can also achieve that "notified if a withdrawal from a bank account happens 3 times in an hour" ASAP. And you may get better performance from tuning [1]. [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/tuning/ > Support early/late fires for Windowing TVFs > --- > > Key: FLINK-29692 > URL: https://issues.apache.org/jira/browse/FLINK-29692 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner >Affects Versions: 1.15.3 >Reporter: Canope Nerda >Priority: Major > > I have cases where I need to 1) output data as soon as possible and 2) handle > late arriving data to achieve eventual correctness. In the logic, I need to > do window deduplication which is based on Windowing TVFs and according to > source code, early/late fires are not supported yet in Windowing TVFs. > Actually 1) contradicts with 2). Without early/late fires, we had to > compromise, either live with fresh incorrect data or tolerate excess latency > for correctness. -- This message was sent by Atlassian Jira (v8.20.10#820010)
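The group-aggregate idea suggested above — bucket by user and clock hour, then alert once the count reaches 3 — can be sketched outside SQL as well. The following is an illustrative plain-Java sketch, not Flink API; the class and method names are made up, and like the SQL version it buckets by calendar hour rather than a sliding one-hour window:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the group-aggregate approach: count withdrawals per
// (user, hour bucket) and flag as soon as the count reaches 3 in that hour.
public class HourlyWithdrawalCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Returns true once a user has made >= 3 withdrawals within one clock hour. */
    public boolean record(String user, Instant timestamp) {
        // Truncating to the hour plays the role of DATE_FORMAT(..., "... HH:00").
        String bucket = user + "|" + timestamp.truncatedTo(ChronoUnit.HOURS);
        return counts.merge(bucket, 1, Integer::sum) >= 3;
    }
}
```

Because the result is emitted as soon as the threshold is crossed (rather than when a window fires), it matches the "notify ASAP" requirement the same way the HAVING clause does in the streaming group aggregate.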
[jira] [Updated] (FLINK-31869) test_multi_sessionjob.sh gets stuck very frequently
[ https://issues.apache.org/jira/browse/FLINK-31869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31869: --- Labels: pull-request-available (was: ) > test_multi_sessionjob.sh gets stuck very frequently > --- > > Key: FLINK-31869 > URL: https://issues.apache.org/jira/browse/FLINK-31869 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > Labels: pull-request-available > > The test_multi_sessionjob.sh gets stuck almost all the time on recent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #572: [FLINK-31869] Fix e2e cleanup getting stuck
gyfora merged PR #572: URL: https://github.com/apache/flink-kubernetes-operator/pull/572 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-31869) test_multi_sessionjob.sh gets stuck very frequently
[ https://issues.apache.org/jira/browse/FLINK-31869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-31869: -- Assignee: Gyula Fora > test_multi_sessionjob.sh gets stuck very frequently > --- > > Key: FLINK-31869 > URL: https://issues.apache.org/jira/browse/FLINK-31869 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > > The test_multi_sessionjob.sh gets stuck almost all the time on recent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-30609) Add ephemeral storage to CRD
[ https://issues.apache.org/jira/browse/FLINK-30609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-30609. -- Resolution: Fixed merged to main bad872d04e324fde2b6396e18bae5a37d804f59b > Add ephemeral storage to CRD > > > Key: FLINK-30609 > URL: https://issues.apache.org/jira/browse/FLINK-30609 > Project: Flink > Issue Type: New Feature > Components: Kubernetes Operator >Reporter: Matyas Orhidi >Assignee: Zhenqiu Huang >Priority: Major > Labels: pull-request-available, starter > Fix For: kubernetes-operator-1.5.0 > > > We should consider adding ephemeral storage to the existing [resource > specification > |https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/reference/#resource]in > CRD, next to {{cpu}} and {{memory}} > https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #561: [FLINK-30609] Add ephemeral storage to CRD
gyfora merged PR #561: URL: https://github.com/apache/flink-kubernetes-operator/pull/561 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-30852) TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError but doesn't fail
[ https://issues.apache.org/jira/browse/FLINK-30852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kalashnikov updated FLINK-30852: -- Fix Version/s: 1.17.1 > TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError > but doesn't fail > --- > > Key: FLINK-30852 > URL: https://issues.apache.org/jira/browse/FLINK-30852 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: Anton Kalashnikov >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.18.0, 1.17.1 > > > While investigating FLINK-30844, I noticed that {{TaskTest.testCleanup}} > reports an AssertionError in the logs but doesn't fail: > {code} > 00:59:01,886 [main] ERROR > org.apache.flink.runtime.taskmanager.Task[] - Error while > canceling task Test Task (1/1)#0. > java.lang.AssertionError: This should not be called > at org.junit.Assert.fail(Assert.java:89) ~[junit-4.13.2.jar:4.13.2] > at > org.apache.flink.runtime.taskmanager.TaskTest$TestInvokableCorrect.cancel(TaskTest.java:1304) > ~[test-classes/:?] > at > org.apache.flink.runtime.taskmanager.Task.cancelInvokable(Task.java:1529) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:796) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) > ~[classes/:?] > at > org.apache.flink.runtime.taskmanager.TaskTest.testCleanupWhenSwitchToInitializationFails(TaskTest.java:184) > ~[test-classes/:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > [...] > {code} > [~akalashnikov] is this expected? > The affected build is > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45440&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-30852) TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError but doesn't fail
[ https://issues.apache.org/jira/browse/FLINK-30852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kalashnikov resolved FLINK-30852. --- Resolution: Fixed > TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError > but doesn't fail > --- > > Key: FLINK-30852 > URL: https://issues.apache.org/jira/browse/FLINK-30852 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: Anton Kalashnikov >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.18.0, 1.17.1 > > > While investigating FLINK-30844, I noticed that {{TaskTest.testCleanup}} > reports an AssertionError in the logs but doesn't fail: > {code} > 00:59:01,886 [main] ERROR > org.apache.flink.runtime.taskmanager.Task[] - Error while > canceling task Test Task (1/1)#0. > java.lang.AssertionError: This should not be called > at org.junit.Assert.fail(Assert.java:89) ~[junit-4.13.2.jar:4.13.2] > at > org.apache.flink.runtime.taskmanager.TaskTest$TestInvokableCorrect.cancel(TaskTest.java:1304) > ~[test-classes/:?] > at > org.apache.flink.runtime.taskmanager.Task.cancelInvokable(Task.java:1529) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:796) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) > ~[classes/:?] > at > org.apache.flink.runtime.taskmanager.TaskTest.testCleanupWhenSwitchToInitializationFails(TaskTest.java:184) > ~[test-classes/:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > [...] > {code} > [~akalashnikov] is this expected? > The affected build is > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45440&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30852) TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError but doesn't fail
[ https://issues.apache.org/jira/browse/FLINK-30852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714961#comment-17714961 ] Anton Kalashnikov commented on FLINK-30852: --- merged to release-1.17: aa47c3f862414511e92637d9816f6908c86b4cf6 > TaskTest.testCleanupWhenSwitchToInitializationFails reports AssertionError > but doesn't fail > --- > > Key: FLINK-30852 > URL: https://issues.apache.org/jira/browse/FLINK-30852 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: Anton Kalashnikov >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.18.0 > > > While investigating FLINK-30844, I noticed that {{TaskTest.testCleanup}} > reports an AssertionError in the logs but doesn't fail: > {code} > 00:59:01,886 [main] ERROR > org.apache.flink.runtime.taskmanager.Task[] - Error while > canceling task Test Task (1/1)#0. > java.lang.AssertionError: This should not be called > at org.junit.Assert.fail(Assert.java:89) ~[junit-4.13.2.jar:4.13.2] > at > org.apache.flink.runtime.taskmanager.TaskTest$TestInvokableCorrect.cancel(TaskTest.java:1304) > ~[test-classes/:?] > at > org.apache.flink.runtime.taskmanager.Task.cancelInvokable(Task.java:1529) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:796) > ~[classes/:?] > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) > ~[classes/:?] > at > org.apache.flink.runtime.taskmanager.TaskTest.testCleanupWhenSwitchToInitializationFails(TaskTest.java:184) > ~[test-classes/:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_292] > [...] > {code} > [~akalashnikov] is this expected? > The affected build is > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45440&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] akalash merged pull request #22436: [FLINK-30852][runtime] Checking task cancelation explicitly rather th…
akalash merged PR #22436: URL: https://github.com/apache/flink/pull/22436 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-28386) Trigger an immediate checkpoint after all sources finished
[ https://issues.apache.org/jira/browse/FLINK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714956#comment-17714956 ] Piotr Nowojski commented on FLINK-28386: [~zlzhang0122] A checkpoint could be used just to make the side effects visible (committing results in two-phase commit operators/sinks). On the other hand, why would a savepoint make any sense? There is no point in recovering from such a snapshot anyway. About the ticket: taking into account unaligned checkpoints, I think a better condition would be to trigger a checkpoint once all tasks are finished. With unaligned checkpoints, downstream tasks can still be processing in-flight data while upstream sources are finished, so triggering a checkpoint on finished sources wouldn't achieve the desired goal of stopping the job faster. > Trigger an immediate checkpoint after all sources finished > -- > > Key: FLINK-28386 > URL: https://issues.apache.org/jira/browse/FLINK-28386 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing >Reporter: Yun Gao >Priority: Major > > Currently for a bounded job in streaming mode, by default it will wait for one > more checkpoint to commit the last piece of data. If the checkpoint period is > long, the waiting time might also be long. To optimize this situation, we > could eagerly trigger a checkpoint after all sources are finished. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-31846) Support cancel final checkpoint when all tasks are finished
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Nowojski closed FLINK-31846. -- Resolution: Duplicate I will close this ticket, as I think it just duplicates FLINK-28386, feel free to re-open if I missed something. Otherwise, let's maybe move the discussion to the other ticket. > Support cancel final checkpoint when all tasks are finished > --- > > Key: FLINK-31846 > URL: https://issues.apache.org/jira/browse/FLINK-31846 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.15.2 >Reporter: Fan Hong >Priority: Major > > As stated in [1], all tasks will wait for the final checkpoint before > exiting. It also mentioned this mechanism will prolong the execution time. > So, can we provide configurations to make tasks NOT wait for the final > checkpoint? > > [1]: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-31846) Support cancel final checkpoint when all tasks are finished
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714946#comment-17714946 ] Piotr Nowojski edited comment on FLINK-31846 at 4/21/23 11:33 AM: -- {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, it's a valid feature request. There is even a ticket for that https://issues.apache.org/jira/browse/FLINK-28386. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing are: * not being able to checkpoint job if some sources have already finished, while others are still working * exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. was (Author: pnowojski): {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, it's a valid feature request. There is even a ticket for that https://issues.apache.org/jira/browse/FLINK-26113. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing are: * not being able to checkpoint job if some sources have already finished, while others are still working * exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. 
> Support cancel final checkpoint when all tasks are finished > --- > > Key: FLINK-31846 > URL: https://issues.apache.org/jira/browse/FLINK-31846 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.15.2 >Reporter: Fan Hong >Priority: Major > > As stated in [1], all tasks will wait for the final checkpoint before > exiting. It also mentioned this mechanism will prolong the execution time. > So, can we provide configurations to make tasks NOT wait for the final > checkpoint? > > [1]: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit -- This message was sent by Atlassian Jira (v8.20.10#820010)
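The manual-trigger workaround mentioned above can be illustrated with a minimal sketch of the REST call shape (assuming Flink 1.17+ and the endpoint from the linked REST API documentation; the address and job id are placeholders, not values from this thread, and actually sending the request requires a running cluster):

```java
// Sketch of the manual checkpoint trigger call (Flink 1.17+ REST API).
// This only builds the request target; issuing the POST needs a live cluster.
public class CheckpointTrigger {
    /**
     * POST to this URL to trigger a checkpoint. Following Flink's asynchronous
     * REST-operation pattern, the response should carry a request id that can
     * be polled for completion.
     */
    public static String triggerUrl(String restAddress, String jobId) {
        return restAddress + "/jobs/" + jobId + "/checkpoints";
    }

    public static void main(String[] args) {
        // Equivalent to: curl -X POST http://localhost:8081/jobs/<job-id>/checkpoints
        System.out.println("POST " + triggerUrl("http://localhost:8081", "<job-id>"));
    }
}
```

Triggering this right after the sources finish approximates the "checkpoint on completion" behavior the ticket asks for, at the cost of external orchestration.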
[jira] [Comment Edited] (FLINK-31846) Support cancel final checkpoint when all tasks are finished
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714946#comment-17714946 ] Piotr Nowojski edited comment on FLINK-31846 at 4/21/23 11:32 AM: -- {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, it's a valid feature request. There is even a ticket for that https://issues.apache.org/jira/browse/FLINK-26113. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing are: * not being able to checkpoint job if some sources have already finished, while others are still working * exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. was (Author: pnowojski): {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, but would be a valid feature request. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing are: * not being able to checkpoint job if some sources have already finished, while others are still working * exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. 
> Support cancel final checkpoint when all tasks are finished > --- > > Key: FLINK-31846 > URL: https://issues.apache.org/jira/browse/FLINK-31846 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.15.2 >Reporter: Fan Hong >Priority: Major > > As stated in [1], all tasks will wait for the final checkpoint before > exiting. It also mentioned this mechanism will prolong the execution time. > So, can we provide configurations to make tasks NOT wait for the final > checkpoint? > > [1]: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] Aitozi commented on pull request #22378: [FLINK-31344][planner] Support to update nested columns in update sta…
Aitozi commented on PR #22378: URL: https://github.com/apache/flink/pull/22378#issuecomment-1517684780 ping @luoyuxia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] 1996fanrui commented on pull request #22392: [FLINK-31588][checkpoint] Correct the unaligned checkpoint type after aligned barrier timeout to unaligned barrier on PipelinedSubpartitio
1996fanrui commented on PR #22392: URL: https://github.com/apache/flink/pull/22392#issuecomment-1517672547 > Thanks for the fix! > > Can you add a small unit test? Apart from that, LGTM. Feel free to merge after adding the unit test and with a green Azure build :) Thanks for the quick feedback, updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-31846) Support cancel final checkpoint when all tasks are finished
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714946#comment-17714946 ] Piotr Nowojski edited comment on FLINK-31846 at 4/21/23 11:08 AM: -- {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, but would be a valid feature request. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing are: * not being able to checkpoint job if some sources have already finished, while others are still working * exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. was (Author: pnowojski): {quote} f the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, but would be a valid feature request. I'm not sure how helpful is this, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing is exactly-once semantic (like committing transactions to Kafka). But you could still use Kafka sink with at-least-once semantic. 
> Support cancel final checkpoint when all tasks are finished > --- > > Key: FLINK-31846 > URL: https://issues.apache.org/jira/browse/FLINK-31846 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.15.2 >Reporter: Fan Hong >Priority: Major > > As stated in [1], all tasks will wait for the final checkpoint before > exiting. It also mentioned this mechanism will prolong the execution time. > So, can we provide configurations to make tasks NOT wait for the final > checkpoint? > > [1]: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-31846) Support cancel final checkpoint when all tasks are finished
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714946#comment-17714946 ] Piotr Nowojski commented on FLINK-31846: {quote} If the final checkpoint cannot be cancelled, can we bring it forward to start upon completion of all tasks? {quote} I'm afraid it's not possible at the moment, but it would be a valid feature request. I'm not sure how helpful this is, but you can also manually trigger checkpoints: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1 Alternatively, you can also disable checkpoints with finished tasks. If your sources are finishing more or less all at once, the only thing you are sacrificing is the exactly-once semantics (like committing transactions to Kafka). But you could still use the Kafka sink with at-least-once semantics. > Support cancel final checkpoint when all tasks are finished > --- > > Key: FLINK-31846 > URL: https://issues.apache.org/jira/browse/FLINK-31846 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.15.2 >Reporter: Fan Hong >Priority: Major > > As stated in [1], all tasks will wait for the final checkpoint before > exiting. It also mentioned this mechanism will prolong the execution time. > So, can we provide configurations to make tasks NOT wait for the final > checkpoint? > > [1]: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] dmvk commented on pull request #22441: [hotfix]Use wrapper classes like other parameters
dmvk commented on PR #22441: URL: https://github.com/apache/flink/pull/22441#issuecomment-1517640808 What's the motivation behind the PR? Is this just a cosmetic thing, or is there something broken? If cosmetic, I'd expect this to be the other way around, using the unboxed types that prevent nullability issues. If something is broken, we should first have an issue covering it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
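The nullability point raised in the review above can be shown in two lines (a generic Java illustration; the class and method names here are made up, not from the PR): a boxed wrapper type admits null, and auto-unboxing a null throws at runtime, which a primitive parameter rules out by construction.

```java
// Why primitive types prevent nullability issues: a boxed Integer may be
// null, and auto-unboxing it throws NullPointerException at runtime.
public class UnboxingPitfall {
    static int toPrimitive(Integer boxed) {
        return boxed; // auto-unboxing; throws NPE if boxed == null
    }
}
```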
[GitHub] [flink] flinkbot commented on pull request #22448: [FLINK-31386][network] Fix the potential deadlock issue of blocking shuffle
flinkbot commented on PR #22448: URL: https://github.com/apache/flink/pull/22448#issuecomment-1517637993 ## CI report: * f71e3afff45ea49d2ecd060476dc77b90afe3255 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] wsry commented on pull request #22448: [FLINK-31386][network] Fix the potential deadlock issue of blocking shuffle
wsry commented on PR #22448: URL: https://github.com/apache/flink/pull/22448#issuecomment-1517633083 This is a cherry picked PR, will merge after tests pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-31803) UpdateJobResourceRequirementsRecoveryITCase.testRescaledJobGraphsWillBeRecoveredCorrectly(Path) is unstable on azure
[ https://issues.apache.org/jira/browse/FLINK-31803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714937#comment-17714937 ] David Morávek commented on FLINK-31803: --- master: c2ab806a3624471bb36f87ba98d51f672b7894fe > UpdateJobResourceRequirementsRecoveryITCase.testRescaledJobGraphsWillBeRecoveredCorrectly(Path) > is unstable on azure > > > Key: FLINK-31803 > URL: https://issues.apache.org/jira/browse/FLINK-31803 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Tests >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Assignee: David Morávek >Priority: Critical > Labels: pull-request-available, test-stability > > {noformat} > Apr 07 01:28:23 java.util.concurrent.CompletionException: > Apr 07 01:28:23 org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > d3538259fba86dfc0bd9bd5680076836 not found > Apr 07 01:28:23 at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > Apr 07 01:28:23 at > org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > Apr 07 01:28:23 at > 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > Apr 07 01:28:23 at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:260) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > Apr 07 01:28:23 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > Apr 07 01:28:23 at > org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1275) > Apr 07 01:28:23 at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93) > {noformat} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=47996&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=7713 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-31803) UpdateJobResourceRequirementsRecoveryITCase.testRescaledJobGraphsWillBeRecoveredCorrectly(Path) is unstable on azure
[ https://issues.apache.org/jira/browse/FLINK-31803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Morávek resolved FLINK-31803.
-----------------------------------
    Fix Version/s: 1.18.0
       Resolution: Fixed

> UpdateJobResourceRequirementsRecoveryITCase.testRescaledJobGraphsWillBeRecoveredCorrectly(Path) is unstable on azure
>
>                 Key: FLINK-31803
>                 URL: https://issues.apache.org/jira/browse/FLINK-31803
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.18.0
>           Reporter: Sergey Nuyanzin
>           Assignee: David Morávek
>           Priority: Critical
>             Labels: pull-request-available, test-stability
>             Fix For: 1.18.0
>
> {noformat}
> Apr 07 01:28:23 java.util.concurrent.CompletionException:
> Apr 07 01:28:23 org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.NotFoundException: Job d3538259fba86dfc0bd9bd5680076836 not found
> Apr 07 01:28:23 	at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> Apr 07 01:28:23 	at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> Apr 07 01:28:23 	at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:260)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> Apr 07 01:28:23 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> Apr 07 01:28:23 	at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1275)
> Apr 07 01:28:23 	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=47996&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=7713

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] dmvk merged pull request #22408: [FLINK-31803] Harden UpdateJobResourceRequirementsRecoveryITCase.
dmvk merged PR #22408: URL: https://github.com/apache/flink/pull/22408 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] reswqa commented on pull request #22432: [FLINK-18808][runtime] Include side outputs in numRecordsOut metric
reswqa commented on PR #22432: URL: https://github.com/apache/flink/pull/22432#issuecomment-1517630686

   Hi @pnowojski, would you mind taking a look at this in your free time, since you reviewed the original [PR](https://github.com/apache/flink/pull/13109)?
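The metric change under review can be sketched as follows. This is a minimal, hypothetical stand-in (CountingOutput is not Flink's real Output implementation): the point is simply that records emitted to side outputs increment the same numRecordsOut counter as main-output records, which was previously not the case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical operator output, illustrating the counting change only.
class CountingOutput {
    private final AtomicLong numRecordsOut = new AtomicLong();
    private final List<Object> mainOutput = new ArrayList<>();
    private final List<Object> sideOutput = new ArrayList<>();

    void collect(Object record) {
        numRecordsOut.incrementAndGet(); // main-output records were always counted
        mainOutput.add(record);
    }

    void collectSideOutput(Object record) {
        numRecordsOut.incrementAndGet(); // the change: side outputs now count too
        sideOutput.add(record);
    }

    long numRecordsOut() {
        return numRecordsOut.get();
    }
}
```

With this accounting, an operator that emits two main-output records and one side-output record reports three records out, instead of two.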
[GitHub] [flink] wsry opened a new pull request, #22448: [FLINK-31386][network] Fix the potential deadlock issue of blocking shuffle
wsry opened a new pull request, #22448: URL: https://github.com/apache/flink/pull/22448

   ## What is the purpose of the change

   Currently, SortMergeResultPartition may allocate more network buffers than the guaranteed size of its LocalBufferPool. As a result, some result partitions may have to wait for other result partitions to release their over-allocated network buffers before they can continue. However, a result partition that has allocated more than its guaranteed buffers relies on processing input data to trigger data spilling and buffer recycling, and that input data in turn relies on the batch reading buffers used by SortMergeResultPartitionReadScheduler, which may already be held by the blocked result partitions waiting for buffers. This circular wait results in a deadlock. This patch fixes the deadlock by reserving the guaranteed buffers at initialization time.

   ## Brief change log

   - Reserve the guaranteed buffers on initialization for SortMergeResultPartition.

   ## Verifying this change

   This change added tests.

   ## Does this pull request potentially affect one of the following parts:

   - Dependencies (does it add or upgrade a dependency): (yes / **no**)
   - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
   - The serializers: (yes / **no** / don't know)
   - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
   - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
   - The S3 file system connector: (yes / **no** / don't know)

   ## Documentation

   - Does this pull request introduce a new feature? (yes / **no**)
   - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
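The fix described above can be sketched with a toy model. SharedPool and ReservingPartition below are hypothetical stand-ins for Flink's LocalBufferPool and SortMergeResultPartition; they illustrate only the core idea of requesting the guaranteed buffers eagerly during setup, so a partition never later blocks waiting for a share that other partitions may have over-allocated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a LocalBufferPool-like shared buffer pool.
class SharedPool {
    private int available;
    SharedPool(int size) { this.available = size; }
    byte[] request() {
        if (available == 0) {
            throw new IllegalStateException("pool exhausted");
        }
        available--;
        return new byte[4096];
    }
    int available() { return available; }
}

// Hypothetical partition that reserves its guaranteed buffers up front.
class ReservingPartition {
    private final Deque<byte[]> reserved = new ArrayDeque<>();

    // Called once during setup. Pool sizing guarantees these requests succeed,
    // so the partition's guaranteed share can never be taken by someone else.
    void setup(SharedPool pool, int guaranteedBuffers) {
        for (int i = 0; i < guaranteedBuffers; i++) {
            reserved.addLast(pool.request());
        }
    }

    int reservedBuffers() { return reserved.size(); }
}
```

With lazy allocation, the guaranteed share is only a promise; with eager reservation at setup, it is held from the start, which breaks the circular wait described in the PR.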
[GitHub] [flink] flinkbot commented on pull request #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool
flinkbot commented on PR #22447: URL: https://github.com/apache/flink/pull/22447#issuecomment-1517625986

   ## CI report:

   * 62fc6fa527aaf68ee2e2d582a56107ad52fd8714 UNKNOWN

   Bot commands

   The @flinkbot bot supports the following commands:

   - `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-31764) Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool
[ https://issues.apache.org/jira/browse/FLINK-31764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-31764:
-----------------------------------
    Labels: pull-request-available  (was: )

> Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool
>
>                 Key: FLINK-31764
>                 URL: https://issues.apache.org/jira/browse/FLINK-31764
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.18.0
>           Reporter: Weijie Guo
>           Assignee: Weijie Guo
>           Priority: Major
>             Labels: pull-request-available
>
> After FLINK-31763, we no longer need the dedicated field {{numberOfRequestedOverdraftMemorySegments}} to record how many overdraft buffers have been requested, since we regard all buffers exceeding the {{currentPoolSize}} as overdraft.
[GitHub] [flink] reswqa opened a new pull request, #22447: [FLINK-31764][runtime] Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool
reswqa opened a new pull request, #22447: URL: https://github.com/apache/flink/pull/22447

   ## What is the purpose of the change

   *After [FLINK-31763](https://issues.apache.org/jira/browse/FLINK-31763), we no longer need the dedicated field `numberOfRequestedOverdraftMemorySegments` to record how many overdraft buffers have been requested, since we regard all buffers exceeding the `currentPoolSize` as overdraft.*

   ## Brief change log

   - *Introduce `getNumberOfRequestedMemorySegments` and rename the old one to a more appropriate name.*
   - *Get rid of `numberOfRequestedOverdraftMemorySegments` in `LocalBufferPool`.*

   ## Verifying this change

   This change is already covered by existing tests in `LocalBufferPoolTest`.

   ## Does this pull request potentially affect one of the following parts:

   - Dependencies (does it add or upgrade a dependency): no
   - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
   - The serializers: no
   - The runtime per-record code paths (performance sensitive): per-buffer
   - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
   - The S3 file system connector: no

   ## Documentation

   - Does this pull request introduce a new feature? no
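The bookkeeping change can be illustrated with a small sketch (hypothetical class and method names, not the real LocalBufferPool code): once every requested segment beyond `currentPoolSize` counts as overdraft, the overdraft count can be derived from the total number of requested segments instead of being tracked in a separate field.

```java
// Hypothetical sketch: derive the overdraft count instead of storing it.
class OverdraftAccounting {
    static int overdraftSegments(int numberOfRequestedMemorySegments, int currentPoolSize) {
        // Everything requested beyond the current pool size is overdraft;
        // when fewer segments than the pool size are in use, overdraft is zero.
        return Math.max(0, numberOfRequestedMemorySegments - currentPoolSize);
    }
}
```

Deriving the value removes a second counter that had to be kept consistent with the first on every request and recycle, which is the simplification the PR describes.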
[jira] [Updated] (FLINK-31878) Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
[ https://issues.apache.org/jira/browse/FLINK-31878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuxin Tan updated FLINK-31878:
------------------------------
    Fix Version/s: (was: 1.18.0)

> Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
>
>                 Key: FLINK-31878
>                 URL: https://issues.apache.org/jira/browse/FLINK-31878
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Common
>    Affects Versions: 1.18.0
>           Reporter: Yuxin Tan
>           Assignee: Yuxin Tan
>           Priority: Major
>             Labels: pull-request-available
>
> PauseOrResumeSplitsTask#toString returns the wrong class name, which is very confusing for users who see the task's toString output, so we should fix it.
[jira] [Updated] (FLINK-31878) Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
[ https://issues.apache.org/jira/browse/FLINK-31878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuxin Tan updated FLINK-31878:
------------------------------
    Affects Version/s: 1.18.0

> Fix the wrong name of PauseOrResumeSplitsTask#toString in connector fetcher
>
>                 Key: FLINK-31878
>                 URL: https://issues.apache.org/jira/browse/FLINK-31878
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Common
>    Affects Versions: 1.18.0
>           Reporter: Yuxin Tan
>           Assignee: Yuxin Tan
>           Priority: Major
>             Labels: pull-request-available
>             Fix For: 1.18.0
>
> PauseOrResumeSplitsTask#toString returns the wrong class name, which is very confusing for users who see the task's toString output, so we should fix it.
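The class of bug being fixed can be illustrated with a hypothetical example (this is not the actual Flink code): a toString() that hard-codes a name copied from another class reports the wrong type in logs, whereas deriving the name from getClass() cannot drift after copy-paste or a rename.

```java
// Hypothetical example only. A hard-coded name like "AddSplitsTask{...}"
// pasted into this class would be wrong; deriving it is always correct.
class PauseOrResumeSplitsTaskExample {
    @Override
    public String toString() {
        // getClass().getSimpleName() reflects the actual runtime class name.
        return getClass().getSimpleName() + "{}";
    }
}
```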
[GitHub] [flink] TanYuxin-tyx commented on a diff in pull request #22395: [FLINK-31799][docs] Python connector download link should refer to the url defined in externalized repository
TanYuxin-tyx commented on code in PR #22395: URL: https://github.com/apache/flink/pull/22395#discussion_r1173578262

   ##########
   docs/layouts/shortcodes/py_connector_download_link.html:
   ##########
   @@ -0,0 +1,62 @@
   +{{/*
   +Licensed to the Apache Software Foundation (ASF) under one
   +or more contributor license agreements. See the NOTICE file
   +distributed with this work for additional information
   +regarding copyright ownership. The ASF licenses this file
   +to you under the Apache License, Version 2.0 (the
   +"License"); you may not use this file except in compliance
   +with the License. You may obtain a copy of the License at
   +
   +http://www.apache.org/licenses/LICENSE-2.0
   +
   +Unless required by applicable law or agreed to in writing,
   +software distributed under the License is distributed on an
   +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +KIND, either express or implied. See the License for the
   +specific language governing permissions and limitations
   +under the License.
   +*/}}{{/*
   +Generates an XML snippet for the externalized connector python download table.
   +*/}}
   +{{ $name := .Get 0 }}
   +{{ $connector_version := .Get 1 }}
   +{{ $connector := index .Site.Data $name }}
   +{{ $flink_version := .Site.Params.VersionTitle }}
   +{{ $full_version := printf "%s-%s" $connector_version $flink_version }}
   +
   +{{ if eq $.Site.Language.Lang "en" }}
   +In order to use the {{ $connector.name }} in PyFlink jobs, the following
   +dependencies are required:
   +{{ else if eq $.Site.Language.Lang "zh" }}
   +为了在 PyFlink 作业中使用 {{ $connector.name }} ，需要添加下列依赖：
   +{{ end }}
   +
   +Version
   +PyFlink JAR
   +
   +{{ range $connector.variants }}
   +
   +{{- .maven -}}
   +{{ if $.Site.Params.IsStable }}
   +{{ if eq .sql_url nil}}
   +There is no sql jar available yet.
   +{{ else }}
   +Download
   +{{ end }}
   +{{ else }}
   +Only available for stable releases.
   +{{ end }}
   +
   +{{ end }}
   +
   +{{ if eq .Site.Language.Lang "en" }}
   +See Python dependency management
   +for more details on how to use JARs in PyFlink.
   +{{ else if eq .Site.Language.Lang "zh" }}
   +在 PyFlink 中如何添加 JAR 包依赖参见 Python 依赖管理。

   Review Comment:
   "参见" reads a little strange here; maybe "请参考" would be better. But I see that `py_download_link.html` uses the same wording, so fixing it or not is both fine with me, no strong opinions. BTW, if this is fixed, the one in `py_download_link.html` should also be fixed.
[GitHub] [flink] pvary commented on pull request #22437: [FLINK-31868] Fix DefaultInputSplitAssigner javadoc for class
pvary commented on PR #22437: URL: https://github.com/apache/flink/pull/22437#issuecomment-1517549223

   Thanks @zhuzhurk for the guidance and commit!
[GitHub] [flink] MartijnVisser commented on pull request #22443: [FLINK-31878][connectors] Fix the wrong name of PauseOrResumeSplitsTask#toString
MartijnVisser commented on PR #22443: URL: https://github.com/apache/flink/pull/22443#issuecomment-1517539242

   Wouldn't this break all connectors that have implemented this method already, like Kafka?
[GitHub] [flink] flinkbot commented on pull request #22446: [FLINK-28060][BP 1.15][Connector/Kafka] Updated Kafka Clients to 3.2.1
flinkbot commented on PR #22446: URL: https://github.com/apache/flink/pull/22446#issuecomment-1517536013

   ## CI report:

   * e0086f25a4e965cbbc5030c7e9517f91d1991e4a UNKNOWN

   Bot commands

   The @flinkbot bot supports the following commands:

   - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] chachae opened a new pull request, #22446: [FLINK-28060][BP 1.15][Connector/Kafka] Updated Kafka Clients to 3.2.1
chachae opened a new pull request, #22446: URL: https://github.com/apache/flink/pull/22446

   Cherry pick https://github.com/apache/flink/pull/19994 and https://github.com/apache/flink/pull/20526 to release-1.15