Re: [PR] [FLINK-35242] Optimize schema evolution & add SE IT cases [flink-cdc]

2024-05-24 Thread via GitHub


hk-lrzy commented on code in PR #3339:
URL: https://github.com/apache/flink-cdc/pull/3339#discussion_r1614307385


##
flink-cdc-composer/src/main/java/org/apache/flink/cdc/composer/flink/translator/SchemaOperatorTranslator.java:
##
@@ -58,12 +59,18 @@ public DataStream translate(
 MetadataApplier metadataApplier,
 List routes) {
 switch (schemaChangeBehavior) {
-case EVOLVE:
-return addSchemaOperator(input, parallelism, metadataApplier, routes);
 case IGNORE:
-return dropSchemaChangeEvent(input, parallelism);
+return dropSchemaChangeEvent(
+addSchemaOperator(input, parallelism, metadataApplier, routes, true),
+parallelism);
+case TRY_EVOLVE:
+return addSchemaOperator(input, parallelism, metadataApplier, routes, true);
+case EVOLVE:
+return addSchemaOperator(input, parallelism, metadataApplier, routes, false);
 case EXCEPTION:
-return exceptionOnSchemaChange(input, parallelism);
+return exceptionOnSchemaChange(

Review Comment:
   I agree, and I also pointed this out in the issue ([JIRA / 
https://issues.apache.org/jira/browse/FLINK-35436](https://issues.apache.org/jira/browse/FLINK-35242)) 
and have another PR: https://github.com/apache/flink-cdc/pull/3355
   
   I think we can merge it into this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-35242] Optimize schema evolution & add SE IT cases [flink-cdc]

2024-05-24 Thread via GitHub


hk-lrzy commented on code in PR #3339:
URL: https://github.com/apache/flink-cdc/pull/3339#discussion_r1612861870


##
flink-cdc-composer/src/main/java/org/apache/flink/cdc/composer/flink/translator/SchemaOperatorTranslator.java:
##
@@ -58,12 +59,18 @@ public DataStream translate(
 MetadataApplier metadataApplier,
 List routes) {
 switch (schemaChangeBehavior) {
-case EVOLVE:
-return addSchemaOperator(input, parallelism, metadataApplier, routes);
 case IGNORE:
-return dropSchemaChangeEvent(input, parallelism);
+return dropSchemaChangeEvent(
+addSchemaOperator(input, parallelism, metadataApplier, routes, true),
+parallelism);
+case TRY_EVOLVE:
+return addSchemaOperator(input, parallelism, metadataApplier, routes, true);
+case EVOLVE:
+return addSchemaOperator(input, parallelism, metadataApplier, routes, false);
 case EXCEPTION:
-return exceptionOnSchemaChange(input, parallelism);
+return exceptionOnSchemaChange(

Review Comment:
I have some questions about the behavior for `IGNORE` and `EXCEPTION`.
Currently, when we set the behavior to IGNORE or EXCEPTION, the job fails. 
Should that be fixed here, or in a new PR?
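For reference, the dispatch in the diff above can be summarized in a standalone sketch. The semantics are assumed from the diff, and `tolerant` is my label for the new boolean argument, not the real parameter name:

```java
// Sketch of the translate() dispatch after the change: every behavior now
// routes through the schema operator, with a tolerance flag that is true
// for IGNORE/TRY_EVOLVE and false for EVOLVE. Names are illustrative.
class SchemaBehaviorSketch {
    enum Behavior { IGNORE, TRY_EVOLVE, EVOLVE, EXCEPTION }

    static String translate(Behavior b) {
        switch (b) {
            case IGNORE:
                // Apply schema changes tolerantly, then drop the events.
                return "dropSchemaChangeEvent(addSchemaOperator(tolerant=true))";
            case TRY_EVOLVE:
                return "addSchemaOperator(tolerant=true)";
            case EVOLVE:
                return "addSchemaOperator(tolerant=false)";
            case EXCEPTION:
                return "exceptionOnSchemaChange(...)";
            default:
                throw new IllegalArgumentException("unknown behavior: " + b);
        }
    }
}
```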






[jira] [Created] (FLINK-35449) MySQL CDC Flink SQL column names are case-sensitive

2024-05-24 Thread linweijiang (Jira)
linweijiang created FLINK-35449:
---

 Summary: MySQL CDC Flink SQL column names are case-sensitive
 Key: FLINK-35449
 URL: https://issues.apache.org/jira/browse/FLINK-35449
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: linweijiang


Using Flink SQL with MySQL CDC, I noticed that the column names in Flink SQL 
are case-sensitive with respect to the column names in the MySQL tables. I 
couldn't find any configuration options to change this behavior.

Do we have support for case-insensitive configurations to address this issue? 
Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33892][runtime] Support Job Recovery from JobMaster Failures for Batch Jobs. [flink]

2024-05-24 Thread via GitHub


JunRuiLee commented on PR #24771:
URL: https://github.com/apache/flink/pull/24771#issuecomment-2130666752

   Thanks @zhuzhurk for the thorough review. I have refactored the 
BatchJobRecoveryTest and JMFailoverITCase based on your comments. PTAL.
   
   





Re: [PR] [FLINK-35424] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


liuml07 commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1614242366


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   Sounds good, @reta ! Thanks for the review and helpful comments.
   
   Let's see if @snuyanzin has bandwidth to take a look while I'm trying to 
refactor the tests separately.






Re: [PR] [FLINK-35424] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


reta commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1614235030


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   > If you prefer we wait for that change, I can push to this branch after I 
have a working version. If you agree, I can also create a new PR of the testing 
code refactoring for future proof (new tests will be easily covered by secure 
clusters).
   
   I agree with you that parameterized tests would very likely make things 
cleaner (we do have some level of duplication now). If you could pull it off, 
that would be great. Sadly, I can only review, not merge, so we still have time 
until a committer comes in.






Re: [PR] [FLINK-35424] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


liuml07 commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1614194871


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   @reta I'm working on the parameterized tests, which is purely a test code 
refactoring:
- Make the base test class parameterized with a `secure` parameter. As JUnit 5 
has limited support for parameterized tests with inheritance, I used the 
`ParameterizedTestExtension` introduced in Flink; see [this 
doc](https://docs.google.com/document/d/1514Wa_aNB9bJUen4xm5uiuXOooOJTtXqS_Jqk9KJitU/edit#heading=h.jf7lfvm64da4)
- Manage the test container lifecycle manually instead of using the managed 
annotations `@Testcontainers` and `@Container`
- Create and use common methods in the base class so that concrete test 
classes can be mostly parameter-agnostic
   
   If you prefer to wait for that change, I can push to this branch after I 
have a working version. If you agree, I can also create a new PR with the test 
code refactoring for future-proofing (new tests will easily be covered by secure 
clusters).
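A dependency-free sketch of the parameterization idea described above: drive the same test body once per value of `secure`, so concrete cases stay parameter-agnostic. The real refactoring would use JUnit 5 with Flink's `ParameterizedTestExtension`; every name here is illustrative, not the connector's actual test code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a base test class parameterized on `secure`.
class SecureParamSketch {
    // Stand-in for choosing an HTTPS vs HTTP Elasticsearch endpoint.
    static String hostFor(boolean secure) {
        return (secure ? "https" : "http") + "://localhost:9200";
    }

    // Stand-in for the parameterized runner: the same body executes for
    // each parameter value, as ParameterizedTestExtension would arrange.
    static List<String> runForAllParameters() {
        List<String> visited = new ArrayList<>();
        for (boolean secure : new boolean[] {false, true}) {
            visited.add(hostFor(secure));
        }
        return visited;
    }
}
```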






[jira] [Comment Edited] (FLINK-35285) Autoscaler key group optimization can interfere with scale-down.max-factor

2024-05-24 Thread Trystan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849352#comment-17849352
 ] 

Trystan edited comment on FLINK-35285 at 5/24/24 8:41 PM:
--

[~gyfora] is there maybe another setting that can help tune this? At least on 
1.7.0, I often find that a max scale down factor of 0.5 (which seems to be 
essentially mandatory given the current computations) leads to an overshoot - 
so then it scales back up. For example 40 -> 20 -> 40 -> 24.

I'd prefer to be conservative on the scale down and set it to 0.2 or 0.3. In 
the case of maxParallelism=120, 0.3 would work for _this_ scale down from 40 
(sort of - it results in 30), but 0.2 would not - we would effectively have a 
minParallelism of 40 and never go below it. Yet in the case of current=120, 
max=120, maxScaleDown=.2, it works just fine - it'll scale to 96. The "good" 
minScaleFactors seem highly dependent on both the maxParallelism and 
currentParallelism.

It seems that the problem lies in this loop: 
[https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L299-L303]

If we add 
{code:java}
&& p < currentParallelism {code}
to the loop we get the expected behavior on scale down. Of course, then other 
keygroup-optimized scale ups break. Perhaps there needs to be different loops 
for scale up / scale down. On scale down, ensure that p < currentParallelism 
and on scale up p > currentParallelism. I think this would fix the current 
scenario as well as the existing ones. I added a few tests locally that confirm 
it as well. If this is viable I'd be happy to make a PR.

Is there something obvious that I'm missing, something I can tune better?
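The proposed direction-aware loop can be sketched as follows. This is not the actual JobVertexScaler code; the method shape and fallback are assumptions made for illustration:

```java
// Sketch of the fix proposed above: restrict the key-group divisor search
// so scale-down can only return p < currentParallelism and scale-up only
// p > currentParallelism, falling back to the unaligned newParallelism
// when no divisor of maxParallelism qualifies.
class KeyGroupAlignmentSketch {
    static int align(
            int newParallelism,
            int currentParallelism,
            int maxParallelism,
            int upperBound,
            double scaleFactor) {
        for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
            boolean rightDirection =
                    (scaleFactor < 1 && p < currentParallelism)
                            || (scaleFactor > 1 && p > currentParallelism);
            // Only accept divisors of maxParallelism that move in the
            // direction the scale factor asked for.
            if (rightDirection && maxParallelism % p == 0) {
                return p;
            }
        }
        // No aligned parallelism in range: make progress anyway.
        return newParallelism;
    }
}
```

With maxParallelism=120 and currentParallelism=40, a 0.3 scale-down (target 28) lands on 30 instead of hunting back up to 40, and a 0.2 scale-down (target 32) falls back to 32 rather than deadlocking at 40.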


was (Author: trystan):
[~gyfora] is there maybe another setting that can help tune this? At least on 
1.7.0, I often find that a max scale down factor of 0.5 (which seems to be 
essentially mandatory given the current computations) leads to an overshoot - 
so then it scales back up. For example 40 -> 20 -> 40 -> 24.

I'd prefer to be conservative on the scale down and set it to 0.2 or 0.3. In 
the case of maxParallelism=120, 0.3 would work for _this_ scale down from 40 
(sort of - it results in 30), but 0.2 would not - we would effectively have a 
minParallelism of 40 and never go below it. Yet in the case of current=120, 
max=120, maxScaleDown=.2, it works just fine - it'll scale to 96. The "good" 
minScaleFactors seem highly dependent on both the maxParallelism and 
currentParallelism.

It seems that the problem lies in this loop: 
[https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L299-L303]

If we add 
{code:java}
&& p < currentParallelism {code}
to the loop we get the expected behavior on scale down. Of course, then other 
keygroup-optimized scale ups break. Perhaps there needs to be different loops 
for scale up / scale down?

Is there something obvious that I'm missing, something I can tune better?

> Autoscaler key group optimization can interfere with scale-down.max-factor
> --
>
> Key: FLINK-35285
> URL: https://issues.apache.org/jira/browse/FLINK-35285
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Trystan
>Priority: Minor
>
> When setting a less aggressive scale down limit, the key group optimization 
> can prevent a vertex from scaling down at all. It will hunt from target 
> upwards to maxParallelism/2, and will always find currentParallelism again.
>  
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 
> 360)); {code}
>  
> It seems reasonable to make a good attempt to spread data across subtasks, 
> but not at the expense of total deadlock. The problem is that during scale 
> down it doesn't actually ensure that newParallelism will be < 
> currentParallelism. The only workaround is to set a scale down factor large 
> enough such that it finds the next lowest divisor of the maxParallelism.
>  
> Clunky, but something to ensure it can make at least some progress. There is 
> another test that now fails, but just to illustrate the point:
> {code:java}
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) 
> {
> if ((scaleFactor < 1 && p < currentParallelism) || (scaleFactor > 1 && p 
> > currentParallelism)) {
> if (maxParallelism % p == 0) {
> return p;
> }
> }
> } {code}
>  
> Perhaps this is by design and not a bug, but total failure to 

Re: [PR] [FLINK-33759] [flink-parquet] Add support for nested array with row/map/array type [flink]

2024-05-24 Thread via GitHub


ViktorCosenza commented on PR #24795:
URL: https://github.com/apache/flink/pull/24795#issuecomment-2130263489

   > @ViktorCosenza did you rebase?
   
   Yes, I did an interactive rebase and then force-pushed the squashed commits.





Re: [PR] [FLINK-35435] Add timeout Configuration to Async Sink [flink]

2024-05-24 Thread via GitHub


dannycranmer commented on code in PR #24839:
URL: https://github.com/apache/flink/pull/24839#discussion_r1613883830


##
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/AsyncSinkBase.java:
##
@@ -54,6 +54,8 @@ public abstract class AsyncSinkBase
 private final long maxBatchSizeInBytes;
 private final long maxTimeInBufferMS;
 private final long maxRecordSizeInBytes;
+private Long requestTimeoutMS;
+private Boolean failOnTimeout;

Review Comment:
   Why `Boolean` and not `boolean` ?



##
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/AsyncSinkBase.java:
##
@@ -54,6 +54,8 @@ public abstract class AsyncSinkBase
 private final long maxBatchSizeInBytes;
 private final long maxTimeInBufferMS;
 private final long maxRecordSizeInBytes;
+private Long requestTimeoutMS;
+private Boolean failOnTimeout;

Review Comment:
   Also, to avoid null checks, we could default the request timeout ms to -1 for off
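The suggestions above (primitive `final` fields, with `-1` as an "off" default so no null checks are needed) could look like this. The class and accessor names are illustrative, not the actual AsyncSinkBase code:

```java
// Hedged sketch of the reviewer's suggestion: primitive final fields with
// a -1 sentinel meaning "request timeout disabled", avoiding the nullable
// Long/Boolean wrappers in the diff.
final class AsyncSinkSettingsSketch {
    static final long REQUEST_TIMEOUT_DISABLED = -1L;

    private final long requestTimeoutMS;
    private final boolean failOnTimeout;

    AsyncSinkSettingsSketch(long requestTimeoutMS, boolean failOnTimeout) {
        this.requestTimeoutMS = requestTimeoutMS;
        this.failOnTimeout = failOnTimeout;
    }

    boolean isTimeoutEnabled() {
        return requestTimeoutMS != REQUEST_TIMEOUT_DISABLED;
    }

    boolean failOnTimeout() {
        return failOnTimeout;
    }
}
```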



##
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/AsyncSinkBase.java:
##
@@ -54,6 +54,8 @@ public abstract class AsyncSinkBase
 private final long maxBatchSizeInBytes;
 private final long maxTimeInBufferMS;
 private final long maxRecordSizeInBytes;
+private Long requestTimeoutMS;
+private Boolean failOnTimeout;

Review Comment:
   Make `final`



##
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java:
##
@@ -181,15 +187,88 @@ public abstract class AsyncSinkWriter
  * During checkpointing, the sink needs to ensure that there are no outstanding in-flight
  * requests.
  *
+ * This method is deprecated in favor of {@code submitRequestEntries( 
List
+ * requestEntries, ResultHandler resultHandler)}
+ *
  * @param requestEntries a set of request entries that should be sent to 
the destination
  * @param requestToRetry the {@code accept} method should be called on 
this Consumer once the
  * processing of the {@code requestEntries} are complete. Any entries 
that encountered
  * difficulties in persisting should be re-queued through {@code 
requestToRetry} by
  * including that element in the collection of {@code RequestEntryT}s 
passed to the {@code
  * accept} method. All other elements are assumed to have been 
successfully persisted.
  */
-protected abstract void submitRequestEntries(
-List requestEntries, Consumer> 
requestToRetry);
+@Deprecated
+protected void submitRequestEntries(
+List requestEntries, Consumer> 
requestToRetry) {
+throw new UnsupportedOperationException(
+"This method is deprecated. Please override the method that 
accepts a ResultHandler.");
+}
+
+/**
+ * This method specifies how to persist buffered request entries into the 
destination. It is
+ * implemented when support for a new destination is added.
+ *
+ * The method is invoked with a set of request entries according to the 
buffering hints (and
+ * the valid limits of the destination). The logic then needs to create 
and execute the request
+ * asynchronously against the destination (ideally by batching together 
multiple request entries
+ * to increase efficiency). The logic also needs to identify individual 
request entries that
+ * were not persisted successfully and resubmit them using the {@code 
requestToRetry} callback.
+ *
+ * From a threading perspective, the mailbox thread will call this 
method and initiate the
+ * asynchronous request to persist the {@code requestEntries}. NOTE: The 
client must support

Review Comment:
   Well... You could spin up a thread pool in the sink, and not necessarily in 
the client






Re: [PR] [FLINK-30687][table] Support pushdown for aggregate filters [flink]

2024-05-24 Thread via GitHub


jeyhunkarimov commented on PR #24660:
URL: https://github.com/apache/flink/pull/24660#issuecomment-2130159573

   Hi @JingGe, I added support for pushing down the filters of multiple 
aggregates that share the same filter. Could you please review the PR when you 
have time? Thanks!





Re: [PR] [FLINK-33759] [flink-parquet] Add support for nested array with row/map/array type [flink]

2024-05-24 Thread via GitHub


JingGe commented on PR #24795:
URL: https://github.com/apache/flink/pull/24795#issuecomment-2130068966

   @ViktorCosenza did you rebase?





Re: [PR] [FLINK-33759] [flink-parquet] Add support for nested array with row/map/array type [flink]

2024-05-24 Thread via GitHub


JingGe commented on PR #24795:
URL: https://github.com/apache/flink/pull/24795#issuecomment-2130067792

   @flinkbot run azure





[jira] [Comment Edited] (FLINK-35285) Autoscaler key group optimization can interfere with scale-down.max-factor

2024-05-24 Thread Trystan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849352#comment-17849352
 ] 

Trystan edited comment on FLINK-35285 at 5/24/24 5:34 PM:
--

[~gyfora] is there maybe another setting that can help tune this? At least on 
1.7.0, I often find that a max scale down factor of 0.5 (which seems to be 
essentially mandatory given the current computations) leads to an overshoot - 
so then it scales back up. For example 40 -> 20 -> 40 -> 24.

I'd prefer to be conservative on the scale down and set it to 0.2 or 0.3. In 
the case of maxParallelism=120, 0.3 would work for _this_ scale down from 40 
(sort of - it results in 30), but 0.2 would not - we would effectively have a 
minParallelism of 40 and never go below it. Yet in the case of current=120, 
max=120, maxScaleDown=.2, it works just fine - it'll scale to 96. The "good" 
minScaleFactors seem highly dependent on both the maxParallelism and 
currentParallelism.

It seems that the problem lies in this loop: 
[https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L299-L303]

If we add 
{code:java}
&& p < currentParallelism {code}
to the loop we get the expected behavior on scale down. Of course, then other 
keygroup-optimized scale ups break. Perhaps there needs to be different loops 
for scale up / scale down?

Is there something obvious that I'm missing, something I can tune better?


was (Author: trystan):
[~gyfora] is there maybe another setting that can help tune this? At least on 
1.7.0, I often find that a max scale down factor of 0.5 (which seems to be 
essentially mandatory given the current computations) leads to an overshoot - 
so then it scales back up. For example 40 -> 20 -> 40 -> 24.

I'd prefer to be conservative on the scale down and set it to 0.2 or 0.3. In 
the case of maxParallelism=120, 0.3 would work for _this_ scale down, but 0.2 
would not - we would effectively have a minParallelism of 40 and never go below 
it. Yet in the case of current=120, max=120, maxScaleDown=.2, it works just 
fine - it'll scale to 96. The "good" minScaleFactors seem highly dependent on 
both the maxParallelism and currentParallelism.

It seems that the problem lies in this loop: 
https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L299-L303

If we add 
{code:java}
&& p < currentParallelism {code}
to the loop we get the expected behavior on scale down. Of course, then other 
keygroup-optimized scale ups break. Perhaps there needs to be different loops 
for scale up / scale down?

Is there something obvious that I'm missing, something I can tune better?

> Autoscaler key group optimization can interfere with scale-down.max-factor
> --
>
> Key: FLINK-35285
> URL: https://issues.apache.org/jira/browse/FLINK-35285
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Trystan
>Priority: Minor
>
> When setting a less aggressive scale down limit, the key group optimization 
> can prevent a vertex from scaling down at all. It will hunt from target 
> upwards to maxParallelism/2, and will always find currentParallelism again.
>  
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 
> 360)); {code}
>  
> It seems reasonable to make a good attempt to spread data across subtasks, 
> but not at the expense of total deadlock. The problem is that during scale 
> down it doesn't actually ensure that newParallelism will be < 
> currentParallelism. The only workaround is to set a scale down factor large 
> enough such that it finds the next lowest divisor of the maxParallelism.
>  
> Clunky, but something to ensure it can make at least some progress. There is 
> another test that now fails, but just to illustrate the point:
> {code:java}
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) 
> {
> if ((scaleFactor < 1 && p < currentParallelism) || (scaleFactor > 1 && p 
> > currentParallelism)) {
> if (maxParallelism % p == 0) {
> return p;
> }
> }
> } {code}
>  
> Perhaps this is by design and not a bug, but total failure to scale down in 
> order to keep optimized key groups does not seem ideal.
>  
> Key group optimization block:
> 

[jira] [Commented] (FLINK-35285) Autoscaler key group optimization can interfere with scale-down.max-factor

2024-05-24 Thread Trystan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849352#comment-17849352
 ] 

Trystan commented on FLINK-35285:
-

[~gyfora] is there maybe another setting that can help tune this? At least on 
1.7.0, I often find that a max scale down factor of 0.5 (which seems to be 
essentially mandatory given the current computations) leads to an overshoot - 
so then it scales back up. For example 40 -> 20 -> 40 -> 24.

I'd prefer to be conservative on the scale down and set it to 0.2 or 0.3. In 
the case of maxParallelism=120, 0.3 would work for _this_ scale down, but 0.2 
would not - we would effectively have a minParallelism of 40 and never go below 
it. Yet in the case of current=120, max=120, maxScaleDown=.2, it works just 
fine - it'll scale to 96. The "good" minScaleFactors seem highly dependent on 
both the maxParallelism and currentParallelism.

It seems that the problem lies in this loop: 
https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L299-L303

If we add 
{code:java}
&& p < currentParallelism {code}
to the loop we get the expected behavior on scale down. Of course, then other 
keygroup-optimized scale ups break. Perhaps there needs to be different loops 
for scale up / scale down?

Is there something obvious that I'm missing, something I can tune better?

> Autoscaler key group optimization can interfere with scale-down.max-factor
> --
>
> Key: FLINK-35285
> URL: https://issues.apache.org/jira/browse/FLINK-35285
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Trystan
>Priority: Minor
>
> When setting a less aggressive scale down limit, the key group optimization 
> can prevent a vertex from scaling down at all. It will hunt from target 
> upwards to maxParallelism/2, and will always find currentParallelism again.
>  
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 
> 360)); {code}
>  
> It seems reasonable to make a good attempt to spread data across subtasks, 
> but not at the expense of total deadlock. The problem is that during scale 
> down it doesn't actually ensure that newParallelism will be < 
> currentParallelism. The only workaround is to set a scale down factor large 
> enough such that it finds the next lowest divisor of the maxParallelism.
>  
> Clunky, but something to ensure it can make at least some progress. There is 
> another test that now fails, but just to illustrate the point:
> {code:java}
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) 
> {
> if ((scaleFactor < 1 && p < currentParallelism) || (scaleFactor > 1 && p 
> > currentParallelism)) {
> if (maxParallelism % p == 0) {
> return p;
> }
> }
> } {code}
>  
> Perhaps this is by design and not a bug, but total failure to scale down in 
> order to keep optimized key groups does not seem ideal.
>  
> Key group optimization block:
> [https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10]





Re: [PR] [FLINK-31223][sqlgateway] Introduce getFlinkConfigurationOptions to [flink]

2024-05-24 Thread via GitHub


davidradl commented on PR #24741:
URL: https://github.com/apache/flink/pull/24741#issuecomment-2130035074

   @reswqa all clean now. I did not backport the 1.19 change, as that method was 
not there in 1.18. I was hoping that I could backport commit 06b3708, which 
introduces the getMap change to ReadableConfig, but this is not enough. I am 
not sure how much I would need to backport to get 1.18 working; 1.19 has a FLIP 
that changes configuration substantially.
   I suggest we leave the fix as is, with the method you added, and not do a 
large amount of 1.19 -> 1.18 backports. WDYT? 
   
   





Re: [PR] [FLINK-35424] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


liuml07 commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613758123


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   Indeed, parameterized tests will maximize code reuse while still keeping 
separate cases/parameters independent. I thought of this idea previously but 
realized the `ES_CONTAINER` field is static (to amortize the cost of container 
setup) and annotated with `@Container` for a managed lifecycle.
   
   When parameterized, there will be multiple if-else checks depending on the 
parameter in the base class and child test classes, mainly for the ES client 
and sink builder. This is not a problem per se; it just needs a bit more 
refactoring.
   
   I'll move the fields / static methods back to the base class for now, and 
take another look at parameterized tests.
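As a rough illustration of the refactoring idea discussed above (all names here are invented for the sketch, not the connector's actual API), a single boolean parameter can drive the secure vs. non-secure branches that would otherwise live in duplicated test classes:

```java
// Hedged sketch: derive client configuration from one boolean parameter,
// so parameterized tests share one code path instead of sibling classes.
// ClientConfig and forSecurity are illustrative names only.
public class ClientConfigFactory {

    /** Minimal holder for the settings that differ between the two cases. */
    public record ClientConfig(String scheme, boolean trustSelfSigned) {}

    /** The if/else the reviewers mention, centralized in one factory method. */
    public static ClientConfig forSecurity(boolean secure) {
        return secure
                ? new ClientConfig("https", true)   // secure container: TLS + self-signed cert
                : new ClientConfig("http", false);  // plain container
    }
}
```

A parameterized test would then call `forSecurity(param)` once in setup rather than branching in every test method.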






Re: [PR] [FLINK-35424] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


reta commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613717772


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   > I don't have strong preference on this and can move it back to the 
ElasticsearchSinkBaseITCase class if we feel it's better to keep them in the 
base class.
   
   To be fair, it looks more straightforward to me (with 
`ElasticsearchSinkBaseITCase`); the container creation could be parameterized 
if needed, thank you






Re: [PR] [FLINK-35424][connectors/elasticsearch] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


liuml07 commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613711076


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   Good question, it's not required. `ElasticsearchSinkBaseITCase` creates a 
static ES container while the new test `Elasticsearch8AsyncSinkSecureITCase` 
needs to create a secure ES container. The new one does not really inherit 
from this base class, but simply refers to those static fields. I think it's a 
bit clearer to extract those common fields and methods so the secure vs. 
non-secure tests are completely independent.
   
   I don't have strong preference on this and can move it back to the 
`ElasticsearchSinkBaseITCase` class if we feel it's better to keep them in the 
base class.






[jira] [Commented] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-05-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849337#comment-17849337
 ] 

Pedro Mázala commented on FLINK-34333:
--

I've just done it. Thank you for porting it back.

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
> Key: FLINK-34333
> URL: https://issues.apache.org/jira/browse/FLINK-34333
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.18.2
>
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2 which we're using since 
> Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19 which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event being forwarded to the 
> client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work due to the leadership lost event 
> being triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35446] Fix NPE when disabling checkpoint file merging but restore from merged files [flink]

2024-05-24 Thread via GitHub


flinkbot commented on PR #24840:
URL: https://github.com/apache/flink/pull/24840#issuecomment-2129877895

   
   ## CI report:
   
   * 5a78e68711f02ccaeb069376270649229d8c53c8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





Re: [PR] [FLINK-35316][tests] Run CDC E2e test with Flink 1.19 [flink-cdc]

2024-05-24 Thread via GitHub


yuxiqian closed pull request #3305: [FLINK-35316][tests] Run CDC E2e test with 
Flink 1.19
URL: https://github.com/apache/flink-cdc/pull/3305





Re: [PR] [FLINK-35316][tests] Run CDC E2e test with Flink 1.19 [flink-cdc]

2024-05-24 Thread via GitHub


yuxiqian commented on PR #3305:
URL: https://github.com/apache/flink-cdc/pull/3305#issuecomment-2129877441

   Addressed in #3348.





Re: [PR] [FLINK-35424][connectors/elasticsearch] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


liuml07 commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613694681


##
flink-connector-elasticsearch8/src/main/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8AsyncSinkBuilder.java:
##
@@ -100,16 +111,62 @@ public Elasticsearch8AsyncSinkBuilder 
setHeaders(Header... headers) {
 }
 
 /**
- * setCertificateFingerprint set the certificate fingerprint to be used to 
verify the HTTPS
- * connection.
+ * Allows to bypass the certificates chain validation and connect to 
insecure network endpoints
+ * (for example, servers which use self-signed certificates).
+ *
+ * @return this builder
+ */
+public Elasticsearch8AsyncSinkBuilder allowInsecure() {

Review Comment:
   Thanks, I refactored this ES 8 a bit in #100 to assist this.
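For context, an `allowInsecure()`-style option typically installs a trust-all `SSLContext` along these lines — a generic JSSE sketch mirroring the common ES 6/7 connector pattern, not the exact code in this PR:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;

// Hedged sketch: build an SSLContext that skips certificate chain validation,
// as "allow insecure" options usually do. Class/method names are illustrative.
public class InsecureSslContextExample {

    public static SSLContext trustAllContext() {
        // Trust manager that accepts any certificate chain (self-signed included).
        TrustManager[] trustAll = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
                public void checkClientTrusted(X509Certificate[] chain, String authType) {}
                public void checkServerTrusted(X509Certificate[] chain, String authType) {}
            }
        };
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, trustAll, new SecureRandom());
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Failed to build insecure SSL context", e);
        }
    }
}
```

Such a context is only appropriate for tests and development endpoints, which is why builders expose it behind an explicit opt-in.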






[jira] [Commented] (FLINK-35411) Optimize wait logic in draining of async state requests

2024-05-24 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849336#comment-17849336
 ] 

Zakelly Lan commented on FLINK-35411:
-

[~spoon-lz] sure, will do. Thanks for your volunteering.

> Optimize wait logic in draining of async state requests
> ---
>
> Key: FLINK-35411
> URL: https://issues.apache.org/jira/browse/FLINK-35411
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends, Runtime / Task
>Reporter: Zakelly Lan
>Assignee: Yanfei Lei
>Priority: Major
>
> Currently during draining of async state requests, the task thread performs 
> {{Thread.sleep}} to avoid cpu overhead when polling mails. This can be 
> optimized by wait & notify.
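The sleep-vs-notify idea in the quoted description can be sketched as follows — a hedged illustration only, where `DrainExample` and its members are invented names, not Flink's actual classes:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: instead of the task thread polling with Thread.sleep while draining,
// it waits on a monitor and is woken as soon as the last request completes.
public class DrainExample {
    private final Object lock = new Object();
    private final AtomicInteger inFlight;

    public DrainExample(int pendingRequests) {
        this.inFlight = new AtomicInteger(pendingRequests);
    }

    /** Called by async callbacks when one state request completes. */
    public void requestCompleted() {
        if (inFlight.decrementAndGet() == 0) {
            synchronized (lock) {
                lock.notifyAll(); // wake the draining task thread immediately
            }
        }
    }

    /** Task thread: block until all requests complete — no sleep-based polling. */
    public void drain() throws InterruptedException {
        synchronized (lock) {
            while (inFlight.get() > 0) {
                lock.wait(); // releases the lock while waiting; re-checks on wakeup
            }
        }
    }
}
```

Checking the counter inside the `synchronized` block before waiting avoids the lost-wakeup race, since the completing thread must acquire the same lock before notifying.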





[jira] [Commented] (FLINK-35355) Async aggregating state

2024-05-24 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849335#comment-17849335
 ] 

Zakelly Lan commented on FLINK-35355:
-

Merged into master via 467f94f9ecef91b671ebbdc4774f2b690f4fa713

> Async aggregating state
> ---
>
> Key: FLINK-35355
> URL: https://issues.apache.org/jira/browse/FLINK-35355
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (FLINK-35355) Async aggregating state

2024-05-24 Thread Zakelly Lan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zakelly Lan resolved FLINK-35355.
-
Fix Version/s: 2.0.0
 Assignee: Jie Pu
   Resolution: Fixed

> Async aggregating state
> ---
>
> Key: FLINK-35355
> URL: https://issues.apache.org/jira/browse/FLINK-35355
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: Zakelly Lan
>Assignee: Jie Pu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>






Re: [PR] [FLINK-35355][State] Internal async aggregating state and corresponding state descriptor [flink]

2024-05-24 Thread via GitHub


Zakelly closed pull request #24810: [FLINK-35355][State] Internal async 
aggregating state and corresponding state descriptor
URL: https://github.com/apache/flink/pull/24810





Re: [PR] [FLINK-35446] Fix NPE when disabling checkpoint file merging but restore from merged files [flink]

2024-05-24 Thread via GitHub


Zakelly commented on PR #24840:
URL: https://github.com/apache/flink/pull/24840#issuecomment-2129858482

   @fredia @ljz2051 Would you please take a look? Thanks





Re: [PR] [FLINK-35355][State] Internal async aggregating state and corresponding state descriptor [flink]

2024-05-24 Thread via GitHub


Zakelly commented on PR #24810:
URL: https://github.com/apache/flink/pull/24810#issuecomment-2129836350

   CI green, merging





[jira] [Updated] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35446:
---
Labels: pull-request-available test-stability  (was: test-stability)

> FileMergingSnapshotManagerBase throws a NullPointerException
> 
>
> Key: FLINK-35446
> URL: https://issues.apache.org/jira/browse/FLINK-35446
> Project: Flink
>  Issue Type: Bug
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> * 1.20 Java 11 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
> {{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
>  throws a NullPointerException when it tries to restore state handles: 
> {code}
> Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Error: 02:57:52 02:57:52.551 [ERROR] 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
>  = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
> May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
> by NoRestartBackoffTimeStrategy
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
> May 24 02:57:52   at 
> jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> May 24 02:57:52   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 24 02:57:52   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
> May 24 02:57:52   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
> May 24 

[PR] [FLINK-35446] Fix NPE when disabling checkpoint file merging but restore from merged files [flink]

2024-05-24 Thread via GitHub


Zakelly opened a new pull request, #24840:
URL: https://github.com/apache/flink/pull/24840

   ## What is the purpose of the change
   
   See FLINK-35446. It seems file merging is disabled, thus 
`FileMergingSnapshotManagerBase#initFileSystem` is never called. If we then 
restore from a checkpoint whose files are already merged, the managed path is 
needed but has not been initialized. This PR fixes this.
   
   ## Brief change log
   
- Skip using the managed directory to claim management of files if it is not 
initialized, which means no management needs to be claimed in this case.
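A minimal sketch of the guard described in the change log (class and method names here are illustrative, not the actual `FileMergingSnapshotManagerBase` code):

```java
import java.nio.file.Path;

// Hedged sketch: when file merging was never initialized on this run, the
// managed directory is null, so ownership claims are skipped instead of
// dereferencing it and throwing the NPE described in FLINK-35446.
public class ManagedDirGuard {
    private Path managedDir; // null when initFileSystem() was never called

    public void initFileSystem(Path dir) {
        this.managedDir = dir;
    }

    /** Returns whether this manager claims management of the given file. */
    public boolean claimOwnership(Path file) {
        if (managedDir == null) {
            // File merging disabled: restored merged files need no management claim.
            return false;
        }
        return file.startsWith(managedDir);
    }
}
```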
   
   ## Verifying this change
   
   This change is a trivial rework without any test coverage, but some GitHub 
Actions runs may fail as in FLINK-35446.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   





[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849323#comment-17849323
 ] 

Zakelly Lan commented on FLINK-35446:
-

[~rskraba] Thanks for letting me know! I'll fix this.

> FileMergingSnapshotManagerBase throws a NullPointerException
> 
>
> Key: FLINK-35446
> URL: https://issues.apache.org/jira/browse/FLINK-35446
> Project: Flink
>  Issue Type: Bug
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Java 11 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
> {{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
>  throws a NullPointerException when it tries to restore state handles: 
> {code}
> Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Error: 02:57:52 02:57:52.551 [ERROR] 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
>  = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
> May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
> by NoRestartBackoffTimeStrategy
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
> May 24 02:57:52   at 
> jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> May 24 02:57:52   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 24 02:57:52   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
> May 24 02:57:52   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
> May 24 02:57:52 

Re: [PR] [FLINK-35448] Translate pod templates documentation into Chinese [flink-kubernetes-operator]

2024-05-24 Thread via GitHub


caicancai commented on code in PR #830:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/830#discussion_r1613512120


##
docs/content/docs/custom-resource/pod-template.md:
##
@@ -93,16 +90,18 @@ spec:
 ```
 
 {{< hint info >}}
-When using the operator with Flink native Kubernetes integration, please refer 
to [pod template field precedence](
-https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink).
+当使用具有 Flink 原生 Kubernetes 集成的 operator 时,请参考 [pod template 字段优先级](
+https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink)。
 {{< /hint >}}
 
+
 ## Array Merging Behaviour

Review Comment:
   I'm not sure whether "array" should be translated into Chinese






Re: [PR] [FLINK-35424][connectors/elasticsearch] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


reta commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613490458


##
flink-connector-elasticsearch8/src/main/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8AsyncSinkBuilder.java:
##
@@ -100,16 +111,62 @@ public Elasticsearch8AsyncSinkBuilder 
setHeaders(Header... headers) {
 }
 
 /**
- * setCertificateFingerprint set the certificate fingerprint to be used to 
verify the HTTPS
- * connection.
+ * Allows to bypass the certificates chain validation and connect to 
insecure network endpoints
+ * (for example, servers which use self-signed certificates).
+ *
+ * @return this builder
+ */
+public Elasticsearch8AsyncSinkBuilder allowInsecure() {

Review Comment:
   :+1: Looks like a straightforward backport of ES6/7 impl, thank you






Re: [PR] [FLINK-35424][connectors/elasticsearch] Elasticsearch connector 8 supports SSL context [flink-connector-elasticsearch]

2024-05-24 Thread via GitHub


reta commented on code in PR #104:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/104#discussion_r1613488632


##
flink-connector-elasticsearch8/src/test/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8TestUtils.java:
##
@@ -0,0 +1,69 @@
+package org.apache.flink.connector.elasticsearch.sink;
+
+import org.apache.http.util.EntityUtils;
+import org.elasticsearch.client.Request;
+import org.elasticsearch.client.Response;
+import org.elasticsearch.client.RestClient;
+import org.testcontainers.utility.DockerImageName;
+
+import java.io.IOException;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Utility class for Elasticsearch8 tests. */
+public class Elasticsearch8TestUtils {

Review Comment:
   Why do we need this class vs using existing `ElasticsearchSinkBaseITCase` ?






[jira] [Updated] (FLINK-35052) Webhook validator should reject unsupported Flink versions

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35052:
---
Labels: pull-request-available  (was: )

> Webhook validator should reject unsupported Flink versions
> --
>
> Key: FLINK-35052
> URL: https://issues.apache.org/jira/browse/FLINK-35052
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Alexander Fedulov
>Assignee: Alexander Fedulov
>Priority: Major
>  Labels: pull-request-available
>
> The admission webhook currently does not verify if FlinkDeployment CR 
> utilizes Flink versions that are not supported by the Operator. This causes 
> the CR to be accepted and the failure to be postponed until the 
> reconciliation phase. We should instead fail fast and provide users direct 
> feedback.





[PR] [FLINK-35052] Reject unsupported versions in the webhook validator [flink-kubernetes-operator]

2024-05-24 Thread via GitHub


afedulov opened a new pull request, #831:
URL: https://github.com/apache/flink-kubernetes-operator/pull/831

   ## What is the purpose of the change
   
   The admission webhook currently does not verify if FlinkDeployment CR 
utilizes Flink versions that are not supported by the Operator. This causes the 
CR to be accepted and the failure to be postponed until the reconciliation 
phase. We should instead fail fast and provide users direct feedback.
   
   ## Brief change log
   
   Adds a Flink version check to the validator
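The fail-fast check could look roughly like this — a hedged sketch where the supported-version set and method names are illustrative, not the operator's actual validator API:

```java
import java.util.Optional;
import java.util.Set;

// Sketch: reject unsupported Flink versions at admission time so the CR fails
// fast in the webhook instead of during reconciliation. Versions listed are
// placeholders, not the operator's real support matrix.
public class FlinkVersionValidator {
    private static final Set<String> SUPPORTED =
            Set.of("v1_16", "v1_17", "v1_18", "v1_19");

    /** Returns an error message if the version is unsupported, empty otherwise. */
    public Optional<String> validateVersion(String flinkVersion) {
        if (flinkVersion == null || !SUPPORTED.contains(flinkVersion)) {
            return Optional.of("Unsupported Flink version: " + flinkVersion);
        }
        return Optional.empty();
    }
}
```

An admission webhook would turn a non-empty result into a rejected CR, giving the user direct feedback at apply time.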
   
   ## Verifying this change
   Added a test case to the existing test suite
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changes to the `CustomResourceDescriptors`: 
(yes / **no**)
 - Core observer or reconciler logic that is regularly executed: (yes / 
**no**)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (**yes** / no)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   





[PR] [FLINK-35448] Translate pod templates documentation into Chinese [flink-kubernetes-operator]

2024-05-24 Thread via GitHub


caicancai opened a new pull request, #830:
URL: https://github.com/apache/flink-kubernetes-operator/pull/830

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request adds a new feature to periodically create 
and maintain savepoints through the `FlinkDeployment` custom resource.)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *Periodic savepoint trigger is introduced to the custom resource*
 - *The operator checks on reconciliation whether the required time has 
passed*
 - *The JobManager's dispose savepoint API is used to clean up obsolete 
savepoints*
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: (yes / no)
 - Core observer or reconciler logic that is regularly executed: (yes / no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[jira] [Updated] (FLINK-35448) Translate pod templates documentation into Chinese

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35448:
---
Labels: pull-request-available  (was: )

> Translate pod templates documentation into Chinese
> --
>
> Key: FLINK-35448
> URL: https://issues.apache.org/jira/browse/FLINK-35448
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Translate pod templates documentation into Chinese
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35448) Translate pod templates documentation into Chinese

2024-05-24 Thread Caican Cai (Jira)
Caican Cai created FLINK-35448:
--

 Summary: Translate pod templates documentation into Chinese
 Key: FLINK-35448
 URL: https://issues.apache.org/jira/browse/FLINK-35448
 Project: Flink
  Issue Type: Sub-task
Reporter: Caican Cai


Translate pod templates documentation into Chinese
 
 





Re: [PR] [FLINK-34487][ci] Adds Python Wheels nightly GHA workflow [flink]

2024-05-24 Thread via GitHub


morazow commented on PR #24426:
URL: https://github.com/apache/flink/pull/24426#issuecomment-2129494739

   Test run built wheel artifacts: 
https://github.com/morazow/flink/actions/runs/9224143298





Re: [PR] [FLINK-33759] [flink-parquet] Add support for nested array with row/map/array type [flink]

2024-05-24 Thread via GitHub


ViktorCosenza commented on PR #24795:
URL: https://github.com/apache/flink/pull/24795#issuecomment-2129462557

   @JingGe Are you able to trigger the CI manually? I think it wasn't triggered after the squash because no changes were detected.





Re: [PR] [FLINK-34746] Switching to the Apache CDN for Dockerfile [flink-docker]

2024-05-24 Thread via GitHub


MartijnVisser commented on PR #190:
URL: https://github.com/apache/flink-docker/pull/190#issuecomment-2129446485

   @hlteoh37 Done, but I'm not sure if this actually needs to be backported to 
the other branches. @lincoln-lil do you know?





Re: [PR] [FLINK-34487][ci] Adds Python Wheels nightly GHA workflow [flink]

2024-05-24 Thread via GitHub


morazow commented on PR #24426:
URL: https://github.com/apache/flink/pull/24426#issuecomment-2129446733

   Hey @XComp, @HuangXingBo ,
   
   What do you think of the latest commit changes for migration?
   
   This way both the Linux and macOS platforms use a similar build system, and we no longer depend on a bash script on Linux. All the wheel build requirements (Python versions, architectures, etc.) are defined in `pyproject.toml` and used for both platforms.
   
   For Linux I had to switch to the `manylinux2014` image, since `manylinux1` does not support Python 3.10+.
   
   Please let me know what you think
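   As a rough illustration, a cibuildwheel-style `pyproject.toml` section of this kind could look like the sketch below. The keys are real cibuildwheel options, but the concrete values (Python versions, image name, architectures) are assumptions for illustration, not the actual configuration in this PR:

```toml
# Illustrative sketch -- values are assumptions, not this PR's actual config.
[tool.cibuildwheel]
# CPython versions to build wheels for, shared by both platforms.
build = "cp38-* cp39-* cp310-*"
# manylinux2014 is needed because manylinux1 does not support Python 3.10+.
manylinux-x86_64-image = "manylinux2014"

[tool.cibuildwheel.macos]
# Build both Intel and Apple Silicon wheels on macOS.
archs = ["x86_64", "arm64"]
```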





[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849285#comment-17849285
 ] 

Ryan Skraba commented on FLINK-35446:
-

[~lijinzhong] or [~zakelly] Do you think this needs a similar fix as 
FLINK-35382 ? 

> FileMergingSnapshotManagerBase throws a NullPointerException
> 
>
> Key: FLINK-35446
> URL: https://issues.apache.org/jira/browse/FLINK-35446
> Project: Flink
>  Issue Type: Bug
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Java 11 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
> {{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
>  throws a NullPointerException when it tries to restore state handles: 
> {code}
> Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Error: 02:57:52 02:57:52.551 [ERROR] 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
>  = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
> May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
> by NoRestartBackoffTimeStrategy
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
> May 24 02:57:52   at 
> jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> May 24 02:57:52   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 24 02:57:52   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
> May 24 02:57:52   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> May 24 02:57:52   at 
> 

[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849284#comment-17849284
 ] 

Ryan Skraba commented on FLINK-35446:
-

* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9219075449/job/25363874486#step:10:11847
 {{PruneAggregateCallITCase.testNoneEmptyGroupKey}}
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9219075449/job/25363874825#step:10:8005

The last one is different than the others: 
{code}
Error: 05:48:38 05:48:38.790 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 12.78 s <<< FAILURE! -- in 
org.apache.flink.test.classloading.ClassLoaderITCase
Error: 05:48:38 05:48:38.790 [ERROR] 
org.apache.flink.test.classloading.ClassLoaderITCase.testCheckpointedStreamingClassloaderJobWithCustomClassLoader
 -- Time elapsed: 2.492 s <<< FAILURE!
May 24 05:48:38 org.assertj.core.error.AssertJMultipleFailuresError: 
May 24 05:48:38 
May 24 05:48:38 Multiple Failures (1 failure)
May 24 05:48:38 -- failure 1 --
May 24 05:48:38 [Any cause is instance of class 'class 
org.apache.flink.util.SerializedThrowable' and contains message 
'org.apache.flink.test.classloading.jar.CheckpointedStreamingProgram$SuccessException']
 
May 24 05:48:38 Expecting any element of:
May 24 05:48:38   [org.apache.flink.client.program.ProgramInvocationException: 
The main method caused an error: Job execution failed.
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:373)
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223)
May 24 05:48:38 at 
org.apache.flink.test.classloading.ClassLoaderITCase.lambda$testCheckpointedStreamingClassloaderJobWithCustomClassLoader$1(ClassLoaderITCase.java:260)
May 24 05:48:38 ...(54 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 org.apache.flink.runtime.client.JobExecutionException: Job 
execution failed.
May 24 05:48:38 at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
May 24 05:48:38 at 
org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
May 24 05:48:38 at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
May 24 05:48:38 ...(45 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 org.apache.flink.runtime.JobException: Recovery is 
suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, 
backoffTimeMS=100)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
May 24 05:48:38 ...(36 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 java.lang.NullPointerException
May 24 05:48:38 at 
org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.isManagedByFileMergingManager(FileMergingSnapshotManagerBase.java:733)
May 24 05:48:38 at 
org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.lambda$null$4(FileMergingSnapshotManagerBase.java:687)
May 24 05:48:38 at java.util.HashMap.computeIfAbsent(HashMap.java:1128)
May 24 05:48:38 ...(41 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed)]
May 24 05:48:38 to satisfy the given assertions requirements but none did:
May 24 05:48:38 
May 24 05:48:38 org.apache.flink.client.program.ProgramInvocationException: The 
main method caused an error: Job execution failed.
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:373)
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223)
May 24 05:48:38 at 
org.apache.flink.test.classloading.ClassLoaderITCase.lambda$testCheckpointedStreamingClassloaderJobWithCustomClassLoader$1(ClassLoaderITCase.java:260)
May 24 05:48:38 ...(54 remaining lines not displayed - this can be 
changed with 

[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849283#comment-17849283
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360076574#step:10:12483

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Assignee: Feng Jin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}
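A common remedy for this kind of status flakiness is to poll until the expected status is observed (or a deadline passes) instead of asserting it once. The helper below is a hypothetical, self-contained sketch of that pattern; the class and method names are invented for illustration and are not the fix applied in the linked PR:

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: poll a status supplier until it reports the expected
 * value or a deadline passes, instead of asserting the status a single time.
 */
public class StatusAwaitSketch {
    public static boolean awaitStatus(
            Supplier<String> statusSupplier, String expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (expected.equals(statusSupplier.get())) {
                return true;
            }
            Thread.sleep(10); // back off briefly before re-checking
        }
        // One final check at the deadline before giving up.
        return expected.equals(statusSupplier.get());
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a job that starts CREATED and becomes RUNNING shortly after.
        long start = System.currentTimeMillis();
        Supplier<String> status =
                () -> System.currentTimeMillis() - start > 50 ? "RUNNING" : "CREATED";
        System.out.println(awaitStatus(status, "RUNNING", 1_000)); // prints true
    }
}
```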





[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849282#comment-17849282
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.19 Hadoop 3.1.3 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608890/job/25360146799#step:10:8157

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> 

Re: [PR] [FLINK-35129][cdc][postgres] Add checkpoint cycle option for commit offsets [flink-cdc]

2024-05-24 Thread via GitHub


morazow commented on code in PR #3349:
URL: https://github.com/apache/flink-cdc/pull/3349#discussion_r1613394340


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/reader/PostgresSourceReader.java:
##
@@ -104,6 +108,10 @@ public List snapshotState(long 
checkpointId) {
 
 @Override
 public void notifyCheckpointComplete(long checkpointId) throws Exception {
+this.checkpointCount = (this.checkpointCount + 1) % 
this.checkpointCycle;

Review Comment:
   Thanks guys for the feedback, I am going to check it  
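   For readers following along, the gating pattern under discussion can be sketched as below. This is a standalone illustration of the modulo counter, not the PR's actual `PostgresSourceReader` code; the class and method names here are hypothetical:

```java
/**
 * Standalone sketch of the checkpoint-cycle gating discussed above: offsets
 * are committed only on every Nth completed checkpoint. Names are hypothetical.
 */
public class CheckpointCycleGate {
    private final int checkpointCycle;
    private int checkpointCount = 0;

    public CheckpointCycleGate(int checkpointCycle) {
        this.checkpointCycle = checkpointCycle;
    }

    /** Returns true when this checkpoint completion should commit offsets. */
    public boolean onCheckpointComplete() {
        checkpointCount = (checkpointCount + 1) % checkpointCycle;
        return checkpointCount == 0;
    }

    public static void main(String[] args) {
        CheckpointCycleGate gate = new CheckpointCycleGate(3);
        // With a cycle of 3, only every third completed checkpoint commits.
        System.out.println(gate.onCheckpointComplete()); // prints false
        System.out.println(gate.onCheckpointComplete()); // prints false
        System.out.println(gate.onCheckpointComplete()); // prints true
    }
}
```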






Re: [PR] [FLINK-31223][sqlgateway] Introduce getFlinkConfigurationOptions to [flink]

2024-05-24 Thread via GitHub


davidradl commented on code in PR #24741:
URL: https://github.com/apache/flink/pull/24741#discussion_r1613388125


##
flink-table/flink-sql-client/src/test/java/org/apache/flink/table/client/SqlClientTestBase.java:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.client;
+
+import org.apache.flink.core.testutils.CommonTestUtils;
+import org.apache.flink.table.client.cli.TerminalUtils;
+
+import org.jline.terminal.Size;
+import org.jline.terminal.Terminal;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.flink.configuration.ConfigConstants.ENV_FLINK_CONF_DIR;
+
+/** Base class for test {@link SqlClient}. */
+class SqlClientTestBase {
+@TempDir private Path tempFolder;
+
+protected String historyPath;
+
+protected Map originalEnv;
+
+@BeforeEach
+void before() throws IOException {
+originalEnv = System.getenv();
+
+// prepare conf dir
+File confFolder = Files.createTempDirectory(tempFolder, 
"conf").toFile();
+File confYaml = new File(confFolder, "config.yaml");

Review Comment:
   that was it - thanks for your support @reswqa 






Re: [PR] [FLINK-33892][runtime] Support Job Recovery from JobMaster Failures for Batch Jobs. [flink]

2024-05-24 Thread via GitHub


zhuzhurk commented on code in PR #24771:
URL: https://github.com/apache/flink/pull/24771#discussion_r1612989352


##
flink-tests/src/test/java/org/apache/flink/test/scheduling/JMFailoverITCase.java:
##
@@ -532,9 +508,9 @@ private JobGraph 
createJobGraphWithUnsupportedBatchSnapshotOperatorCoordinator(
 return StreamingJobGraphGenerator.createJobGraph(streamGraph);
 }
 
-private static void fillKeepGoing(
-List indices, boolean going, Map 
keepGoing) {
-indices.forEach(index -> keepGoing.put(index, going));
+private static void fillBlockSubTasks(

Review Comment:
   fillBlockSubTasks -> setSubtaskBlocked



##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptivebatch/BatchJobRecoveryTest.java:
##
@@ -0,0 +1,1173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.scheduler.adaptivebatch;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.eventtime.WatermarkAlignmentParams;
+import org.apache.flink.api.connector.source.Boundedness;
+import org.apache.flink.api.connector.source.mocks.MockSource;
+import org.apache.flink.api.connector.source.mocks.MockSourceSplit;
+import org.apache.flink.api.connector.source.mocks.MockSplitEnumerator;
+import org.apache.flink.configuration.BatchExecutionOptions;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.runtime.clusterframework.types.ResourceID;
+import org.apache.flink.runtime.execution.ExecutionState;
+import org.apache.flink.runtime.executiongraph.DefaultExecutionGraph;
+import org.apache.flink.runtime.executiongraph.Execution;
+import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
+import org.apache.flink.runtime.executiongraph.ExecutionGraph;
+import org.apache.flink.runtime.executiongraph.ExecutionJobVertex;
[jira] [Updated] (FLINK-35447) Flink CDC Document document file had removed but website can access

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35447:
---
Labels: pull-request-available  (was: )

> Flink CDC Document document file had removed but website can access
> ---
>
> Key: FLINK-35447
> URL: https://issues.apache.org/jira/browse/FLINK-35447
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
>
> https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/overview/
>  the link should not appear.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-35447][doc-ci] Flink CDC Document document file had removed but website can access [flink-cdc]

2024-05-24 Thread via GitHub


GOODBOY008 commented on PR #3362:
URL: https://github.com/apache/flink-cdc/pull/3362#issuecomment-2129373300

   @leonardBang PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-35447][doc-ci] Flink CDC Document document file had removed but website can access [flink-cdc]

2024-05-24 Thread via GitHub


GOODBOY008 opened a new pull request, #3362:
URL: https://github.com/apache/flink-cdc/pull/3362

   Solution:
   Deletes files from the destination directory that are not present in the 
source directory.
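The deletion step described in this solution can be sketched in plain Java. The real fix presumably lives in the docs build scripts; `DocSync` and `pruneDeleted` are illustrative names, not the actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class DocSync {
    // Remove every regular file under dest that has no counterpart under
    // source, so pages deleted from the sources disappear from the site.
    static void pruneDeleted(Path source, Path dest) throws IOException {
        try (Stream<Path> files = Files.walk(dest)) {
            files.filter(Files::isRegularFile)
                 .filter(f -> !Files.exists(source.resolve(dest.relativize(f))))
                 .forEach(f -> {
                     try {
                         Files.delete(f);
                     } catch (IOException e) {
                         throw new RuntimeException(e);
                     }
                 });
        }
    }
}
```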





Re: [PR] [FLINK-35434] Support pass exception in StateExecutor to runtime [flink]

2024-05-24 Thread via GitHub


zoltar9264 commented on code in PR #24833:
URL: https://github.com/apache/flink/pull/24833#discussion_r1613094184


##
flink-state-backends/flink-statebackend-forst/src/main/java/org/apache/flink/state/forst/ForStWriteBatchOperation.java:
##
@@ -75,7 +74,10 @@ public CompletableFuture process() {
 request.completeStateFuture();
 }
 } catch (Exception e) {
-throw new CompletionException("Error while adding data 
to ForStDB", e);
+for (ForStDBPutRequest request : batchRequest) {

Review Comment:
   Thanks for the reminder @masteryhx. Firstly, this batch operation should 
indeed fail in this case. Currently, the AsyncFrameworkExceptionHandler fails 
the job directly, and the StateExecutor is then destroyed. I don't see the need 
to let the state executor continue executing, so I will fail the executor and 
update the PR.
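The behaviour agreed on above, failing every pending request in the batch instead of leaving their futures hanging, can be sketched as follows. `PutRequest` and `writeToDb` are simplified stand-ins for the ForStDB types, not the actual implementation:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified stand-in for the request type in the diff; the real
// ForStDBPutRequest carries key/value bytes and a state future.
final class PutRequest {
    final String key;
    final CompletableFuture<Void> future = new CompletableFuture<>();
    PutRequest(String key) { this.key = key; }
}

final class WriteBatchOperation {
    // If writing the batch fails, complete every request's future
    // exceptionally so that no caller waits forever on a future that
    // will never be completed.
    static CompletableFuture<Void> process(List<PutRequest> batch) {
        try {
            for (PutRequest request : batch) {
                writeToDb(request);           // may throw
                request.future.complete(null);
            }
            return CompletableFuture.completedFuture(null);
        } catch (Exception e) {
            for (PutRequest request : batch) {
                // completeExceptionally() on an already-completed future is
                // a no-op, so requests written before the failure keep their
                // successful result.
                request.future.completeExceptionally(e);
            }
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

    private static void writeToDb(PutRequest request) {
        if (request.key == null) {
            throw new IllegalArgumentException("null key");
        }
    }
}
```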






[jira] [Created] (FLINK-35447) Flink CDC Document document file had removed but website can access

2024-05-24 Thread Zhongqiang Gong (Jira)
Zhongqiang Gong created FLINK-35447:
---

 Summary: Flink CDC Document document file had removed but website 
can access
 Key: FLINK-35447
 URL: https://issues.apache.org/jira/browse/FLINK-35447
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Zhongqiang Gong


https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/overview/
 the link should not appear.





[jira] [Commented] (FLINK-35435) [FLIP-451] Introduce timeout configuration to AsyncSink

2024-05-24 Thread Ahmed Hamdy (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849272#comment-17849272
 ] 

Ahmed Hamdy commented on FLINK-35435:
-

[~danny.cranmer] Could you please take a look when you have time?

> [FLIP-451] Introduce timeout configuration to AsyncSink
> ---
>
> Key: FLINK-35435
> URL: https://issues.apache.org/jira/browse/FLINK-35435
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Reporter: Ahmed Hamdy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: Screenshot 2024-05-24 at 11.06.30.png, Screenshot 
> 2024-05-24 at 12.06.20.png
>
>
> Implementation Ticket for:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
>  





Re: [PR] [FLINK-35435] Add timeout Configuration to Async Sink [flink]

2024-05-24 Thread via GitHub


flinkbot commented on PR #24839:
URL: https://github.com/apache/flink/pull/24839#issuecomment-2129354320

   
   ## CI report:
   
   * 2b91a9c92af7d97d492a9c83d43d7c544f85d355 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-35435) [FLIP-451] Introduce timeout configuration to AsyncSink

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-35435:
---
Labels: pull-request-available  (was: )

> [FLIP-451] Introduce timeout configuration to AsyncSink
> ---
>
> Key: FLINK-35435
> URL: https://issues.apache.org/jira/browse/FLINK-35435
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Reporter: Ahmed Hamdy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: Screenshot 2024-05-24 at 11.06.30.png, Screenshot 
> 2024-05-24 at 12.06.20.png
>
>
> Implementation Ticket for:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
>  





[PR] [FLINK-35435] Add timeout Configuration to Async Sink [flink]

2024-05-24 Thread via GitHub


vahmed-hamdy opened a new pull request, #24839:
URL: https://github.com/apache/flink/pull/24839

   
   
   ## What is the purpose of the change
   
   Implementation of 
[FLIP-451](https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API)
 introducing Timeout Configuration to Async Sink.
   
   
   ## Brief change log
   
   - Add `ResultHandler` class to be used by Sink implementers
   - Add `AsyncSinkWriterResultHandler` implementation that supports timeout
   - Add `requestTimeoutMs` and `failOnTimeout` configuration to 
`AsyncSinkWriterConfiguration` and to `AsyncSinkWriterConfigurationBuilder`
   - Add default values to `requestTimeoutMs` and `failOnTimeout` as suggested 
in FLIP-451
   - Add needed unit tests and refactored existing tests
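As a rough illustration of what a request timeout with a `failOnTimeout` switch can look like: the actual `ResultHandler`/`AsyncSinkWriter` API introduced by FLIP-451 differs, and this sketch only models the two configured behaviours with `CompletableFuture`:

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutDemo {
    // On timeout, either fail the request (failOnTimeout = true) or signal
    // the caller to retry it (failOnTimeout = false). "RETRY" stands in for
    // whatever re-enqueue mechanism the sink writer uses.
    static CompletableFuture<String> withTimeout(
            CompletableFuture<String> request, Duration timeout, boolean failOnTimeout) {
        return request
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .handle((result, error) -> {
                    if (error == null) {
                        return result;
                    }
                    if (error instanceof TimeoutException && !failOnTimeout) {
                        return "RETRY";   // caller would re-enqueue the request
                    }
                    throw new CompletionException(error);
                });
    }
}
```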
   
   ## Verifying this change
   
   
   
   This change added tests and can be verified as follows:
   
   - Added unit tests
   - Performed Sanity testing and benchmarks on Kinesis Implementation as 
described in the [Ticket](https://issues.apache.org/jira/browse/FLINK-35435).
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? JavaDocs + [Follow Up Ticket
   ](https://issues.apache.org/jira/browse/FLINK-35445)





[jira] [Created] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35446:
---

 Summary: FileMergingSnapshotManagerBase throws a 
NullPointerException
 Key: FLINK-35446
 URL: https://issues.apache.org/jira/browse/FLINK-35446
 Project: Flink
  Issue Type: Bug
Reporter: Ryan Skraba


* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641

{{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
 throws a NullPointerException when it tries to restore state handles: 

{code}
Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
Error: 02:57:52 02:57:52.551 [ERROR] 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
 = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
by NoRestartBackoffTimeStrategy
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
May 24 02:57:52 at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
May 24 02:57:52 at 
jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
May 24 02:57:52 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
May 24 02:57:52 at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
May 24 02:57:52 at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
May 24 02:57:52 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
May 24 02:57:52 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
May 24 02:57:52 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
May 24 02:57:52 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
May 24 02:57:52 at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229)
May 24 02:57:52 at 
org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)
May 24 02:57:52 at 
org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)
May 24 02:57:52 at 
org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)
May 24 02:57:52 at 

[jira] [Updated] (FLINK-35445) Update Async Sink documentation for Timeout configuration

2024-05-24 Thread Ahmed Hamdy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hamdy updated FLINK-35445:

Parent: FLINK-35435
Issue Type: Sub-task  (was: Improvement)

> Update Async Sink documentation for Timeout configuration 
> --
>
> Key: FLINK-35445
> URL: https://issues.apache.org/jira/browse/FLINK-35445
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common, Documentation
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: 1.20.0
>
>
> Update Documentation for AsyncSink Changes introduced by 
> [FLIP-451|https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API]





[jira] [Created] (FLINK-35445) Update Async Sink documentation for Timeout configuration

2024-05-24 Thread Ahmed Hamdy (Jira)
Ahmed Hamdy created FLINK-35445:
---

 Summary: Update Async Sink documentation for Timeout configuration 
 Key: FLINK-35445
 URL: https://issues.apache.org/jira/browse/FLINK-35445
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common, Documentation
Reporter: Ahmed Hamdy
 Fix For: 1.20.0


Update Documentation for AsyncSink Changes introduced by 
[FLIP-451|https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API]





[jira] [Commented] (FLINK-35435) [FLIP-451] Introduce timeout configuration to AsyncSink

2024-05-24 Thread Ahmed Hamdy (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849270#comment-17849270
 ] 

Ahmed Hamdy commented on FLINK-35435:
-

h1. Non-Functional Backward Compatibility
To ensure that we haven't introduced any regressions for existing implementers, 
we tested {{KinesisStreamsSink}} with the default request timeout vs. no timeout 
on two levels:

h2. Sanity testing

We have run the [example 
job|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-aws-kinesis-streams/src/test/java/org/apache/flink/connector/kinesis/sink/examples/SinkIntoKinesis.java]
 with a checkpoint interval of 10 seconds, set the request timeout to 3 
minutes, and verified that no requests were retried due to timeout during 30 
minutes of job execution.

h2. Performance Benchmark

I have benchmarked the Kinesis sink with the default timeout (10 minutes), 
batch size = 20, and the default number of in-flight requests.

The results show no difference (except for a small network blip).

h3. Sink With Timeout
 !Screenshot 2024-05-24 at 11.06.30.png! 

h3. Sink With No Timeout
 !Screenshot 2024-05-24 at 12.06.20.png! 

> [FLIP-451] Introduce timeout configuration to AsyncSink
> ---
>
> Key: FLINK-35435
> URL: https://issues.apache.org/jira/browse/FLINK-35435
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: Screenshot 2024-05-24 at 11.06.30.png, Screenshot 
> 2024-05-24 at 12.06.20.png
>
>
> Implementation Ticket for:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
>  





Re: [PR] [FLINK-35415][base] Fix compatibility with Flink 1.19 [flink-cdc]

2024-05-24 Thread via GitHub


yuxiqian commented on PR #3348:
URL: https://github.com/apache/flink-cdc/pull/3348#issuecomment-2129300597

   Pushed another commit to resolve the CI issue, could @leonardBang please 
re-trigger the CI? Thanks!





Re: [PR] [FLINK-33892][runtime] Support Job Recovery from JobMaster Failures for Batch Jobs. [flink]

2024-05-24 Thread via GitHub


zhuzhurk commented on code in PR #24771:
URL: https://github.com/apache/flink/pull/24771#discussion_r1612975806


##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptivebatch/BatchJobRecoveryTest.java:
##
@@ -0,0 +1,1217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.scheduler.adaptivebatch;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.eventtime.WatermarkAlignmentParams;
+import org.apache.flink.api.connector.source.Boundedness;
+import org.apache.flink.api.connector.source.mocks.MockSource;
+import org.apache.flink.api.connector.source.mocks.MockSourceSplit;
+import org.apache.flink.api.connector.source.mocks.MockSplitEnumerator;
+import org.apache.flink.configuration.BatchExecutionOptions;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.runtime.clusterframework.types.ResourceID;
+import org.apache.flink.runtime.execution.ExecutionState;
+import org.apache.flink.runtime.executiongraph.DefaultExecutionGraph;
+import org.apache.flink.runtime.executiongraph.Execution;
+import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
+import org.apache.flink.runtime.executiongraph.ExecutionGraph;
+import org.apache.flink.runtime.executiongraph.ExecutionJobVertex;
+import org.apache.flink.runtime.executiongraph.ExecutionVertex;
+import org.apache.flink.runtime.executiongraph.IntermediateResultPartition;
+import org.apache.flink.runtime.executiongraph.InternalExecutionGraphAccessor;
+import org.apache.flink.runtime.executiongraph.ResultPartitionBytes;
+import org.apache.flink.runtime.executiongraph.TestingComponentMainThreadExecutor;
+import org.apache.flink.runtime.executiongraph.failover.FixedDelayRestartBackoffTimeStrategy;
+import org.apache.flink.runtime.io.network.partition.JobMasterPartitionTracker;
+import org.apache.flink.runtime.io.network.partition.JobMasterPartitionTrackerImpl;
+import org.apache.flink.runtime.io.network.partition.PartitionNotFoundException;
+import org.apache.flink.runtime.io.network.partition.ResultPartitionID;
+import org.apache.flink.runtime.io.network.partition.ResultPartitionType;
+import org.apache.flink.runtime.jobgraph.DistributionPattern;
+import org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID;
+import org.apache.flink.runtime.jobgraph.JobGraph;
+import org.apache.flink.runtime.jobgraph.JobVertex;
+import org.apache.flink.runtime.jobgraph.JobVertexID;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.jobmaster.event.ExecutionVertexFinishedEvent;
+import org.apache.flink.runtime.jobmaster.event.FileSystemJobEventStore;
+import org.apache.flink.runtime.jobmaster.event.JobEvent;
+import org.apache.flink.runtime.jobmaster.event.JobEventManager;
+import org.apache.flink.runtime.jobmaster.event.JobEventStore;
+import org.apache.flink.runtime.jobmaster.utils.TestingJobMasterGateway;
+import org.apache.flink.runtime.jobmaster.utils.TestingJobMasterGatewayBuilder;
+import org.apache.flink.runtime.operators.coordination.EventReceivingTasks;
+import org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder;
+import org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator;
+import org.apache.flink.runtime.operators.coordination.TestingOperatorCoordinator;
+import org.apache.flink.runtime.scheduler.DefaultSchedulerBuilder;
+import org.apache.flink.runtime.scheduler.SchedulerBase;
+import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;
+import org.apache.flink.runtime.scheduler.strategy.SchedulingResultPartition;
+import org.apache.flink.runtime.scheduler.strategy.SchedulingTopology;
+import org.apache.flink.runtime.shuffle.DefaultShuffleMetrics;
+import org.apache.flink.runtime.shuffle.JobShuffleContextImpl;
+import org.apache.flink.runtime.shuffle.NettyShuffleDescriptor;
+import org.apache.flink.runtime.shuffle.NettyShuffleMaster;
+import org.apache.flink.runtime.shuffle.PartitionWithMetrics;
+import org.apache.flink.runtime.shuffle.ShuffleDescriptor;
+import org.apache.flink.runtime.shuffle.ShuffleMaster;
+import 

[jira] [Updated] (FLINK-35435) [FLIP-451] Introduce timeout configuration to AsyncSink

2024-05-24 Thread Ahmed Hamdy (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hamdy updated FLINK-35435:

Attachment: Screenshot 2024-05-24 at 11.06.30.png
Screenshot 2024-05-24 at 12.06.20.png

> [FLIP-451] Introduce timeout configuration to AsyncSink
> ---
>
> Key: FLINK-35435
> URL: https://issues.apache.org/jira/browse/FLINK-35435
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Reporter: Ahmed Hamdy
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: Screenshot 2024-05-24 at 11.06.30.png, Screenshot 
> 2024-05-24 at 12.06.20.png
>
>
> Implementation Ticket for:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Introduce+timeout+configuration+to+AsyncSink+API
>  





Re: [PR] [FLINK-33211][table] support flink table lineage [flink]

2024-05-24 Thread via GitHub


davidradl commented on code in PR #24618:
URL: https://github.com/apache/flink/pull/24618#discussion_r1613290123


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/lineage/LineageGraph.java:
##
@@ -20,13 +20,12 @@
 package org.apache.flink.streaming.api.lineage;
 
 import org.apache.flink.annotation.PublicEvolving;
-import org.apache.flink.streaming.api.graph.StreamGraph;
 
 import java.util.List;
 
 /**
- * Job lineage is built according to {@link StreamGraph}, users can get 
sources, sinks and
- * relationships from lineage and manage the relationship between jobs and 
tables.
+ * Job lineage graph that users can get sources, sinks and relationships from 
lineage and manage the

Review Comment:
   > Thanks David for your comments. Yes, the documentation will be added after 
adding the job lineage listener, which is more user-facing. It is planned in 
this jira https://issues.apache.org/jira/browse/FLINK-33212. This PR only 
considers source/sink-level lineage. Column-level lineage is not included in 
this work, so internal transformations do not need lineage info for now. Would 
you please elaborate more on "I assume a sink could be a source - so could be in 
both current lists"?
   
   Hi Peter, usually we think of lineage assets as the nodes in the lineage 
(e.g. open lineage). So the asset could be a Kafka topic, and that topic would 
be used as a source for some flows and a sink for other flows. I was wondering 
how this fits with lineage at the table level, where there could be a table 
defined as a sink and a table defined as a source on the same Kafka topic. I 
guess when exporting/exposing to open lineage, there could be many Flink tables 
referring to the same topic that would end up as one open lineage node. The 
natural way for Flink to store the lineage is at the table level rather than at 
the asset level. So thinking about it, I think this is fine.






Re: [PR] [FLINK-35406] Use inner serializer when casting RAW type to BINARY or… [flink]

2024-05-24 Thread via GitHub


twalthr commented on code in PR #24818:
URL: https://github.com/apache/flink/pull/24818#discussion_r1613283206


##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/functions/CastFunctionMiscITCase.java:
##
@@ -384,4 +410,15 @@ public static class LocalDateTimeToRaw extends 
ScalarFunction {
 return LocalDateTime.parse(str);
 }
 }
+
+public static byte[] serializeLocalDateTime(LocalDateTime localDateTime) {

Review Comment:
   use the existing utility 
`org.apache.flink.util.InstantiationUtil#serializeToByteArray`
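For readers unfamiliar with the suggested utility: a serialize-to-byte-array helper of this kind boils down to plain Java serialization, roughly as below. Flink's `InstantiationUtil` additionally offers variants that go through a `TypeSerializer`; this standalone sketch is not its actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.time.LocalDateTime;

final class SerializationDemo {
    // Round-trip any Serializable value (LocalDateTime implements
    // Serializable) through a byte array.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```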






[jira] [Closed] (FLINK-34746) Switching to the Apache CDN for Dockerfile

2024-05-24 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-34746.
--
Resolution: Fixed

Fixed in apache/flink-docker@master 883600747505c128d97e9d25c9326f0c6f1d31e4

> Switching to the Apache CDN for Dockerfile
> --
>
> Key: FLINK-34746
> URL: https://issues.apache.org/jira/browse/FLINK-34746
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-docker
>Reporter: lincoln lee
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> While publishing the official image, we received some comments
> about switching to the Apache CDN
>  
> See
> https://github.com/docker-library/official-images/pull/16114
> https://github.com/docker-library/official-images/pull/16430
>  
> Reason for switching: [https://apache.org/history/mirror-history.html] (also 
> [https://www.apache.org/dyn/closer.cgi] and [https://www.apache.org/mirrors])





Re: [PR] [FLINK-34746] Switching to the Apache CDN for Dockerfile [flink-docker]

2024-05-24 Thread via GitHub


MartijnVisser merged PR #190:
URL: https://github.com/apache/flink-docker/pull/190





Re: [PR] [FLINK-35326][hive] add lineage integration for Hive connector [flink]

2024-05-24 Thread via GitHub


davidradl commented on PR #24835:
URL: https://github.com/apache/flink/pull/24835#issuecomment-2129222164

   Shouldn't this PR go in after 
[https://github.com/apache/flink/pull/24618](https://github.com/apache/flink/pull/24618)?





Re: [PR] [FLINK-34487][ci] Adds Python Wheels nightly GHA workflow [flink]

2024-05-24 Thread via GitHub


morazow commented on code in PR #24426:
URL: https://github.com/apache/flink/pull/24426#discussion_r1613261308


##
.github/workflows/nightly.yml:
##
@@ -94,3 +94,51 @@ jobs:
   s3_bucket: ${{ secrets.IT_CASE_S3_BUCKET }}
   s3_access_key: ${{ secrets.IT_CASE_S3_ACCESS_KEY }}
   s3_secret_key: ${{ secrets.IT_CASE_S3_SECRET_KEY }}
+
+  build_python_wheels:
+name: "Build Python Wheels on ${{ matrix.os }}"
+runs-on: ${{ matrix.os }}
+strategy:
+  fail-fast: false
+  matrix:
+include:
+  - os: ubuntu-latest
+os_name: linux
+python-version: 3.9
+  - os: macos-latest
+os_name: macos
+python-version: 3.9

Review Comment:
   Yes, on macOS only the pyproject.toml is used by cibuildwheel. The tool seems 
to be well documented; maybe we could also use it for the Linux builds.
   
   It uses `cp38-*`, etc. for the Python versions, see 
https://cibuildwheel.pypa.io/en/stable/options/#build-skip. For Linux we are 
running bash scripts to build the wheels, but we could let the cibuildwheel 
action handle both platforms.
   
   We can ask @HuangXingBo for a final review.






[jira] [Updated] (FLINK-35408) Add 30 min tolerance value when validating the time-zone setting

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35408:
---
Affects Version/s: cdc-3.1.0

> Add 30 min tolerance value when validating the time-zone setting
> 
>
> Key: FLINK-35408
> URL: https://issues.apache.org/jira/browse/FLINK-35408
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, the MySQL CDC connector retrieves the offset in seconds between the 
> server time and UTC by executing the SQL statement below, and then compares it 
> with the offset of the configured timezone.
> {code:java}
> SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
> For some MySQL instances, the validating for time-zone is too strict. We can 
> add 30min tolerance value.
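
The proposed tolerance check can be sketched as follows (an illustrative Python
sketch with hypothetical names; the actual validation lives in the MySQL CDC
connector's Java code):

```python
TOLERANCE_SECONDS = 30 * 60  # proposed 30-minute tolerance

def validate_time_zone(configured_offset_seconds, actual_offset_seconds,
                       tolerance_seconds=TOLERANCE_SECONDS):
    """Accept the configured zone if its UTC offset is within the tolerance
    of the offset reported by
    SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP()))."""
    return abs(configured_offset_seconds - actual_offset_seconds) <= tolerance_seconds
```

With this, a server whose clock drifts a few minutes from its configured zone
still validates, while a genuinely wrong zone (hours off) is rejected.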



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35409) Request more splits if all splits are filtered from addSplits method

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35409:
---
Fix Version/s: cdc-3.2.0

> Request more splits if all splits are filtered from addSplits method
> 
>
> Key: FLINK-35409
> URL: https://issues.apache.org/jira/browse/FLINK-35409
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> Suppose this scenario: A job is still in the snapshot phase, and the 
> remaining uncompleted snapshot splits all belong to a few tables that have 
> been deleted by the user.
> In such case, when restarting from a savepoint, these uncompleted snapshot 
> splits will not trigger a call to the addSplits method. Moreover, since the 
> BinlogSplit has not been sent yet, the job will not start the SplitReader to 
> read data.
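
The fix suggested by the ticket title can be sketched like this (illustrative
Python with hypothetical helper names; in the connector this would be the
source reader's addSplits method asking the enumerator for more splits):

```python
def add_splits(splits, is_table_still_captured, send_split_request):
    """Keep only splits whose tables still exist; if every split was
    filtered out, request more splits so the reader does not stall
    waiting for the binlog split."""
    remaining = [s for s in splits if is_table_still_captured(s)]
    if splits and not remaining:
        send_split_request()  # all splits filtered -> ask for more work
    return remaining
```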





[jira] [Updated] (FLINK-35409) Request more splits if all splits are filtered from addSplits method

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35409:
---
Affects Version/s: cdc-3.1.0

> Request more splits if all splits are filtered from addSplits method
> 
>
> Key: FLINK-35409
> URL: https://issues.apache.org/jira/browse/FLINK-35409
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Suppose this scenario: A job is still in the snapshot phase, and the 
> remaining uncompleted snapshot splits all belong to a few tables that have 
> been deleted by the user.
> In such a case, when restarting from a savepoint, these uncompleted snapshot 
> splits will not trigger a call to the addSplits method. Moreover, since the 
> BinlogSplit has not been sent yet, the job will not start the SplitReader to 
> read data.





[jira] [Updated] (FLINK-35408) Add 30 min tolerance value when validating the time-zone setting

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35408:
---
Fix Version/s: cdc-3.2.0

> Add 30 min tolerance value when validating the time-zone setting
> 
>
> Key: FLINK-35408
> URL: https://issues.apache.org/jira/browse/FLINK-35408
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> Now, the MySQL CDC connector will retrieve the offset in seconds between the 
> configured timezone and UTC by executing the SQL statement below, and then 
> compare it with the configured timezone.
> {code:java}
> SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
> For some MySQL instances, this time-zone validation is too strict. We can 
> add a 30-minute tolerance value.





[jira] [Updated] (FLINK-35409) Request more splits if all splits are filtered from addSplits method

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35409:
---
Issue Type: Bug  (was: Improvement)

> Request more splits if all splits are filtered from addSplits method
> 
>
> Key: FLINK-35409
> URL: https://issues.apache.org/jira/browse/FLINK-35409
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Suppose this scenario: A job is still in the snapshot phase, and the 
> remaining uncompleted snapshot splits all belong to a few tables that have 
> been deleted by the user.
> In such a case, when restarting from a savepoint, these uncompleted snapshot 
> splits will not trigger a call to the addSplits method. Moreover, since the 
> BinlogSplit has not been sent yet, the job will not start the SplitReader to 
> read data.





[jira] [Updated] (FLINK-35409) Request more splits if all splits are filtered from addSplits method

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35409:
---
Description: 
Suppose this scenario: A job is still in the snapshot phase, and the remaining 
uncompleted snapshot splits all belong to a few tables that have been deleted 
by the user.

In such a case, when restarting from a savepoint, these uncompleted snapshot 
splits will not trigger a call to the addSplits method. Moreover, since the 
BinlogSplit has not been sent yet, the job will not start the SplitReader to 
read data.

> Request more splits if all splits are filtered from addSplits method
> 
>
> Key: FLINK-35409
> URL: https://issues.apache.org/jira/browse/FLINK-35409
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Suppose this scenario: A job is still in the snapshot phase, and the 
> remaining uncompleted snapshot splits all belong to a few tables that have 
> been deleted by the user.
> In such a case, when restarting from a savepoint, these uncompleted snapshot 
> splits will not trigger a call to the addSplits method. Moreover, since the 
> BinlogSplit has not been sent yet, the job will not start the SplitReader to 
> read data.





Re: [PR] [FLINK-34487][ci] Adds Python Wheels nightly GHA workflow [flink]

2024-05-24 Thread via GitHub


morazow commented on code in PR #24426:
URL: https://github.com/apache/flink/pull/24426#discussion_r1613227700


##
.github/workflows/nightly.yml:
##
@@ -94,3 +94,51 @@ jobs:
   s3_bucket: ${{ secrets.IT_CASE_S3_BUCKET }}
   s3_access_key: ${{ secrets.IT_CASE_S3_ACCESS_KEY }}
   s3_secret_key: ${{ secrets.IT_CASE_S3_SECRET_KEY }}
+
+  build_python_wheels:
+    name: "Build Python Wheels on ${{ matrix.os }}"

Review Comment:
   Additionally, I checked the wheel artifacts from the latest Azure Pipelines 
run above; they are more or less the same size as those from the GitHub Actions 
run.






[jira] [Updated] (FLINK-35408) Add 30 min tolerance value when validating the time-zone setting

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35408:
---
Description: 
Now, the MySQL CDC connector will retrieve the offset in seconds between the 
configured timezone and UTC by executing the SQL statement below, and then 
compare it with the configured timezone.
{code:java}
SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
For some MySQL instances, this time-zone validation is too strict. We can 
add a 30-minute tolerance value.

  was:
Now, the MySQL CDC connector will retrieve the number of seconds the 
database-configured timezone is offset from UTC by executing the SQL statement 
below, and then compare it with the configured timezone.
{code:java}
SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
For some MySQL instances, this time-zone validation is too strict. We can 
add a 30-minute tolerance value.


> Add 30 min tolerance value when validating the time-zone setting
> 
>
> Key: FLINK-35408
> URL: https://issues.apache.org/jira/browse/FLINK-35408
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Now, MySQL CDC connector will retrieve the offset seconds between the 
> configured timezone and UTC by executing the SQL statement below, and then 
> compare it with the configured timezone.
> {code:java}
> SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
> For some MySQL instances, the validating for time-zone is too strict. We can 
> add 30min tolerance value.





Re: [PR] [FLINK-34487][ci] Adds Python Wheels nightly GHA workflow [flink]

2024-05-24 Thread via GitHub


morazow commented on code in PR #24426:
URL: https://github.com/apache/flink/pull/24426#discussion_r1613226372


##
.github/workflows/nightly.yml:
##
@@ -94,3 +94,51 @@ jobs:
   s3_bucket: ${{ secrets.IT_CASE_S3_BUCKET }}
   s3_access_key: ${{ secrets.IT_CASE_S3_ACCESS_KEY }}
   s3_secret_key: ${{ secrets.IT_CASE_S3_SECRET_KEY }}
+
+  build_python_wheels:
+    name: "Build Python Wheels on ${{ matrix.os }}"

Review Comment:
   Hey @XComp,
   
   I think this is fine. The Azure Pipelines 
(https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-python-wheels.yml)
 also run like that, builds each wheel in a separate job. 
   
   From [GitHub migration 
doc](https://docs.github.com/en/actions/migrating-to-github-actions/manually-migrating-to-github-actions/migrating-from-azure-pipelines-to-github-actions#migrating-jobs-and-steps):
   
   ```
   - Jobs contain a series of steps that run sequentially.
   - Jobs run on separate virtual machines or in separate containers.
   - Jobs run in parallel by default, but can be configured to run sequentially.
   ```
   
   So it should be fine to build each wheel separately, without the full Flink 
build, similar to the Azure Pipelines.
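   
   If we later switch both platforms to cibuildwheel, the matrix job could 
shrink to something like the following (a hypothetical sketch; the action 
version, package path, and artifact layout are assumptions that would need 
checking):
   
   ```yaml
     build_python_wheels:
       name: "Build Python Wheels on ${{ matrix.os }}"
       runs-on: ${{ matrix.os }}
       strategy:
         fail-fast: false
         matrix:
           os: [ubuntu-latest, macos-latest]
       steps:
         - uses: actions/checkout@v4
         - name: Build wheels
           uses: pypa/cibuildwheel@v2.16
           with:
             package-dir: flink-python
         - uses: actions/upload-artifact@v4
           with:
             name: wheels-${{ matrix.os }}
             path: ./wheelhouse/*.whl
   ```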






[jira] [Updated] (FLINK-35408) Add 30 min tolerance value when validating the time-zone setting

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35408:
---
Description: 
Now, the MySQL CDC connector will retrieve the number of seconds the 
database-configured timezone is offset from UTC by executing the SQL statement 
below, and then compare it with the configured timezone.
{code:java}
SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
For some MySQL instances, this time-zone validation is too strict. We can 
add a 30-minute tolerance value.

> Add 30 min tolerance value when validating the time-zone setting
> 
>
> Key: FLINK-35408
> URL: https://issues.apache.org/jira/browse/FLINK-35408
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Now, the MySQL CDC connector will retrieve the number of seconds the 
> database-configured timezone is offset from UTC by executing the SQL statement 
> below, and then compare it with the configured timezone.
> {code:java}
> SELECT TIME_TO_SEC(TIMEDIFF(NOW(), UTC_TIMESTAMP())) {code}
> For some MySQL instances, this time-zone validation is too strict. We can 
> add a 30-minute tolerance value.





Re: [PR] [FLINK-34746] Switching to the Apache CDN for Dockerfile [flink-docker]

2024-05-24 Thread via GitHub


hlteoh37 commented on PR #190:
URL: https://github.com/apache/flink-docker/pull/190#issuecomment-2129138391

   @MartijnVisser can we merge this?





[jira] [Resolved] (FLINK-35298) Improve metric reporter logic

2024-05-24 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu resolved FLINK-35298.

Resolution: Implemented

via master: 8e8fd304afdd9668247a8869698e0949806cad7b

> Improve metric reporter logic
> -
>
> Key: FLINK-35298
> URL: https://issues.apache.org/jira/browse/FLINK-35298
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> * In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
>  * Support currentEmitEventTimeLag metric.
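
The two bullets above can be sketched as follows (illustrative Python with
hypothetical names; the real gauges are registered through Flink's Java metric
group):

```python
UNDEFINED = -1  # reported while the lag is not meaningful

class FetchDelayMetric:
    """During the snapshot phase currentFetchEventTimeLag is reported as -1,
    because snapshot records carry no meaningful event time; in the streaming
    phase the real fetch lag is reported."""

    def __init__(self):
        self.in_snapshot_phase = True
        self._lag_ms = UNDEFINED

    def record(self, fetch_ts_ms, event_ts_ms):
        if self.in_snapshot_phase:
            self._lag_ms = UNDEFINED
        else:
            self._lag_ms = fetch_ts_ms - event_ts_ms

    def current_fetch_event_time_lag(self):
        return self._lag_ms
```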





[jira] [Assigned] (FLINK-35298) Improve metric reporter logic

2024-05-24 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-35298:
--

Assignee: Xiao Huang

> Improve metric reporter logic
> -
>
> Key: FLINK-35298
> URL: https://issues.apache.org/jira/browse/FLINK-35298
> Project: Flink
>  Issue Type: Improvement
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> * In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
>  * Support currentEmitEventTimeLag metric.





[jira] [Updated] (FLINK-35298) Improve metric reporter logic

2024-05-24 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu updated FLINK-35298:
---
Fix Version/s: cdc-3.2.0

> Improve metric reporter logic
> -
>
> Key: FLINK-35298
> URL: https://issues.apache.org/jira/browse/FLINK-35298
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> * In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
>  * Support currentEmitEventTimeLag metric.





[jira] [Updated] (FLINK-35298) Improve metric reporter logic

2024-05-24 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu updated FLINK-35298:
---
Affects Version/s: cdc-3.1.0

> Improve metric reporter logic
> -
>
> Key: FLINK-35298
> URL: https://issues.apache.org/jira/browse/FLINK-35298
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Assignee: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> * In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
>  * Support currentEmitEventTimeLag metric.





Re: [PR] [FLINK-35298][cdc] improve fetch delay metric reporter logic [flink-cdc]

2024-05-24 Thread via GitHub


leonardBang merged PR #3298:
URL: https://github.com/apache/flink-cdc/pull/3298


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-35300) Improve MySqlStreamingChangeEventSource to skip null events in event deserializer

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35300:
---
Description: As described in title.

> Improve MySqlStreamingChangeEventSource to skip null events in event 
> deserializer
> -
>
> Key: FLINK-35300
> URL: https://issues.apache.org/jira/browse/FLINK-35300
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> As described in title.





Re: [PR] [FLINK-35129][cdc][postgres] Add checkpoint cycle option for commit offsets [flink-cdc]

2024-05-24 Thread via GitHub


leonardBang commented on code in PR #3349:
URL: https://github.com/apache/flink-cdc/pull/3349#discussion_r1613201109


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/reader/PostgresSourceReader.java:
##
@@ -104,6 +108,10 @@ public List snapshotState(long 
checkpointId) {
 
 @Override
 public void notifyCheckpointComplete(long checkpointId) throws Exception {
+        this.checkpointCount = (this.checkpointCount + 1) % this.checkpointCycle;

Review Comment:
   +1 for @loserwang1024 ‘s comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-35297) Add validation for option connect.timeout

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35297:
---
Description: The value of the option `connect.timeout` needs to be checked at 
compile time.

> Add validation for option connect.timeout
> -
>
> Key: FLINK-35297
> URL: https://issues.apache.org/jira/browse/FLINK-35297
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> The value of the option `connect.timeout` needs to be checked at compile time.





Re: [PR] [FLINK-35129][cdc][postgres] Add checkpoint cycle option for commit offsets [flink-cdc]

2024-05-24 Thread via GitHub


loserwang1024 commented on code in PR #3349:
URL: https://github.com/apache/flink-cdc/pull/3349#discussion_r1613185257


##
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/reader/PostgresSourceReader.java:
##
@@ -104,6 +108,10 @@ public List snapshotState(long 
checkpointId) {
 
 @Override
 public void notifyCheckpointComplete(long checkpointId) throws Exception {
+        this.checkpointCount = (this.checkpointCount + 1) % this.checkpointCycle;

Review Comment:
   @morazow, great job. I generally agree with your approach. However, I 
currently have a different perspective: instead of committing only at every 
third checkpoint cycle (rolling window), I prefer to commit the offset that 
trails the current checkpoint by three checkpoints (sliding window).
   
   For a detailed design, we can store successful checkpoint IDs in a min heap, 
whose size is three (as decided by the configuration). When a checkpoint is 
successfully performed, we can push its ID into the heap and take the minimum 
checkpoint ID value, then commit it. By doing this, we always have three 
checkpoints whose offsets have not been recycled.
   
   (P.S.: Let's log the heap at each checkpoint, so users can know from which 
checkpoint IDs they can restore.)
   
   @leonardBang , @ruanhang1993 , CC, WDYT?
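   
   The sliding-window design above can be sketched as follows (illustrative 
Python; the real implementation would live in the PostgresSourceReader's Java 
code, and the names here are hypothetical):
   
   ```python
   import heapq
   
   class CheckpointOffsetCommitter:
       """Keep the last `window` completed checkpoint IDs in a min-heap and
       commit (allow the DB to recycle) only offsets of checkpoints that
       fall out of the window, so `window` restorable checkpoints remain."""
   
       def __init__(self, window=3):
           self.window = window
           self.heap = []        # retained (restorable) checkpoint IDs
           self.committed = []   # checkpoint IDs whose offsets were committed
   
       def notify_checkpoint_complete(self, checkpoint_id):
           heapq.heappush(self.heap, checkpoint_id)
           if len(self.heap) > self.window:
               # The oldest checkpoint leaves the window; its offset is safe
               # to commit because `window` newer checkpoints still exist.
               self.committed.append(heapq.heappop(self.heap))
           return sorted(self.heap)  # IDs users could still restore from
   ```
   
   Logging the returned list at every checkpoint would tell users which 
checkpoint IDs they can still restore from.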



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-35298) Improve metric reporter logic

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35298:
---
Description: 
* In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
 * Support currentEmitEventTimeLag metric.

> Improve metric reporter logic
> -
>
> Key: FLINK-35298
> URL: https://issues.apache.org/jira/browse/FLINK-35298
> Project: Flink
>  Issue Type: Improvement
>Reporter: Xiao Huang
>Priority: Minor
>  Labels: pull-request-available
>
> * In snapshot phase, set -1 as the value of currentFetchEventTimeLag metric.
>  * Support currentEmitEventTimeLag metric.





[jira] [Updated] (FLINK-35295) Improve jdbc connection pool initialization failure message

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35295:
---
Description: As described in ticket title.

> Improve jdbc connection pool initialization failure message
> ---
>
> Key: FLINK-35295
> URL: https://issues.apache.org/jira/browse/FLINK-35295
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Xiao Huang
>Priority: Major
>  Labels: pull-request-available
>
> As described in ticket title.





[jira] [Updated] (FLINK-35294) Use source config to check if the filter should be applied in timestamp starting mode

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35294:
---
Fix Version/s: cdc-3.2.0

> Use source config to check if the filter should be applied in timestamp 
> starting mode
> -
>
> Key: FLINK-35294
> URL: https://issues.apache.org/jira/browse/FLINK-35294
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: cdc-3.2.0
>
>
> Since MySQL does not support the ability to quickly locate a binlog offset 
> through a timestamp, the current logic for starting from a timestamp is to 
> begin from the earliest binlog offset and then filter out the data before the 
> user-specified position.
> If the user restarts the job during the filtering process, this filter will 
> become ineffective.
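
The fix implied by the summary can be sketched as follows (illustrative Python;
the real change is in the MySQL connector's Java code and the names here are
hypothetical): decide whether to filter from the immutable source config
instead of transient state, so the decision survives restarts.

```python
def filter_before_start(events, startup_mode, start_ts_ms):
    """Drop events older than the configured start timestamp whenever the
    source config says we started from a timestamp -- even after a restart,
    when any in-memory 'still filtering' flag would have been lost."""
    if startup_mode != "timestamp":
        return list(events)
    return [e for e in events if e["ts_ms"] >= start_ts_ms]
```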





[jira] [Updated] (FLINK-35294) Use source config to check if the filter should be applied in timestamp starting mode

2024-05-24 Thread Xiao Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-35294:
---
Affects Version/s: cdc-3.1.0

> Use source config to check if the filter should be applied in timestamp 
> starting mode
> -
>
> Key: FLINK-35294
> URL: https://issues.apache.org/jira/browse/FLINK-35294
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Xiao Huang
>Priority: Major
>  Labels: pull-request-available
>
> Since MySQL does not support the ability to quickly locate a binlog offset 
> through a timestamp, the current logic for starting from a timestamp is to 
> begin from the earliest binlog offset and then filter out the data before the 
> user-specified position.
> If the user restarts the job during the filtering process, this filter will 
> become ineffective.





[jira] [Commented] (FLINK-34582) release build tools lost the newly added py3.11 packages for mac

2024-05-24 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849229#comment-17849229
 ] 

Muhammet Orazov commented on FLINK-34582:
-

Ohh no. Yes, indeed you are right [~mapohl], thanks for the update (y)

> release build tools lost the newly added py3.11 packages for mac
> 
>
> Key: FLINK-34582
> URL: https://issues.apache.org/jira/browse/FLINK-34582
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0, 1.20.0
>Reporter: lincoln lee
>Assignee: Xingbo Huang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
> Attachments: image-2024-03-07-10-39-49-341.png
>
>
> During the 1.19.0-rc1 release, building binaries via 
> tools/releasing/create_binary_release.sh
> lost the 2 newly added py3.11 packages for mac.




