[jira] [Commented] (FLINK-35697) Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink

2024-07-08 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863777#comment-17863777
 ] 

Muhammet Orazov commented on FLINK-35697:
-

Hey all,

I have finished the regression testing and didn't see any regression issues when 
using the 1.20-SNAPSHOT sinks; I mainly tested with AWS Kinesis.

 

This ticket is done from my side; we should be good to go with the next steps.

cc: [~chalixar] , [~Weijie Guo] 
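
For reference, here is a minimal sketch of this kind of regression-test job, assuming the flink-connector-aws {{KinesisStreamsSink}}. The stream name, region, and record count are illustrative placeholders, not the exact setup used:

{code:java}
// Hypothetical regression-test job against the 1.20-SNAPSHOT Kinesis sink.
// Stream name, region, and record count are illustrative placeholders.
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Properties;

public class KinesisSinkRegressionJob {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties clientProps = new Properties();
        clientProps.setProperty("aws.region", "eu-west-1"); // illustrative region

        KinesisStreamsSink<String> sink =
                KinesisStreamsSink.<String>builder()
                        .setKinesisClientProperties(clientProps)
                        .setStreamName("regression-test-stream") // illustrative name
                        .setSerializationSchema(new SimpleStringSchema())
                        .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                        .build();

        env.fromSequence(0, 1_000)
                .map(Object::toString)
                .sinkTo(sink);
        env.execute("Kinesis sink regression test");
    }
}
{code}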

> Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink
> -
>
> Key: FLINK-35697
> URL: https://issues.apache.org/jira/browse/FLINK-35697
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Affects Versions: 1.20.0
>Reporter: Ahmed Hamdy
>Assignee: Muhammet Orazov
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> h2. Description
> In FLIP-451 we added a timeout configuration to {{AsyncSinkWriter}}, with a 
> default value of 10 minutes and failOnTimeout defaulting to false. 
> We need to test the new feature at different levels:
> - Functional Testing
> - Performance Testing
> - Regression Testing
> h2. Common Utils
> The feature affects the abstract {{AsyncSinkWriter}} class, so we need to use a 
> concrete sink implementation for our tests. Any implementation where we can 
> track delivery of elements is acceptable; an example is:
> {code}
> class DiscardingElementWriter extends AsyncSinkWriter<String, String> {
>
>     SeparateThreadExecutor executor =
>             new SeparateThreadExecutor(r -> new Thread(r, "DiscardingElementWriter"));
>
>     public DiscardingElementWriter(
>             Sink.InitContext context,
>             AsyncSinkWriterConfiguration configuration,
>             Collection<BufferedRequestState<String>> bufferedRequestStates) {
>         super(
>                 (element, context1) -> element.toString(),
>                 context,
>                 configuration,
>                 bufferedRequestStates);
>     }
>
>     @Override
>     protected long getSizeInBytes(String requestEntry) {
>         return requestEntry.length();
>     }
>
>     @Override
>     protected void submitRequestEntries(
>             List<String> requestEntries, ResultHandler<String> resultHandler) {
>         executor.execute(
>                 () -> {
>                     long delayMillis = new Random().nextInt(5000);
>                     try {
>                         Thread.sleep(delayMillis);
>                     } catch (InterruptedException ignored) {
>                     }
>                     for (String entry : requestEntries) {
>                         LOG.info("Discarding {} after {} ms", entry, delayMillis);
>                     }
>                     resultHandler.complete();
>                 });
>     }
> }
> {code}
> We will also need a simple Flink job that writes data using the sink:
> {code}
> final StreamExecutionEnvironment env =
>         StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env.fromSequence(0, 100)
>         .map(Object::toString)
>         .sinkTo(new DiscardingTestAsyncSink<>());
> {code}
> We can use minimal values for the batch size and in-flight requests to increase 
> the number of requests that are subject to the timeout:
> {code}
> public class DiscardingTestAsyncSink extends AsyncSinkBase<String, String> {
>
>     private static final Logger LOG =
>             LoggerFactory.getLogger(DiscardingTestAsyncSink.class);
>
>     public DiscardingTestAsyncSink(long requestTimeoutMS, boolean failOnTimeout) {
>         super(
>                 (element, context) -> element.toString(),
>                 1, // maxBatchSize
>                 1, // maxInFlightRequests
>                 10, // maxBufferedRequests
>                 1000L, // maxBatchSizeInBytes
>                 100, // maxTimeInBufferMS
>                 500L, // maxRecordSizeInBytes
>                 requestTimeoutMS,
>                 failOnTimeout);
>     }
>
>     @Override
>     public SinkWriter<String> createWriter(WriterInitContext context) throws IOException {
>         return new DiscardingElementWriter(
>                 new InitContextWrapper(context),
>                 AsyncSinkWriterConfiguration.builder()
>                         .setMaxBatchSize(this.getMaxBatchSize())
>                         .setMaxBatchSizeInBytes(this.getMaxBatchSizeInBytes())
>                         .setMaxInFlightRequests(this.getMaxInFlightRequests())
>                         .setMaxBufferedRequests(this.getMaxBufferedRequests())
>                         .setMaxTimeInBufferMS(this.getMaxTimeInBufferMS())
>                         .setMaxRecordSizeInBytes(this.getMaxRecordSizeInBytes())
>                         .setFailOnTimeout(this.getFailOnTimeout())
>                         .setRequestTimeoutMS(this.getRequestTimeoutMS())
>                         .build(),
>                 Collections.emptyList());
>     }
> }
> {code}

[jira] [Commented] (FLINK-35697) Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink

2024-07-06 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863521#comment-17863521
 ] 

Muhammet Orazov commented on FLINK-35697:
-

Thanks all (y)

I have already done the functional and performance testing.

I'll continue with the regression testing checks.

> Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink
> -
>
> Key: FLINK-35697
> URL: https://issues.apache.org/jira/browse/FLINK-35697
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Affects Versions: 1.20.0
>Reporter: Ahmed Hamdy
>Assignee: Muhammet Orazov
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> h2. Description
> In FLIP-451 we added a timeout configuration to {{AsyncSinkWriter}}, with a 
> default value of 10 minutes and failOnTimeout defaulting to false. 
> We need to test the new feature at different levels:
> - Functional Testing
> - Performance Testing
> - Regression Testing
> h2. Common Utils
> The feature affects the abstract {{AsyncSinkWriter}} class, so we need to use a 
> concrete sink implementation for our tests. Any implementation where we can 
> track delivery of elements is acceptable; an example is:
> {code}
> class DiscardingElementWriter extends AsyncSinkWriter<String, String> {
>
>     SeparateThreadExecutor executor =
>             new SeparateThreadExecutor(r -> new Thread(r, "DiscardingElementWriter"));
>
>     public DiscardingElementWriter(
>             Sink.InitContext context,
>             AsyncSinkWriterConfiguration configuration,
>             Collection<BufferedRequestState<String>> bufferedRequestStates) {
>         super(
>                 (element, context1) -> element.toString(),
>                 context,
>                 configuration,
>                 bufferedRequestStates);
>     }
>
>     @Override
>     protected long getSizeInBytes(String requestEntry) {
>         return requestEntry.length();
>     }
>
>     @Override
>     protected void submitRequestEntries(
>             List<String> requestEntries, ResultHandler<String> resultHandler) {
>         executor.execute(
>                 () -> {
>                     long delayMillis = new Random().nextInt(5000);
>                     try {
>                         Thread.sleep(delayMillis);
>                     } catch (InterruptedException ignored) {
>                     }
>                     for (String entry : requestEntries) {
>                         LOG.info("Discarding {} after {} ms", entry, delayMillis);
>                     }
>                     resultHandler.complete();
>                 });
>     }
> }
> {code}
> We will also need a simple Flink job that writes data using the sink:
> {code}
> final StreamExecutionEnvironment env =
>         StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env.fromSequence(0, 100)
>         .map(Object::toString)
>         .sinkTo(new DiscardingTestAsyncSink<>());
> {code}
> We can use minimal values for the batch size and in-flight requests to increase 
> the number of requests that are subject to the timeout:
> {code}
> public class DiscardingTestAsyncSink extends AsyncSinkBase<String, String> {
>
>     private static final Logger LOG =
>             LoggerFactory.getLogger(DiscardingTestAsyncSink.class);
>
>     public DiscardingTestAsyncSink(long requestTimeoutMS, boolean failOnTimeout) {
>         super(
>                 (element, context) -> element.toString(),
>                 1, // maxBatchSize
>                 1, // maxInFlightRequests
>                 10, // maxBufferedRequests
>                 1000L, // maxBatchSizeInBytes
>                 100, // maxTimeInBufferMS
>                 500L, // maxRecordSizeInBytes
>                 requestTimeoutMS,
>                 failOnTimeout);
>     }
>
>     @Override
>     public SinkWriter<String> createWriter(WriterInitContext context) throws IOException {
>         return new DiscardingElementWriter(
>                 new InitContextWrapper(context),
>                 AsyncSinkWriterConfiguration.builder()
>                         .setMaxBatchSize(this.getMaxBatchSize())
>                         .setMaxBatchSizeInBytes(this.getMaxBatchSizeInBytes())
>                         .setMaxInFlightRequests(this.getMaxInFlightRequests())
>                         .setMaxBufferedRequests(this.getMaxBufferedRequests())
>                         .setMaxTimeInBufferMS(this.getMaxTimeInBufferMS())
>                         .setMaxRecordSizeInBytes(this.getMaxRecordSizeInBytes())
>                         .setFailOnTimeout(this.getFailOnTimeout())
>                         .setRequestTimeoutMS(this.getRequestTimeoutMS())
>                         .build(),
>                 Collections.emptyList());
>     }
> }
> {code}

[jira] [Commented] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-07-03 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862775#comment-17862775
 ] 

Muhammet Orazov commented on FLINK-34487:
-

Hey [~mapohl] ,

I have added backport PRs for the release branches.

Could you please have a look?

PRs:
 * [https://github.com/apache/flink/pull/25016]
 * [https://github.com/apache/flink/pull/25017]
 * [https://github.com/apache/flink/pull/25018]

 

> Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly 
> workflow
> -
>
> Key: FLINK-34487
> URL: https://issues.apache.org/jira/browse/FLINK-34487
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
> Fix For: 2.0.0
>
>
> Analogously to the [Azure Pipelines nightly 
> config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
>  we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35697) Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink

2024-07-02 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862672#comment-17862672
 ] 

Muhammet Orazov commented on FLINK-35697:
-

Hey [~chalixar],

Thanks for the update!

I'll look into it and try to finalise it in the next few days.

> Release Testing: Verify FLIP-451 Introduce timeout configuration to AsyncSink
> -
>
> Key: FLINK-35697
> URL: https://issues.apache.org/jira/browse/FLINK-35697
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Reporter: Ahmed Hamdy
>Priority: Blocker
> Fix For: 1.20.0
>
>
> h2. Description
> In FLIP-451 we added a timeout configuration to {{AsyncSinkWriter}}, with a 
> default value of 10 minutes and failOnTimeout defaulting to false. 
> We need to test the new feature at different levels:
> - Functional Testing
> - Performance Testing
> - Regression Testing
> h2. Common Utils
> The feature affects the abstract {{AsyncSinkWriter}} class, so we need to use a 
> concrete sink implementation for our tests. Any implementation where we can 
> track delivery of elements is acceptable; an example is:
> {code}
> class DiscardingElementWriter extends AsyncSinkWriter<String, String> {
>
>     SeparateThreadExecutor executor =
>             new SeparateThreadExecutor(r -> new Thread(r, "DiscardingElementWriter"));
>
>     public DiscardingElementWriter(
>             Sink.InitContext context,
>             AsyncSinkWriterConfiguration configuration,
>             Collection<BufferedRequestState<String>> bufferedRequestStates) {
>         super(
>                 (element, context1) -> element.toString(),
>                 context,
>                 configuration,
>                 bufferedRequestStates);
>     }
>
>     @Override
>     protected long getSizeInBytes(String requestEntry) {
>         return requestEntry.length();
>     }
>
>     @Override
>     protected void submitRequestEntries(
>             List<String> requestEntries, ResultHandler<String> resultHandler) {
>         executor.execute(
>                 () -> {
>                     long delayMillis = new Random().nextInt(5000);
>                     try {
>                         Thread.sleep(delayMillis);
>                     } catch (InterruptedException ignored) {
>                     }
>                     for (String entry : requestEntries) {
>                         LOG.info("Discarding {} after {} ms", entry, delayMillis);
>                     }
>                     resultHandler.complete();
>                 });
>     }
> }
> {code}
> We will also need a simple Flink job that writes data using the sink:
> {code}
> final StreamExecutionEnvironment env =
>         StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env.fromSequence(0, 100)
>         .map(Object::toString)
>         .sinkTo(new DiscardingTestAsyncSink<>());
> {code}
> We can use minimal values for the batch size and in-flight requests to increase 
> the number of requests that are subject to the timeout:
> {code}
> public class DiscardingTestAsyncSink extends AsyncSinkBase<String, String> {
>
>     private static final Logger LOG =
>             LoggerFactory.getLogger(DiscardingTestAsyncSink.class);
>
>     public DiscardingTestAsyncSink(long requestTimeoutMS, boolean failOnTimeout) {
>         super(
>                 (element, context) -> element.toString(),
>                 1, // maxBatchSize
>                 1, // maxInFlightRequests
>                 10, // maxBufferedRequests
>                 1000L, // maxBatchSizeInBytes
>                 100, // maxTimeInBufferMS
>                 500L, // maxRecordSizeInBytes
>                 requestTimeoutMS,
>                 failOnTimeout);
>     }
>
>     @Override
>     public SinkWriter<String> createWriter(WriterInitContext context) throws IOException {
>         return new DiscardingElementWriter(
>                 new InitContextWrapper(context),
>                 AsyncSinkWriterConfiguration.builder()
>                         .setMaxBatchSize(this.getMaxBatchSize())
>                         .setMaxBatchSizeInBytes(this.getMaxBatchSizeInBytes())
>                         .setMaxInFlightRequests(this.getMaxInFlightRequests())
>                         .setMaxBufferedRequests(this.getMaxBufferedRequests())
>                         .setMaxTimeInBufferMS(this.getMaxTimeInBufferMS())
>                         .setMaxRecordSizeInBytes(this.getMaxRecordSizeInBytes())
>                         .setFailOnTimeout(this.getFailOnTimeout())
>                         .setRequestTimeoutMS(this.getRequestTimeoutMS())
>                         .build(),
>                 Collections.emptyList());
>     }
> }
> {code}

[jira] [Commented] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-07-02 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862670#comment-17862670
 ] 

Muhammet Orazov commented on FLINK-34487:
-

Hey [~mapohl],

Sure, I will create similar PRs for the other branches (fingers crossed)

> Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly 
> workflow
> -
>
> Key: FLINK-34487
> URL: https://issues.apache.org/jira/browse/FLINK-34487
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
> Fix For: 2.0.0
>
>
> Analogously to the [Azure Pipelines nightly 
> config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
>  we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35654) Add Flink CDC verification guide in docs

2024-06-27 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860391#comment-17860391
 ] 

Muhammet Orazov commented on FLINK-35654:
-

Hey [~xiqian_yu],
Great, thanks for the update! Going to check it soon (y)

> Add Flink CDC verification guide in docs
> 
>
> Key: FLINK-35654
> URL: https://issues.apache.org/jira/browse/FLINK-35654
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: yux
>Priority: Major
>
> Currently, the ASF voting process requires extensive quality verification before 
> releasing any new versions, including:
>  * Tarball checksum verification
>  * Compile from source code
>  * Run pipeline E2e tests
>  * Run migration tests
>  * Check if jar was packaged with correct JDK version
>  * ...
> Adding a verification guide to the Flink CDC docs should help developers verify 
> future releases more easily.
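
As an illustration of the tarball checksum verification step, here is a minimal Java sketch that compares the computed SHA-512 of a downloaded tarball with the published .sha512 file (the file names are illustrative):

{code:java}
// Minimal sketch: recompute the tarball's SHA-512 and compare it with the
// published .sha512 file. File names below are illustrative.
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class ChecksumVerifier {
    public static void main(String[] args) throws Exception {
        Path tarball = Path.of("flink-cdc-3.1.0-bin.tar.gz"); // illustrative name

        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] hash = digest.digest(Files.readAllBytes(tarball));

        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }

        // The .sha512 file usually contains "<hash>  <filename>".
        String expected =
                Files.readString(Path.of("flink-cdc-3.1.0-bin.tar.gz.sha512"))
                        .trim()
                        .split("\\s+")[0];
        System.out.println(hex.toString().equals(expected) ? "OK" : "MISMATCH");
    }
}
{code}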



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35565) Flink KafkaSource Batch Job Gets Into Infinite Loop after Resetting Offset

2024-06-26 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860130#comment-17860130
 ] 

Muhammet Orazov commented on FLINK-35565:
-

The solution from https://issues.apache.org/jira/browse/FLINK-34470 could work 
for this issue as well.

> Flink KafkaSource Batch Job Gets Into Infinite Loop after Resetting Offset
> --
>
> Key: FLINK-35565
> URL: https://issues.apache.org/jira/browse/FLINK-35565
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: kafka-3.1.0
> Environment: This is reproduced on a *Flink 1.18.1* with the latest 
> Kafka connector 3.1.0-1.18 on a session cluster.
>Reporter: Naci Simsek
>Priority: Major
> Attachments: image-2024-06-11-11-19-09-889.png, 
> taskmanager_localhost_54489-ac092a_log.txt
>
>
> h2. Summary
> A Flink batch job gets into an infinite fetch loop and cannot gracefully 
> finish if the connected Kafka topic is empty and the starting offset value in 
> the Flink job is lower than the current start/end offset of the related topic. 
> See below for details:
> h2. How to reproduce
> A Flink +*batch*+ job acting as a {*}KafkaSource{*} will consume events 
> from a Kafka topic.
> The related Kafka topic is empty, there are no events, and the offset value is as 
> below: *15*
> !image-2024-06-11-11-19-09-889.png|width=895,height=256!
>  
> The Flink job uses a *specific starting offset* value, which is +*less*+ than the 
> current offset of the topic/partition.
> See below, it is set to “4”
>  
> {code:java}
> package naci.grpId;
>
> import org.apache.flink.api.common.RuntimeExecutionMode;
> import org.apache.flink.api.common.eventtime.WatermarkStrategy;
> import org.apache.flink.api.common.serialization.SimpleStringSchema;
> import org.apache.flink.connector.kafka.source.KafkaSource;
> import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
> import org.apache.flink.streaming.api.datastream.DataStream;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.kafka.common.TopicPartition;
>
> import java.util.HashMap;
> import java.util.Map;
>
> public class KafkaSource_Print {
>     public static void main(String[] args) throws Exception {
>         final StreamExecutionEnvironment env =
>                 StreamExecutionEnvironment.getExecutionEnvironment();
>         env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>
>         // Define the specific offsets for the partitions
>         Map<TopicPartition, Long> specificOffsets = new HashMap<>();
>         specificOffsets.put(new TopicPartition("topic_test", 0), 4L); // start from offset 4 for partition 0
>
>         KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
>                 .setBootstrapServers("localhost:9093") // make sure the port is correct
>                 .setTopics("topic_test")
>                 .setValueOnlyDeserializer(new SimpleStringSchema())
>                 .setStartingOffsets(OffsetsInitializer.offsets(specificOffsets))
>                 .setBounded(OffsetsInitializer.latest())
>                 .build();
>
>         DataStream<String> stream = env.fromSource(
>                 kafkaSource,
>                 WatermarkStrategy.noWatermarks(),
>                 "Kafka Source");
>
>         stream.print();
>         env.execute("Flink KafkaSource test job");
>     }
> }
> {code}
>  
>  
> Here are the initial logs printed related to the offset, as soon as the job 
> gets submitted:
>  
> {code:java}
> 2024-05-30 12:15:50,010 INFO  org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Adding split(s) to reader: [[Partition: topic_test-0, StartingOffset: 4, StoppingOffset: 15]]
> 2024-05-30 12:15:50,069 DEBUG org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - Prepare to run AddSplitsTask: [[[Partition: topic_test-0, StartingOffset: 4, StoppingOffset: 15]]]
> 2024-05-30 12:15:50,074 TRACE org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader [] - Seeking starting offsets to specified offsets: {topic_test-0=4}
> 2024-05-30 12:15:50,074 INFO  org.apache.kafka.clients.consumer.KafkaConsumer [] - [Consumer clientId=KafkaSource--2381765882724812354-0, groupId=null] Seeking to offset 4 for partition topic_test-0
> 2024-05-30 12:15:50,075 DEBUG org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader [] - SplitsChange handling result: [topic_test-0, start:4, stop: 15]
> 2024-05-30 12:15:50,075 DEBUG org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - Finished running task AddSplitsTask: [[[Partition: topic_test-0, StartingOffset: 4, StoppingOffset: 15]]]
> 2024-05-30 12:15:50,075 DEBUG org.apache.fli

[jira] [Commented] (FLINK-35670) Support Postgres CDC Pipeline Source

2024-06-26 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860127#comment-17860127
 ] 

Muhammet Orazov commented on FLINK-35670:
-

Hello [~zhengjun.kissy] ,

I haven't fully started on it yet. But the scope of this ticket would be the 
Postgres CDC Pipeline {*}Source{*}.

I think you can continue the work on the sink part (y)

> Support Postgres CDC Pipeline Source
> 
>
> Key: FLINK-35670
> URL: https://issues.apache.org/jira/browse/FLINK-35670
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Similar to other [CDC pipeline 
> sources|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/]
>  (only MySQL at the moment), we should support Postgres as a pipeline source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35670) Support Postgres CDC Pipeline Source

2024-06-23 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859498#comment-17859498
 ] 

Muhammet Orazov commented on FLINK-35670:
-

Hey all, please feel free to assign this ticket to me

> Support Postgres CDC Pipeline Source
> 
>
> Key: FLINK-35670
> URL: https://issues.apache.org/jira/browse/FLINK-35670
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Similar to other [CDC pipeline 
> sources|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/]
>  (only MySQL at the moment), we should support Postgres as a pipeline source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35671) Support Iceberg CDC Pipeline Sink

2024-06-23 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859499#comment-17859499
 ] 

Muhammet Orazov commented on FLINK-35671:
-

Hey all, please feel free to assign this ticket to me

> Support Iceberg CDC Pipeline Sink
> -
>
> Key: FLINK-35671
> URL: https://issues.apache.org/jira/browse/FLINK-35671
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Similar to other [CDC pipeline 
> sinks|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/],
>  we should support Iceberg as a pipeline sink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35671) Support Iceberg CDC Pipeline Sink

2024-06-23 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35671:
---

 Summary: Support Iceberg CDC Pipeline Sink
 Key: FLINK-35671
 URL: https://issues.apache.org/jira/browse/FLINK-35671
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Similar to other [CDC pipeline 
sinks|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/],
 we should support Iceberg as a pipeline sink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35670) Support Postgres CDC Pipeline Source

2024-06-23 Thread Muhammet Orazov (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammet Orazov updated FLINK-35670:

Description: Similar to other [CDC pipeline 
sources|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/]
 (only MySQL at the moment), we should support Postgres as a pipeline source.  
(was: Similar to other CDC pipeline sources (only MySQL at the moment), we 
should support Postgres as a pipeline source.)

> Support Postgres CDC Pipeline Source
> 
>
> Key: FLINK-35670
> URL: https://issues.apache.org/jira/browse/FLINK-35670
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Similar to other [CDC pipeline 
> sources|https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/pipeline-connectors/overview/]
>  (only MySQL at the moment), we should support Postgres as a pipeline source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35670) Support Postgres CDC Pipeline Source

2024-06-23 Thread Muhammet Orazov (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammet Orazov updated FLINK-35670:

Summary: Support Postgres CDC Pipeline Source  (was: Support Postgres 
Pipeline Source)

> Support Postgres CDC Pipeline Source
> 
>
> Key: FLINK-35670
> URL: https://issues.apache.org/jira/browse/FLINK-35670
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Similar to other CDC pipeline sources (only MySQL at the moment), we should 
> support Postgres as a pipeline source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35670) Support Postgres Pipeline Source

2024-06-23 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35670:
---

 Summary: Support Postgres Pipeline Source
 Key: FLINK-35670
 URL: https://issues.apache.org/jira/browse/FLINK-35670
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Similar to other CDC pipeline sources (only MySQL at the moment), we should 
support Postgres as a pipeline source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35654) Add Flink CDC verification guide in docs

2024-06-20 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856450#comment-17856450
 ] 

Muhammet Orazov commented on FLINK-35654:
-

Thanks [~xiqian_yu]! This is indeed a much-needed guide (y)

> Add Flink CDC verification guide in docs
> 
>
> Key: FLINK-35654
> URL: https://issues.apache.org/jira/browse/FLINK-35654
> Project: Flink
>  Issue Type: Bug
>  Components: Flink CDC
>Reporter: yux
>Priority: Major
>
> Currently, the ASF voting process requires extensive quality verification before 
> releasing any new versions, including:
>  * Tarball checksum verification
>  * Compile from source code
>  * Run pipeline E2e tests
>  * Run migration tests
>  * Check if jar was packaged with correct JDK version
>  * ...
> Adding a verification guide to the Flink CDC docs should help developers verify 
> future releases more easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25538) [JUnit5 Migration] Module: flink-connector-kafka

2024-06-19 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856294#comment-17856294
 ] 

Muhammet Orazov commented on FLINK-25538:
-

Hey all,

I have created a new PR [https://github.com/apache/flink-connector-kafka/pull/106] 
(continuing from PR [https://github.com/apache/flink-connector-kafka/pull/66]).

Please feel free to close PR #66.

I'll continue applying changes on #106.

 

 

> [JUnit5 Migration] Module: flink-connector-kafka
> 
>
> Key: FLINK-25538
> URL: https://issues.apache.org/jira/browse/FLINK-25538
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka
>Reporter: Qingsheng Ren
>Assignee: Muhammet Orazov
>Priority: Minor
>  Labels: pull-request-available, stale-assigned, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35067) Support metadata 'op_type' virtual column for Postgres CDC Connector.

2024-06-18 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856058#comment-17856058
 ] 

Muhammet Orazov commented on FLINK-35067:
-

Hey all,

[~loserwang1024] feel free to assign this to me if you are busy :) 

>  Support metadata 'op_type' virtual column for Postgres CDC Connector. 
> ---
>
> Key: FLINK-35067
> URL: https://issues.apache.org/jira/browse/FLINK-35067
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Affects Versions: cdc-3.1.0
>Reporter: Hongshun Wang
>Assignee: Hongshun Wang
>Priority: Major
> Fix For: cdc-3.2.0
>
>
> Like [https://github.com/apache/flink-cdc/pull/2913], support the metadata 
> 'op_type' virtual column for the Postgres CDC Connector. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25538) [JUnit5 Migration] Module: flink-connector-kafka

2024-06-03 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851573#comment-17851573
 ] 

Muhammet Orazov commented on FLINK-25538:
-

Thanks [~martijnvisser], could you please also assign this to me?


I am going to send another pull request from the original PR branch. Fingers 
crossed we can finalize it this time (y)

> [JUnit5 Migration] Module: flink-connector-kafka
> 
>
> Key: FLINK-25538
> URL: https://issues.apache.org/jira/browse/FLINK-25538
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka
>Reporter: Qingsheng Ren
>Assignee: xiang1 yu
>Priority: Minor
>  Labels: pull-request-available, stale-assigned, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35491) [JUnit5 Migration] Module: Flink CDC modules

2024-05-30 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35491:
---

 Summary: [JUnit5 Migration] Module: Flink CDC modules
 Key: FLINK-35491
 URL: https://issues.apache.org/jira/browse/FLINK-35491
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Migrate JUnit 4 tests to JUnit 5 for the following modules:
 * flink-cdc-common
 * flink-cdc-composer
 * flink-cdc-runtime
 * flink-cdc-connect/flink-cdc-pipeline-connectors
 * flink-cdc-e2e-tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35490) [JUnit5 Migration] Module: Flink CDC flink-cdc-connect/flink-cdc-source-connectors

2024-05-30 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850663#comment-17850663
 ] 

Muhammet Orazov commented on FLINK-35490:
-

I would be happy to work on this, please assign it to me, thanks (y)

> [JUnit5 Migration] Module: Flink CDC 
> flink-cdc-connect/flink-cdc-source-connectors
> --
>
> Key: FLINK-35490
> URL: https://issues.apache.org/jira/browse/FLINK-35490
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> Migrate JUnit 4 tests to JUnit 5 in the following Flink CDC modules:
>  
> - flink-cdc-connect/flink-cdc-source-connectors



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35490) [JUnit5 Migration] Module: Flink CDC flink-cdc-connect/flink-cdc-source-connectors

2024-05-30 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35490:
---

 Summary: [JUnit5 Migration] Module: Flink CDC 
flink-cdc-connect/flink-cdc-source-connectors
 Key: FLINK-35490
 URL: https://issues.apache.org/jira/browse/FLINK-35490
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


Migrate JUnit 4 tests to JUnit 5 in the following Flink CDC modules:

- flink-cdc-connect/flink-cdc-source-connectors



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34585) [JUnit5 Migration] Module: Flink CDC

2024-05-30 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850662#comment-17850662
 ] 

Muhammet Orazov commented on FLINK-34585:
-

Hey all,

Since this is a lot of effort, I suggest we divide this ticket into two parts, 
with the following modules for each ticket:


First ticket to include the following module:
 * flink-cdc-connect/flink-cdc-source-connectors

And the second ticket to include the following modules:
 * flink-cdc-common
 * flink-cdc-composer
 * flink-cdc-runtime
 * flink-cdc-connect/flink-cdc-pipeline-connectors
 * flink-cdc-e2e-tests


Because `flink-cdc-source-connectors` is the biggest part containing the JUnit 4 
tests.
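
For reference, a minimal sketch of the typical shape of such a migration, using the standard JUnit 4 and JUnit 5 APIs (the test class itself is made up):

{code:java}
// A made-up test class showing the typical JUnit 4 -> JUnit 5 changes:
// org.junit.Test   -> org.junit.jupiter.api.Test
// org.junit.Before -> org.junit.jupiter.api.BeforeEach
// org.junit.Assert -> org.junit.jupiter.api.Assertions
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ExampleMigratedTest {

    @BeforeEach // was @Before in JUnit 4
    void setUp() {
        // set up fixtures
    }

    @Test // JUnit 5 tests and test classes no longer need to be public
    void producesExpectedValue() {
        // In JUnit 5 the assertion message moves to the last argument.
        Assertions.assertEquals(4, 2 + 2, "addition should work");
    }
}
{code}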

> [JUnit5 Migration] Module: Flink CDC
> 
>
> Key: FLINK-34585
> URL: https://issues.apache.org/jira/browse/FLINK-34585
> Project: Flink
>  Issue Type: Sub-task
>  Components: Flink CDC
>Reporter: Hang Ruan
>Assignee: LvYanquan
>Priority: Major
>
> Most tests in Flink CDC are still using Junit 4. We need to use Junit 5 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34582) release build tools lost the newly added py3.11 packages for mac

2024-05-24 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849229#comment-17849229
 ] 

Muhammet Orazov commented on FLINK-34582:
-

Ohh no. Yes indeed you are right [~mapohl] , thanks for the update (y)

> release build tools lost the newly added py3.11 packages for mac
> 
>
> Key: FLINK-34582
> URL: https://issues.apache.org/jira/browse/FLINK-34582
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0, 1.20.0
>Reporter: lincoln lee
>Assignee: Xingbo Huang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
> Attachments: image-2024-03-07-10-39-49-341.png
>
>
> During the 1.19.0-rc1 binary build via tools/releasing/create_binary_release.sh, 
> the two newly added py3.11 packages for mac were lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34582) release build tools lost the newly added py3.11 packages for mac

2024-05-23 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848998#comment-17848998
 ] 

Muhammet Orazov commented on FLINK-34582:
-

At the moment the latest one for Linux is 3.10: 
[https://github.com/HuangXingBo/flink/blob/master/flink-python/dev/build-wheels.sh#L19-L26]

Should we also update the Python version in the Linux builds? Could you please 
advise, [~hxb]?

> release build tools lost the newly added py3.11 packages for mac
> 
>
> Key: FLINK-34582
> URL: https://issues.apache.org/jira/browse/FLINK-34582
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0, 1.20.0
>Reporter: lincoln lee
>Assignee: Xingbo Huang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
> Attachments: image-2024-03-07-10-39-49-341.png
>
>
> During the 1.19.0-rc1 binary build via tools/releasing/create_binary_release.sh, 
> the two newly added py3.11 packages for mac were lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34582) release build tools lost the newly added py3.11 packages for mac

2024-05-23 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848958#comment-17848958
 ] 

Muhammet Orazov commented on FLINK-34582:
-

Hello [~hxb] ,

 

Any idea why we don't also update and use the same Python version in the Linux build?

 

Thanks and best,

Muhammet

> release build tools lost the newly added py3.11 packages for mac
> 
>
> Key: FLINK-34582
> URL: https://issues.apache.org/jira/browse/FLINK-34582
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.19.0, 1.20.0
>Reporter: lincoln lee
>Assignee: Xingbo Huang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0, 1.20.0
>
> Attachments: image-2024-03-07-10-39-49-341.png
>
>
> During the 1.19.0-rc1 binary build via tools/releasing/create_binary_release.sh, 
> the two newly added py3.11 packages for mac were lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34470) Transactional message + Table api kafka source with 'latest-offset' scan bound mode causes indefinitely hanging

2024-05-22 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848510#comment-17848510
 ] 

Muhammet Orazov commented on FLINK-34470:
-

Hey [~dongwoo.kim], sorry for the late reply; somehow I missed the message.

I have added a minor comment, but overall I'd add an integration test for this 
case if possible, and let a committer check the PR as well.
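
For such an integration test, here is a minimal sketch of a transactional producer that reproduces the precondition (a control batch written after the last data record); the broker address, transactional id, and topic name are illustrative:

{code:java}
// Minimal transactional producer that leaves a control batch at the end of the
// topic, reproducing the precondition. Broker address, transactional id, and
// topic name are illustrative.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TransactionalProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093");
        props.put("transactional.id", "example-tx-id");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("topic_test", "value-" + i));
            }
            // Committing writes a control batch after the last data record, so
            // the topic's end offset ends up one past the last consumable record.
            producer.commitTransaction();
        }
    }
}
{code}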

> Transactional message + Table api kafka source with 'latest-offset' scan 
> bound mode causes indefinitely hanging
> ---
>
> Key: FLINK-34470
> URL: https://issues.apache.org/jira/browse/FLINK-34470
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: kafka-3.1.0
>Reporter: dongwoo.kim
>Priority: Major
>  Labels: pull-request-available
>
> h2. Summary  
> Hi we have faced issue with transactional message and table api kafka source. 
> If we configure *'scan.bounded.mode'* to *'latest-offset'* flink sql's 
> request timeouts after hanging. We can always reproduce this unexpected 
> behavior by following below steps.
> This is related to this 
> [issue|https://issues.apache.org/jira/browse/FLINK-33484] too.
> h2. How to reproduce
> 1. Deploy a transactional producer and stop it after producing a certain amount 
> of messages
> 2. Configure *'scan.bounded.mode'* to *'latest-offset'* and submit a simple 
> query, such as getting the count of the produced messages
> 3. The Flink SQL job gets stuck and times out.
> h2. Cause
> A transactional producer always produces [control 
> records|https://kafka.apache.org/documentation/#controlbatch] at the end of 
> the transaction, and these control messages are not polled by 
> {*}consumer.poll(){*} (they are filtered internally). In the 
> *KafkaPartitionSplitReader* code, a split is finished only when the 
> *lastRecord.offset() >= stoppingOffset - 1* condition is met. This might work 
> well with non-transactional messages or in a streaming environment, but in some 
> batch use cases it causes unexpected hanging.
> h2. Proposed solution
> {code:java}
> if (consumer.position(tp) >= stoppingOffset) {
>     recordsBySplits.setPartitionStoppingOffset(tp, stoppingOffset);
>     finishSplitAtRecord(
>             tp,
>             stoppingOffset,
>             lastRecord.offset(),
>             finishedPartitions,
>             recordsBySplits);
> }
> {code}
> Replacing the if condition with *consumer.position(tp) >= stoppingOffset* 
> [here|https://github.com/apache/flink-connector-kafka/blob/15f2662eccf461d9d539ed87a78c9851cd17fa43/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L137]
>  can solve the problem. 
> *consumer.position(tp)* returns the next record's offset if it exists, and the 
> last record's offset if the next record doesn't exist. 
> With this, KafkaPartitionSplitReader is able to finish the split even when the 
> stopping offset is configured to a control record's offset. 
> I would be happy to implement this fix if we can reach an agreement. 
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20402) Migrate test_tpch.sh

2024-05-21 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848150#comment-17848150
 ] 

Muhammet Orazov commented on FLINK-20402:
-

Hey all,

I would like to work on this, could you please assign it to me?

> Migrate test_tpch.sh
> 
>
> Key: FLINK-20402
> URL: https://issues.apache.org/jira/browse/FLINK-20402
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Ecosystem, Tests
>Reporter: Jark Wu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20392) Migrating bash e2e tests to Java/Docker

2024-05-16 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846857#comment-17846857
 ] 

Muhammet Orazov commented on FLINK-20392:
-

[~mapohl] yes, I agree, I think it is a good balance; let's keep both options. We 
were initially opting to keep them consistent, but we can come back to it if that 
becomes an issue again.

 

Migrating them into dockerized tests will add some overhead, since each suite 
will create 2-5 containers, wait until they are ready, and terminate them at the 
end. At some point later we can invest some time to group tests that can reuse 
the Flink cluster; Testcontainers has the `reuse` option, as sketched below.
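
A minimal sketch of that `reuse` option (the Flink image tag is illustrative; reuse also requires opting in via ~/.testcontainers.properties):

{code:java}
// Sketch of the Testcontainers `reuse` option. The image tag is illustrative;
// reuse only takes effect when testcontainers.reuse.enable=true is set in
// ~/.testcontainers.properties.
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

public class ReusableFlinkContainerExample {
    public static void main(String[] args) {
        GenericContainer<?> jobManager =
                new GenericContainer<>(DockerImageName.parse("flink:1.20-SNAPSHOT")) // illustrative tag
                        .withExposedPorts(8081) // REST / web UI port
                        .withReuse(true); // keep the container alive across test runs
        jobManager.start();
        // ... run several test suites against the same cluster instead of
        // starting and stopping 2-5 containers per suite ...
    }
}
{code}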

 

 

> Migrating bash e2e tests to Java/Docker
> ---
>
> Key: FLINK-20392
> URL: https://issues.apache.org/jira/browse/FLINK-20392
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Test Infrastructure, Tests
>Reporter: Matthias Pohl
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration 
> tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see 
> [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
>  Ideally, the test should use Docker containers as much as possible 
> disconnect the execution from the environment. A good source to achieve that 
> is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to 
> Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-20392) Migrating bash e2e tests to Java/Docker

2024-05-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846609#comment-17846609
 ] 

Muhammet Orazov edited comment on FLINK-20392 at 5/15/24 12:25 PM:
---

Thanks [~lorenzo.affetti] ,

 
{quote}MiniClusterExtension:
 - PRO: the code is very clean: no need to deploy separate JAR for the Flink 
app, as that is coded in the test and the Flink cluster is registered in the 
current test environment
{quote}
I would see this extra work as a CON. Most of the e2e tests already have a 
so-called TestProgram; with the MiniCluster approach this functionality would 
have to be extracted. I think this is extra effort; instead, we could submit 
those test programs to FlinkContainers and validate the expected outcome(s).

 


was (Author: JIRAUSER303922):
Thanks [~lorenzo.affetti] ,

 
{quote}MiniClusterExtension:
 - PRO: the code is very clean: no need to deploy separate JAR for the Flink 
app, as that is coded in the test and the Flink cluster is registered in the 
current test environment
{quote}
I would see this also extra work as CON. Most of the e2e are already have so 
called TestProgram, with MiniCluster approach these functionality will have to 
extracted. I think this is extra effort, instead we could submit those test 
programs to FlinkContainers and validate the expected outcome(s).

 

> Migrating bash e2e tests to Java/Docker
> ---
>
> Key: FLINK-20392
> URL: https://issues.apache.org/jira/browse/FLINK-20392
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Test Infrastructure, Tests
>Reporter: Matthias Pohl
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration 
> tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see 
> [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
>  Ideally, the test should use Docker containers as much as possible 
> disconnect the execution from the environment. A good source to achieve that 
> is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to 
> Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20392) Migrating bash e2e tests to Java/Docker

2024-05-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846609#comment-17846609
 ] 

Muhammet Orazov commented on FLINK-20392:
-

Thanks [~lorenzo.affetti] ,

 
{quote}MiniClusterExtension:
 - PRO: the code is very clean: no need to deploy separate JAR for the Flink 
app, as that is coded in the test and the Flink cluster is registered in the 
current test environment
{quote}
I would see this extra work as a CON. Most of the e2e tests already have a 
so-called TestProgram; with the MiniCluster approach this functionality would 
have to be extracted. I think this is extra effort; instead, we could submit 
those test programs to FlinkContainers and validate the expected outcome(s).

 

> Migrating bash e2e tests to Java/Docker
> ---
>
> Key: FLINK-20392
> URL: https://issues.apache.org/jira/browse/FLINK-20392
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Test Infrastructure, Tests
>Reporter: Matthias Pohl
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration 
> tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see 
> [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
>  Ideally, the test should use Docker containers as much as possible 
> disconnect the execution from the environment. A good source to achieve that 
> is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to 
> Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-20400) Migrate test_streaming_sql.sh

2024-04-27 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841555#comment-17841555
 ] 

Muhammet Orazov commented on FLINK-20400:
-

Hello [~jark], [~mapohl],

Could you please assign this ticket to me? I would like to work on it

> Migrate test_streaming_sql.sh
> -
>
> Key: FLINK-20400
> URL: https://issues.apache.org/jira/browse/FLINK-20400
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API, Tests
>Reporter: Jark Wu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-17 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838471#comment-17838471
 ] 

Muhammet Orazov commented on FLINK-34508:
-

The 
[YarnFileStageTestS3ITCase|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java#L131]
 test is skipped because of the assumption that the UUID.randomUUID() path does not 
exist.

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35129) Postgres source commits the offset after every multiple checkpoint cycles.

2024-04-17 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838427#comment-17838427
 ] 

Muhammet Orazov commented on FLINK-35129:
-

Hey [~loserwang1024] , I would be happy to work on it! Yes, could you please 
assign it to me?

> Postgres source commits the offset after every multiple checkpoint cycles.
> --
>
> Key: FLINK-35129
> URL: https://issues.apache.org/jira/browse/FLINK-35129
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Hongshun Wang
>Priority: Major
>
> After entering the stream phase, the offset consumed by the global slot is 
> committed upon the completion of each checkpoint. This keeps log files 
> recyclable and prevents them from accumulating, which could otherwise lead to 
> insufficient disk space.
> However, the job can only restart from the latest checkpoint or savepoint; if 
> restored from an earlier state, the WAL may already have been recycled.
>  
> The way to solve this is to commit the offset only once every several 
> checkpoint cycles. The number of checkpoint cycles is determined by a connector 
> option, and the default value is 3.
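
To illustrate the described behavior, here is a rough sketch of committing only after every N completed checkpoints; the class and method names are made up for illustration and are not the actual Flink CDC types:

{code:java}
// Rough sketch of the described behavior. PostgresOffsetCommitter and its
// members are made up for illustration; they are not the actual CDC classes.
public class PostgresOffsetCommitter {

    private final int commitCheckpointCycles; // connector option, default 3
    private int completedCheckpoints = 0;

    public PostgresOffsetCommitter(int commitCheckpointCycles) {
        this.commitCheckpointCycles = commitCheckpointCycles;
    }

    /** Invoked from notifyCheckpointComplete(checkpointId). */
    public void onCheckpointComplete(long checkpointId) {
        completedCheckpoints++;
        if (completedCheckpoints >= commitCheckpointCycles) {
            commitOffsetToSlot(checkpointId);
            completedCheckpoints = 0;
        }
    }

    private void commitOffsetToSlot(long checkpointId) {
        // In the real connector this would confirm the LSN on the replication
        // slot, allowing older WAL segments to be recycled.
    }
}
{code}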



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837531#comment-17837531
 ] 

Muhammet Orazov commented on FLINK-34508:
-

Hey [~mapohl],

On the GitHub CI, the S3 `IT_CASE_S3_*` parameters are not set. They seem to be 
empty, and tests depending on them are skipped.


If they were set as secrets, they should have been displayed with `*`:
{code:java}
Run PROFILE="-Dinclude_hadoop_aws" PROFILE="$PROFILE -Pgithub-actions" 
./tools/azure-pipelines/uploading_watchdog.sh \
2  PROFILE="-Dinclude_hadoop_aws" PROFILE="$PROFILE -Pgithub-actions" 
./tools/azure-pipelines/uploading_watchdog.sh \
3  ./tools/ci/test_controller.sh connect
4  shell: sh -e {0}
5  env:
6MOUNTED_WORKING_DIR: /__w/flink/flink
7CONTAINER_LOCAL_WORKING_DIR: /root/flink
8FLINK_ARTIFACT_DIR: /root/artifact-directory
9FLINK_ARTIFACT_FILENAME: flink_artifacts.tar.gz
10MAVEN_REPO_FOLDER: /root/.m2/repository
11MAVEN_ARGS: -Dmaven.repo.local=/root/.m2/repository
12DOCKER_IMAGES_CACHE_FOLDER: /root/.docker-cache
13GHA_JOB_TIMEOUT: 240
14GHA_PIPELINE_START_TIME: 2024-04-16 01:46:07+00:00
15JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
16PATH: 
/usr/lib/jvm/java-8-openjdk-amd64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
17IT_CASE_S3_BUCKET: 
18IT_CASE_S3_ACCESS_KEY: 
19IT_CASE_S3_SECRET_KEY: 
20[INFO] GitHub Actions environment detected: 
./tools/azure-pipelines/uploading_watchdog.sh will rely on GHA-specific 
environment variables.
21[INFO] GitHub Actions environment detected: 
./tools/azure-pipelines/uploading_watchdog.sh will export the variables in the 
GHA-specific way. {code}


Please let me know if we need to create a new issue to set them and enable the 
tests. This issue is about removing them :) Maybe we can discuss how we want to 
proceed.

I would vote to remove the `s3n` case from the Yarn integration test and use 
Minio for all of them, roughly as sketched below.
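
A minimal sketch of running against Minio via Testcontainers, assuming Flink's documented `s3.*` filesystem options (the image tag and credentials are illustrative):

{code:java}
// Sketch: start Minio via Testcontainers and point Flink's S3 filesystem at it.
// Image tag and credentials are illustrative; the s3.* keys are Flink's
// documented S3 filesystem options.
import org.apache.flink.configuration.Configuration;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

public class MinioBackedS3TestSetup {
    public static void main(String[] args) {
        GenericContainer<?> minio =
                new GenericContainer<>(DockerImageName.parse("minio/minio:latest"))
                        .withCommand("server", "/data")
                        .withEnv("MINIO_ROOT_USER", "minioadmin")
                        .withEnv("MINIO_ROOT_PASSWORD", "minioadmin")
                        .withExposedPorts(9000);
        minio.start();

        Configuration conf = new Configuration();
        conf.setString("s3.endpoint",
                "http://" + minio.getHost() + ":" + minio.getMappedPort(9000));
        conf.setString("s3.path.style.access", "true"); // required for Minio
        conf.setString("s3.access-key", "minioadmin");
        conf.setString("s3.secret-key", "minioadmin");
        // ... initialize the s3 filesystem plugin with `conf` and run the ITCases ...
    }
}
{code}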

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837530#comment-17837530
 ] 

Muhammet Orazov commented on FLINK-34508:
-

In the Azure Pipeline the YarnFileStageTestS3ITCase is skipped:
{code:java}
Apr 16 03:01:16 03:01:16.491 [INFO] Running 
org.apache.flink.yarn.YarnFileStageTestS3ITCase
Apr 16 03:01:24 03:01:24.104 [WARNING] Tests run: 8, Failures: 0, Errors: 0, 
Skipped: 8, Time elapsed: 7.611 s -- in 
org.apache.flink.yarn.YarnFileStageTestS3ITCase {code}
 

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837486#comment-17837486
 ] 

Muhammet Orazov edited comment on FLINK-34508 at 4/16/24 6:19 AM:
--

Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one issue I 
want to discuss.

There is an integration test case in the `flink-yarn` module: 
`[YarnFileStageTestS3ITCase.java|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java]`.
 This file also depends on the S3 variables. It also uses S3 Native (s3n) 
in one of the tests; when run, this test fails.

 

-However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find). It is not listed on the 
[stage.sh|https://github.com/apache/flink/blob/master/tools/ci/stage.sh] file.-

It is enabled. But still neither of the two tests is run; from one of the recent logs:
{code:java}
 Apr 16 02:17:12 02:17:12.837 [INFO] Running 
org.apache.flink.yarn.YarnFileStageTestS3ITCase
27911Apr 16 02:17:12 02:17:12.936 [INFO] Tests run: 0, Failures: 0, Errors: 0, 
Skipped: 0, Time elapsed: 0.089 s -- in 
org.apache.flink.yarn.YarnFileStageTestS3ITCase{code}
 

The NativeS3Filesystem does not work with Minio, since it goes directly to the 
default `.amazonaws.` endpoint. The S3 native library is deprecated in the 3.3.4+ 
versions of the `hadoop-aws` library, but is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What do 
you think?


was (Author: JIRAUSER303922):
Hey all,

I have add changes to remove the `IT_CASE_S3_*` variables. There is one issue I 
want to discuss.

There is an integration test case in `flink-yarn` module, it is 
`[YarnFileStateTestS3ITCase.java|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java]`.
 This file also depends on the S3 variables. It also uses the S3 Native (s3n) 
in one of the tests, when run this test fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find). It is not listed on the 
[stage.sh|https://github.com/apache/flink/blob/master/tools/ci/stage.sh] file.

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837486#comment-17837486
 ] 

Muhammet Orazov edited comment on FLINK-34508 at 4/16/24 2:51 AM:
--

Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one 
issue I want to discuss.

There is an integration test in the `flink-yarn` module, 
`[YarnFileStageTestS3ITCase.java|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java]`. 
This file also depends on the S3 variables, and one of its tests uses the S3 
Native (s3n) filesystem; when run, this test fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find it). It is not listed in the 
[stage.sh|https://github.com/apache/flink/blob/master/tools/ci/stage.sh] file.

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?


was (Author: JIRAUSER303922):
Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one 
issue I want to discuss.

There is an integration test in the `flink-yarn` module, 
`[YarnFileStageTestS3ITCase.java|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java]`. 
This file also depends on the S3 variables, and one of its tests uses the S3 
Native (s3n) filesystem; when run, this test fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find it).

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837486#comment-17837486
 ] 

Muhammet Orazov edited comment on FLINK-34508 at 4/16/24 2:49 AM:
--

Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one 
issue I want to discuss.

There is an integration test in the `flink-yarn` module, 
`[YarnFileStageTestS3ITCase.java|https://github.com/apache/flink/blob/master/flink-yarn/src/test/java/org/apache/flink/yarn/YarnFileStageTestS3ITCase.java]`. 
This file also depends on the S3 variables, and one of its tests uses the S3 
Native (s3n) filesystem; when run, this test fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find it).

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?


was (Author: JIRAUSER303922):
Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one 
issue I want to discuss.

There is an integration test in the `flink-yarn` module, 
`YarnFileStageTestS3ITCase.java`. This file also depends on the S3 variables, 
and one of its tests uses the S3 Native (s3n) filesystem; when run, this test 
fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find it).

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-04-15 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837486#comment-17837486
 ] 

Muhammet Orazov commented on FLINK-34508:
-

Hey all,

I have added changes to remove the `IT_CASE_S3_*` variables. There is one 
issue I want to discuss.

There is an integration test in the `flink-yarn` module, 
`YarnFileStageTestS3ITCase.java`. This file also depends on the S3 variables, 
and one of its tests uses the S3 Native (s3n) filesystem; when run, this test 
fails.

 

However, it is not enabled in the CI. The `flink-yarn` module is not used in 
the tests (or I could not find it).

 

The NativeS3Filesystem does not work with Minio, since it connects directly to 
the default `.amazonaws.` endpoint. The S3 native library is deprecated in 
`hadoop-aws` 3.3.4+, but that version is not used by the yarn module for now.

 

I would suggest removing the `s3n` test from YarnFileStageTestS3ITCase. What 
do you think?

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35078) Include Flink CDC pipeline connector sub-modules in the CI runs

2024-04-10 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836021#comment-17836021
 ] 

Muhammet Orazov commented on FLINK-35078:
-

I will be happy to address this.

> Include Flink CDC pipeline connector sub-modules in the CI runs
> ---
>
> Key: FLINK-35078
> URL: https://issues.apache.org/jira/browse/FLINK-35078
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Muhammet Orazov
>Priority: Major
>
> At the moment, only the Flink CDC pipeline connector parent is built using 
> the GHA workflow. This misses the Maven plugin checks on the sub-modules.
> Similar to the other modules, we should list the pipeline connector modules 
> separately in the CI build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35078) Include Flink CDC pipeline connector sub-modules in the CI runs

2024-04-10 Thread Muhammet Orazov (Jira)
Muhammet Orazov created FLINK-35078:
---

 Summary: Include Flink CDC pipeline connector sub-modules in the 
CI runs
 Key: FLINK-35078
 URL: https://issues.apache.org/jira/browse/FLINK-35078
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Muhammet Orazov


At the moment, only the Flink CDC pipeline connector parent is built using 
the GHA workflow. This misses the Maven plugin checks on the sub-modules.

Similar to the other modules, we should list the pipeline connector modules 
separately in the CI build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34470) Transactional message + Table api kafka source with 'latest-offset' scan bound mode causes indefinitely hanging

2024-04-09 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835321#comment-17835321
 ] 

Muhammet Orazov commented on FLINK-34470:
-

Hey [~dongwoo.kim],

 

Would you like to send a PR for it? It would be good to fix this for all 
transactional issues (including FLINK-33484), with a solution that works for 
transactional, non-transactional, and bounded-mode cases alike.

 

I would be happy to collaborate; please let me know.

 

Best,

Muhammet

> Transactional message + Table api kafka source with 'latest-offset' scan 
> bound mode causes indefinitely hanging
> ---
>
> Key: FLINK-34470
> URL: https://issues.apache.org/jira/browse/FLINK-34470
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.17.1
>Reporter: dongwoo.kim
>Priority: Major
>
> h2. Summary
> Hi, we have faced an issue with transactional messages and the Table API 
> Kafka source. If we configure *'scan.bounded.mode'* to *'latest-offset'*, 
> the Flink SQL request hangs and then times out. We can always reproduce this 
> unexpected behavior by following the steps below.
> This is related to this 
> [issue|https://issues.apache.org/jira/browse/FLINK-33484] too.
> h2. How to reproduce
> 1. Deploy a transactional producer and stop it after producing a certain 
> amount of messages (a sketch follows this list)
> 2. Configure *'scan.bounded.mode'* to *'latest-offset'* and submit a simple 
> query, such as counting the produced messages
> 3. The Flink SQL job gets stuck and times out.
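> As a sketch of step 1 (plain Kafka producer API; the broker address, topic, 
> and message count are illustrative):
> {code:java}
> import java.util.Properties;
>
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerConfig;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.common.serialization.StringSerializer;
>
> public class TransactionalRepro {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>         props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "repro-producer");
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             producer.initTransactions();
>             producer.beginTransaction();
>             for (int i = 0; i < 100; i++) {
>                 producer.send(new ProducerRecord<>("repro-topic", Integer.toString(i)));
>             }
>             // The commit writes a control record after the last data record, so
>             // the partition's end offset is one past the highest data-record offset.
>             producer.commitTransaction();
>         }
>     }
> }
> {code}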
> h2. Cause
> A transactional producer always writes [control 
> records|https://kafka.apache.org/documentation/#controlbatch] at the end of 
> a transaction. These control records are never returned by 
> *consumer.poll()* (they are filtered out internally). In the 
> *KafkaPartitionSplitReader* code, a split is finished only when the 
> condition *lastRecord.offset() >= stoppingOffset - 1* is met. This works 
> for non-transactional messages and streaming environments, but in some 
> batch use cases it causes unexpected hanging.
> h2. Proposed solution
> {code:java}
> if (consumer.position(tp) >= stoppingOffset) {
>     recordsBySplits.setPartitionStoppingOffset(tp, stoppingOffset);
>     finishSplitAtRecord(
>             tp,
>             stoppingOffset,
>             lastRecord.offset(),
>             finishedPartitions,
>             recordsBySplits);
> }
> {code}
> Replacing the if condition with *consumer.position(tp) >= stoppingOffset* 
> [here|https://github.com/apache/flink-connector-kafka/blob/15f2662eccf461d9d539ed87a78c9851cd17fa43/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L137]
>  can solve the problem. 
> *consumer.position(tp)* returns the next record's offset if one exists, and 
> the last record's offset if it does not. 
> This way, KafkaPartitionSplitReader can finish the split even when the 
> stopping offset points at a control record's offset. For example, with 100 
> data records at offsets 0-99 and the control record at offset 100, the 
> bounded stopping offset is 101: *lastRecord.offset()* stays at 99 and never 
> satisfies the old condition, while *consumer.position(tp)* advances to 101.
> I would be happy to implement this fix if we can reach an agreement. 
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update

2024-03-26 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830874#comment-17830874
 ] 

Muhammet Orazov commented on FLINK-34937:
-

Hey [~mapohl], is the first link unreachable, or is it only visible to infra 
subscribers?

 

Would it be possible to derive some actionable points from the GHA policy? 
They are a little vague, and I am not sure GHA offers a matching setting for 
each point. For example, using something like `concurrency.cancel-in-progress`, 
etc.

> Apache Infra GHA policy update
> --
>
> Key: FLINK-34937
> URL: https://issues.apache.org/jira/browse/FLINK-34937
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>
> There is a policy update [announced in the infra 
> ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which 
> asked Apache projects to limit the number of runners per job. Additionally, 
> the [GHA policy|https://infra.apache.org/github-actions-policy.html] is 
> referenced which I wasn't aware of when working on the action workflow.
> This issue is about applying the policy to the Flink GHA workflows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-03-24 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830257#comment-17830257
 ] 

Muhammet Orazov commented on FLINK-34487:
-

Hello [~mapohl], could you please have a look at the attached PR? Thanks a lot 
(y) 

> Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly 
> workflow
> -
>
> Key: FLINK-34487
> URL: https://issues.apache.org/jira/browse/FLINK-34487
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Analogously to the [Azure Pipelines nightly 
> config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
>  we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33707) Verify the snapshot migration on Java17

2024-03-24 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830254#comment-17830254
 ] 

Muhammet Orazov commented on FLINK-33707:
-

Hello,

Would this affect the Flink 1.18 version?

> Verify the snapshot migration on Java17
> ---
>
> Key: FLINK-33707
> URL: https://issues.apache.org/jira/browse/FLINK-33707
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Yun Tang
>Priority: Major
>
> This task is similar to FLINK-33699; I think we could introduce a 
> StatefulJobSnapshotMigrationITCase-like test to restore snapshots containing 
> Scala code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34508) Migrate S3-related ITCases and e2e tests to Minio

2024-03-01 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822527#comment-17822527
 ] 

Muhammet Orazov commented on FLINK-34508:
-

Hey [~mapohl], let me take this task as well. Could you please assign it to me?

> Migrate S3-related ITCases and e2e tests to Minio 
> --
>
> Key: FLINK-34508
> URL: https://issues.apache.org/jira/browse/FLINK-34508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> Anything that uses {{org.apache.flink.testutils.s3.S3TestCredentials}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-03-01 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822511#comment-17822511
 ] 

Muhammet Orazov commented on FLINK-34487:
-

Yes, definitely! It is a good initiative to move to GitHub Actions as an 
open-source project (y)

> Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly 
> workflow
> -
>
> Key: FLINK-34487
> URL: https://issues.apache.org/jira/browse/FLINK-34487
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: github-actions, pull-request-available
>
> Analogously to the [Azure Pipelines nightly 
> config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
>  we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-02-29 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822420#comment-17822420
 ] 

Muhammet Orazov commented on FLINK-34487:
-

Hey [~mapohl], I'd like to work on this. Could you please assign this to me? 
Thanks!

> Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly 
> workflow
> -
>
> Key: FLINK-34487
> URL: https://issues.apache.org/jira/browse/FLINK-34487
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions
>
> Analogously to the [Azure Pipelines nightly 
> config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
>  we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21

2024-02-28 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821627#comment-17821627
 ] 

Muhammet Orazov commented on FLINK-34419:
-

Hello [~mapohl], 

Thanks! Could you please have a look? I have sent a PR for the master branch.

I would also like to update the dev-x branches if this looks good. The 
{{dev-x}} branches should be updated at least to test with the new JDKs. 

> flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
> ---
>
> Key: FLINK-34419
> URL: https://issues.apache.org/jira/browse/FLINK-34419
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Matthias Pohl
>Assignee: Muhammet Orazov
>Priority: Major
>  Labels: pull-request-available, starter
>
> [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40]
>  needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 
> support was added in 1.19 (FLINK-33163)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21

2024-02-27 Thread Muhammet Orazov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821445#comment-17821445
 ] 

Muhammet Orazov commented on FLINK-34419:
-

Hello [~mapohl] ,

I would like to work on this issue. Could you please assign it to me?

> flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
> ---
>
> Key: FLINK-34419
> URL: https://issues.apache.org/jira/browse/FLINK-34419
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / CI
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: starter
>
> [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40]
>  needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 
> support was added in 1.19 (FLINK-33163)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)