EMERSON WANG created FLINK-35521:
------------------------------------

             Summary: Flink FileSystem SQL Connector Generating _SUCCESS File Multiple Times
                 Key: FLINK-35521
                 URL: https://issues.apache.org/jira/browse/FLINK-35521
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / FileSystem
    Affects Versions: 1.18.1
         Environment: Our PyFlink SQL jobs run in an AWS EKS environment.
            Reporter: EMERSON WANG


Our Flink Table SQL job consumes data from Kafka streams and writes all 
partitioned data into the associated Parquet files under the same S3 folder 
through the filesystem SQL connector.

For the S3 filesystem SQL connector, sink.partition-commit.policy.kind was set 
to 'success-file' and sink.partition-commit.trigger was set to 'partition-time'. 
We found that the _SUCCESS file in the S3 folder was regenerated multiple times 
as successive partitions were committed.
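
For reference, here is a minimal PyFlink sketch of such a sink definition; the 
table name, columns, and S3 path are illustrative placeholders, and only the 
two commit options above come from this report ('partition-time' additionally 
requires a commit delay and a time extractor, shown with assumed values):

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Illustrative sink; only the two sink.partition-commit options below
    # reflect the configuration described in this report.
    t_env.execute_sql("""
        CREATE TABLE s3_sink (
            id STRING,
            payload STRING,
            dt STRING,
            hr STRING
        ) PARTITIONED BY (dt, hr) WITH (
            'connector' = 'filesystem',
            'path' = 's3://my-bucket/output',  -- placeholder path
            'format' = 'parquet',
            'sink.partition-commit.trigger' = 'partition-time',
            'sink.partition-commit.policy.kind' = 'success-file',
            -- assumed values for the sketch; a real job would use its own
            -- delay and timestamp pattern:
            'sink.partition-commit.delay' = '1 h',
            'partition.time-extractor.timestamp-pattern' = '$dt $hr:00:00'
        )
    """)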

Because all partitioned Parquet files and the _SUCCESS file live in the same S3 
folder, and the _SUCCESS file is what triggers the downstream application, we 
would really like the _SUCCESS file to be generated only once, after all 
partitions are committed and all Parquet files are ready to be processed. That 
way a single _SUCCESS file triggers the downstream application exactly once 
instead of multiple times.

We know we could set sink.partition-commit.trigger to 'process-time' to 
generate the _SUCCESS file only once in the S3 folder; however, 'process-time' 
would not meet our business requirements.
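
A sketch of that one-line workaround (reusing the illustrative 's3_sink' table 
from the sketch above); this is not what we want, since commits would then 
follow machine time rather than event time:

    t_env.execute_sql("""
        ALTER TABLE s3_sink SET (
            'sink.partition-commit.trigger' = 'process-time'
        )
    """)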

We request that the FileSystem SQL connector support the following new use 
case: even when sink.partition-commit.trigger is set to 'partition-time', the 
_SUCCESS file should be generated only once, after all partitions are committed 
and all output files are ready to be processed, so that it triggers the 
downstream application only once instead of multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
