[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2021-04-29 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11116:
---
Labels: auto-deprioritized-major pull-request-available  (was: 
pull-request-available stale-major)

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Priority: Major
> Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.7.3
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> To guarantee exactly-once semantics, the streaming file sink implements a 
> two-phase commit protocol when writing files to the filesystem.
> Initially, data is written to in-progress files. These files are put into 
> the "pending" state when they are completed (based on the rolling policy), 
> and they are finally committed when the checkpoint that put them in the 
> "pending" state is acknowledged as complete.
> This means that if we have:
> 1) checkpoints A, B, and C being triggered,
> 2) only checkpoint A being acknowledged, and
> 3) a failure,
> then we may have files that do not belong to any checkpoint (because B and C 
> were not considered successful). These files are currently not cleaned up.
> To reduce the number of such files, we removed the random suffix from 
> in-progress temporary files, so that the next in-progress file opened for 
> the same part overwrites them.
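
The scenario described above can be illustrated with a small simulation. This
is only a sketch under stated assumptions, not Flink's implementation: the
class ToySink, the dict standing in for the filesystem, the file name pattern
".part-N.inprogress", and the "roll on every checkpoint" policy are all
hypothetical and chosen for brevity. It replays checkpoints A, B, C with only
A acknowledged, then a failure, and lists the temporary files left behind with
and without a random suffix.

```python
import copy
import uuid


class ToySink:
    """Toy model of one sink writer; `fs` (a dict) stands in for the filesystem."""

    def __init__(self, fs, random_suffix):
        self.fs = fs
        self.random_suffix = random_suffix
        self.part_counter = 0
        self.current = None   # (tmp_path, final_path) of the open in-progress file
        self.pending = {}     # checkpoint id -> [(tmp_path, final_path), ...]

    def write(self, record):
        if self.current is None:
            final = f"part-{self.part_counter}"
            tmp = f".{final}.inprogress"          # hypothetical naming scheme
            if self.random_suffix:
                tmp += "." + uuid.uuid4().hex
            self.fs[tmp] = []                     # creating replaces a file of the same name
            self.current = (tmp, final)
            self.part_counter += 1
        self.fs[self.current[0]].append(record)

    def checkpoint(self, cp_id):
        # Assumed rolling policy: roll on every checkpoint (in-progress -> pending).
        if self.current is not None:
            self.pending.setdefault(cp_id, []).append(self.current)
            self.current = None
        return {"part_counter": self.part_counter,
                "pending": copy.deepcopy(self.pending)}

    def acknowledge(self, cp_id):
        # Second phase of the commit: pending -> finished, done by renaming.
        for tmp, final in self.pending.pop(cp_id, []):
            if tmp in self.fs:                    # idempotent after a restart
                self.fs[final] = self.fs.pop(tmp)

    def restore(self, snapshot):
        self.part_counter = snapshot["part_counter"]
        self.pending = copy.deepcopy(snapshot["pending"])
        self.current = None
        for cp_id in list(self.pending):          # snapshot belongs to a completed checkpoint
            self.acknowledge(cp_id)


def run(random_suffix, parts_after_recovery):
    fs = {}
    sink = ToySink(fs, random_suffix)
    sink.write("a")
    snapshot_a = sink.checkpoint("A")
    sink.acknowledge("A")                         # only A is acknowledged
    sink.write("b")
    sink.checkpoint("B")                          # never acknowledged
    sink.write("c")
    sink.checkpoint("C")                          # never acknowledged
    sink.write("d")                               # still in progress at failure time
    sink.restore(snapshot_a)                      # failure, then recovery from A
    for i in range(parts_after_recovery):
        sink.write(f"r{i}")
        sink.checkpoint(f"R{i}")
        sink.acknowledge(f"R{i}")
    return sorted(path for path in fs if ".inprogress" in path)


if __name__ == "__main__":
    print("random suffix:", run(True, 3))
    print("fixed names:  ", run(False, 3))
```

In this model, the random-suffix run leaves three orphaned temporary files no
matter how long the job keeps running, while the fixed-name run overwrites
each stale file as soon as the same part is opened again. Orphans can still
remain if the job has not yet reached those parts after recovery, which is
consistent with the description saying the change reduces such files rather
than eliminating them.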



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2021-04-29 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11116:
---
Priority: Minor  (was: Major)

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Priority: Minor
> Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.7.3
>
> Time Spent: 20m
> Remaining Estimate: 0h





[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2021-04-22 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-11116:
---
Labels: pull-request-available stale-major  (was: pull-request-available)

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Priority: Major
> Labels: pull-request-available, stale-major
> Fix For: 1.7.3
>
> Time Spent: 20m
> Remaining Estimate: 0h





[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2019-02-11 Thread Tzu-Li (Gordon) Tai (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-11116:
---
Fix Version/s: 1.7.3  (was: 1.7.2)

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: filesystem-connector
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.3
>
> Time Spent: 10m
> Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2018-12-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-11116:
---
Labels: pull-request-available  (was: )

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: filesystem-connector
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.2





[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2018-12-14 Thread Kostas Kloudas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas updated FLINK-11116:
---
Description: 
To guarantee exactly-once semantics, the streaming file sink implements a 
two-phase commit protocol when writing files to the filesystem.

Initially, data is written to in-progress files. These files are put into the 
"pending" state when they are completed (based on the rolling policy), and they 
are finally committed when the checkpoint that put them in the "pending" state 
is acknowledged as complete.

This means that if we have:
1) checkpoints A, B, and C being triggered,
2) only checkpoint A being acknowledged, and
3) a failure,

then we may have files that do not belong to any checkpoint (because B and C 
were not considered successful). These files are currently not cleaned up.

To reduce the number of such files, we removed the random suffix from 
in-progress temporary files, so that the next in-progress file opened for the 
same part overwrites them.

  was:
To guarantee exactly-once semantics, the streaming file sink implements a 
two-phase commit protocol when writing files to the filesystem.

Initially, data is written to in-progress files. These files are put into the 
"pending" state when they are completed (based on the rolling policy), and they 
are finally committed when the checkpoint that put them in the "pending" state 
is acknowledged as complete.

This means that if we have:
1) checkpoints A, B, and C being triggered,
2) only checkpoint A being acknowledged, and
3) a failure,

then we may have files that do not belong to any checkpoint (because B and C 
were not considered successful). These files are currently not cleaned up.

This issue aims at cleaning up these files.


> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: filesystem-connector
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Priority: Major
> Fix For: 1.7.2





[jira] [Updated] (FLINK-11116) Overwrite outdated in-progress files in StreamingFileSink.

2018-12-14 Thread Kostas Kloudas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas updated FLINK-11116:
---
Summary: Overwrite outdated in-progress files in StreamingFileSink.  (was: 
Clean-up temporary files that upon recovery, they belong to no checkpoint.)

> Overwrite outdated in-progress files in StreamingFileSink.
> --
>
> Key: FLINK-11116
> URL: https://issues.apache.org/jira/browse/FLINK-11116
> Project: Flink
> Issue Type: Improvement
> Components: filesystem-connector
> Affects Versions: 1.7.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Priority: Major
> Fix For: 1.7.2
>
>
> To guarantee exactly-once semantics, the streaming file sink implements a 
> two-phase commit protocol when writing files to the filesystem.
> Initially, data is written to in-progress files. These files are put into 
> the "pending" state when they are completed (based on the rolling policy), 
> and they are finally committed when the checkpoint that put them in the 
> "pending" state is acknowledged as complete.
> This means that if we have:
> 1) checkpoints A, B, and C being triggered,
> 2) only checkpoint A being acknowledged, and
> 3) a failure,
> then we may have files that do not belong to any checkpoint (because B and C 
> were not considered successful). These files are currently not cleaned up.
> This issue aims at cleaning up these files.


