[jira] [Commented] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2021-10-25 Thread Padarn Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434053#comment-17434053
 ] 

Padarn Wilson commented on FLINK-19589:
---

Sorry, I can't make time to do this myself, as I stopped working with Flink and 
it became a very low priority for our work.

> Expose S3 options for tagging and object lifecycle policy for FileSystem
> 
>
> Key: FLINK-19589
> URL: https://issues.apache.org/jira/browse/FLINK-19589
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems
>Affects Versions: 1.12.0
>Reporter: Padarn Wilson
>Priority: Minor
>  Labels: auto-unassigned, stale-minor
>
> This ticket proposes to expose two properties related to S3 object 
> management:
>  - [Lifecycle configuration 
> |https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html]
>  - [Object 
> tagging|https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html]
> Being able to control these is useful for people who manage jobs that use S3 
> for checkpointing or job output, but need per-job control of the 
> tagging/lifecycle configuration for auditing or cost control (for example, 
> deleting old state from S3).
> Ideally, it would be possible to control this on each object being written by 
> Flink, or at least at a job level.
> _Note_: Some related properties can already be set through the hadoop module 
> via system properties: see for example 
> {code:java}
> fs.s3a.acl.default{code}
> which sets the default ACL on written objects.
> *Solutions*:
> 1) Modify hadoop module:
> The above-linked module could be updated to add a new property (and a 
> similar one for lifecycle),
>  fs.s3a.tags.default
>  which could be a comma-separated list of tags to set. For example:
> {code:java}
> fs.s3a.acl.default = "jobname:JOBNAME,owner:OWNER"{code}
> This seems like a natural place to put this logic (and it is outside of Flink 
> if we decide to go this way). However, it does not allow a sink and a 
> checkpoint to have different values for these.
> 2) Expose withTagging from module
> The hadoop module used by Flink's existing filesystem already exposes 
> put-request-level tagging (see 
> [this|https://github.com/aws/aws-sdk-java/blob/c06822732612d7208927d2a678073098522085c3/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/PutObjectRequest.java#L292]).
>  This could be used in the Flink filesystem plugin to expose these options. A 
> possible approach could be to somehow incorporate it into the file path, e.g.,
> {code:java}
> path = "TAGS:s3://bucket/path"{code}
>  Or possibly as an option that can be applied to the checkpoint and sink 
> configurations, e.g.,
> {code:java}
> env.getCheckpointingConfig().setS3Tags(TAGS) {code}
> and similar for a file sink.
> _Note_: The lifecycle can also be managed using the module: see 
> [here|https://docs.aws.amazon.com/AmazonS3/latest/dev/manage-lifecycle-using-java.html].
>  
>  
>  
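Option 1's comma-separated tag format could be handled along these lines. This is a minimal sketch only: the {{fs.s3a.tags.default}} property and its {{key:value,key:value}} format are the proposal's hypothetical convention, not an existing Hadoop or Flink API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: parse a comma-separated "key:value" tag list, as the proposed
// (hypothetical) fs.s3a.tags.default property would carry it.
class S3TagParser {
    static Map<String, String> parse(String raw) {
        Map<String, String> tags = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return tags;
        }
        for (String pair : raw.split(",")) {
            int sep = pair.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("Malformed tag entry: " + pair);
            }
            tags.put(pair.substring(0, sep).trim(), pair.substring(sep + 1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints {jobname=JOBNAME, owner=OWNER}
        System.out.println(parse("jobname:JOBNAME,owner:OWNER"));
    }
}
```

A real implementation would also have to decide how to escape ':' and ',' inside tag values; the sketch ignores that.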



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2021-01-09 Thread Padarn Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262081#comment-17262081
 ] 

Padarn Wilson commented on FLINK-19589:
---

Noted. Then let me put together an MR and we can discuss the finer details there.



[jira] [Commented] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2020-10-20 Thread Padarn Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218027#comment-17218027
 ] 

Padarn Wilson commented on FLINK-19589:
---

[~AHeise] agreed - when I first asked the question I hadn't looked into it in 
enough depth to see option 1), but it seems like the natural one. If we go this 
way I assume I should open the issue on their community JIRA. 

[~rmetzger] interesting idea. We could perhaps add a new abstraction, something 
like "WriteRequest", which in the case of S3 would be the equivalent of 
"PutObjectRequest". I'm not sure whether this would have many uses outside of 
the current case.
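One way such a "WriteRequest" abstraction could look. Everything below is an illustrative sketch under the assumptions of this comment; none of these types exist in Flink, and an S3 implementation would translate the tags into the AWS SDK's put-request tagging.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical filesystem-agnostic write request carrying object tags.
interface WriteRequest {
    WriteRequest withTags(Map<String, String> tags);
    Map<String, String> tags();
}

// Hypothetical S3-flavoured implementation; a real one would build the
// SDK put request from the path and tags.
class S3WriteRequest implements WriteRequest {
    private final String path;
    private final Map<String, String> tags = new HashMap<>();

    S3WriteRequest(String path) {
        this.path = path;
    }

    @Override
    public WriteRequest withTags(Map<String, String> newTags) {
        tags.putAll(newTags);
        return this;
    }

    @Override
    public Map<String, String> tags() {
        return Collections.unmodifiableMap(tags);
    }

    String path() {
        return path;
    }
}

class WriteRequestDemo {
    public static void main(String[] args) {
        WriteRequest req = new S3WriteRequest("s3://bucket/path")
                .withTags(Map.of("owner", "OWNER"));
        System.out.println(req.tags());
    }
}
```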



[jira] [Commented] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2020-10-18 Thread Padarn Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216110#comment-17216110
 ] 

Padarn Wilson commented on FLINK-19589:
---

[~AHeise] I've updated the ticket with more details and some discussion about 
the solution. Can you let me know if more detail is needed to understand the 
proposed changes?



[jira] [Updated] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2020-10-18 Thread Padarn Wilson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padarn Wilson updated FLINK-19589:
--
[jira] [Updated] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2020-10-18 Thread Padarn Wilson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padarn Wilson updated FLINK-19589:
--
[jira] [Updated] (FLINK-19589) Expose S3 options for tagging and object lifecycle policy for FileSystem

2020-10-18 Thread Padarn Wilson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padarn Wilson updated FLINK-19589:
--
Description: 
This ticket proposes to expose the management of two properties related S3 
Object management:
- [Lifecycle configuration 
|https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html]
- [Object tagging
|https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html]

Being able to control these is useful for people who want to manage jobs using 
S3 for checkpointing or job output, but need to control per job level 
configuration of the tagging/lifecycle for the purposes of auditing or cost 
control (for example deleting old state from S3)

Ideally, it would be possible to control this on each object being written by 
Flink, or at least at a job level.

_Note_*:* Some related existing properties can be set using the hadoop module 
using system properties: see for example 
fs.s3a.acl.default
which sets the default ACL on written objects.

*Solutions*:

1) Modify hadoop module:

The above-linked module could be updated in order to have a new property (and 
similar for lifecycle)
fs.s3a.tags.default
which could be a comma separated list of tags to set. For example
fs.s3a.acl.default = "jobname:JOBNAME,owner:OWNER"
This seems like a natural place to put this logic (and is outside of Flink if 
we decide to go this way. However it does not allow for a sink and checkpoint 
to have different values for these.

2) Expose withTagging from module

The hadoop module used by Flink's existing filesystem has already exposed put 
request level tagging (see 
[this|https://github.com/aws/aws-sdk-java/blob/c06822732612d7208927d2a678073098522085c3/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/PutObjectRequest.java#L292]).
 This could be used in the Flink filesystem plugin to expose these options. A 
possible approach could be to somehow incorporate it into the file path, e.g.,
path = "TAGS:s3://bucket/path"
 Or possible as an option that can be applied to the checkpoint and sink 
configurations, e.g.,
env.getCheckpointingConfig().setS3Tags(TAGS) 
and similar for a file sink.

_Note_: The lifecycle can also be managed using the module: see 
[here|https://docs.aws.amazon.com/AmazonS3/latest/dev/manage-lifecycle-using-java.html].

 

 

 

  was:
Purpose: To expose object tagging and lifecycle options for objects created by 
s3 FileSystems.

The current s3 hadoop (and presto) FileSystems wrap around existing 
implementations (e.g for the flink-s3-fs-hadoop file system uses 
org.apache.hadoop's implementation). These implementation expose functions such 
as `setObjectTagging` when uploading objects, but these cannot be used by Flink 
users currently.

I propose we extend the configuration of these filesystems to expose options to 
see tags and lifecycle for s3 objects created by a Flink job. 

Summary: Expose S3 options for tagging and object lifecycle policy for 
FileSystem  (was: Expose extra options in the S3 FileSystem)

[jira] [Created] (FLINK-19589) Expose extra options in the S3 FileSystem

2020-10-12 Thread Padarn Wilson (Jira)
Padarn Wilson created FLINK-19589:
-

 Summary: Expose extra options in the S3 FileSystem
 Key: FLINK-19589
 URL: https://issues.apache.org/jira/browse/FLINK-19589
 Project: Flink
  Issue Type: Improvement
  Components: FileSystems
Reporter: Padarn Wilson


Purpose: To expose object tagging and lifecycle options for objects created by 
s3 FileSystems.

The current s3 hadoop (and presto) FileSystems wrap existing 
implementations (e.g., the flink-s3-fs-hadoop filesystem uses 
org.apache.hadoop's implementation). These implementations expose functions such 
as `setObjectTagging` when uploading objects, but Flink users currently cannot 
use them.

I propose we extend the configuration of these filesystems to expose options to 
set tags and lifecycle for s3 objects created by a Flink job. 





[jira] [Commented] (FLINK-5479) Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions

2019-03-08 Thread Padarn Wilson (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787991#comment-16787991
 ] 

Padarn Wilson commented on FLINK-5479:
--

Hi all - Is this still an open issue that anyone is working on? Was thinking of 
taking a crack at it, but thought I'd check here first.

> Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions
> --
>
> Key: FLINK-5479
> URL: https://issues.apache.org/jira/browse/FLINK-5479
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kafka
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.8.0
>
>
> Reported in ML: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
> Similar to what's happening to idle sources blocking watermark progression in 
> downstream operators (see FLINK-5017), the per-partition watermark mechanism 
> in {{FlinkKafkaConsumer}} is also blocked from progressing watermarks 
> when a partition is idle. The watermark of idle partitions is always 
> {{Long.MIN_VALUE}}, therefore the overall min watermark across all partitions 
> of a consumer subtask will never proceed.
> It's normally not a common case to have Kafka partitions not producing any 
> data, but it'll probably be good to handle this as well. I think we should 
> have a localized solution similar to FLINK-5017 for the per-partition 
> watermarks in {{AbstractFetcher}}.
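The blocking effect described above can be shown with a toy computation; this is a sketch of the min-over-partitions rule only, not Flink's actual {{AbstractFetcher}} code.

```java
import java.util.Arrays;

// Toy illustration: the subtask-wide watermark is the minimum across
// per-partition watermarks, so one idle partition stuck at Long.MIN_VALUE
// pins the overall watermark there indefinitely.
class WatermarkDemo {
    static long overallWatermark(long[] perPartition) {
        return Arrays.stream(perPartition).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] active = {1_000L, 2_000L, 1_500L};
        long[] withIdle = {1_000L, 2_000L, Long.MIN_VALUE}; // third partition idle
        System.out.println(overallWatermark(active));   // progresses: prints 1000
        System.out.println(overallWatermark(withIdle)); // stuck at Long.MIN_VALUE
    }
}
```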



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)