[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

2022-12-06 Thread Apache Arrow JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643983#comment-17643983
 ] 

Apache Arrow JIRA Bot commented on ARROW-16746:
---

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++][Python] S3 tag support on write
> -
>
> Key: ARROW-16746
> URL: https://issues.apache.org/jira/browse/ARROW-16746
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: André Kelpe
>Assignee: Quang Hoang
>Priority: Major
>  Labels: good-second-issue
>
> S3 allows tagging data to better organize one's data 
> (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html). 
> We use this for efficient downstream processes/inventory management.
> Currently arrow/pyarrow does not allow tags to be added on write. This is 
> causing us to scan the bucket and re-apply the tags after a pyarrow-based 
> process has run.
> I looked through the code and think that it could potentially be done via the 
> metadata mechanism.
> The tags need to be added to the CreateMultipartUploadRequest here: 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156
> See also
> http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da
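
For illustration only: S3 expects the tag set on a write request to be encoded as URL query parameters (the `x-amz-tagging` header). The sketch below shows how a tag dictionary could be turned into that string before it is attached to the upload request. The function name `encode_s3_tagging` is hypothetical and not part of arrow/pyarrow, and the limits checked are the ones documented for S3 object tagging at the time of writing.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

# Limits documented for S3 object tagging (assumed still current).
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def encode_s3_tagging(tags: Mapping[str, str]) -> str:
    """Hypothetical helper: encode tags as a URL query string for x-amz-tagging."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"S3 allows at most {MAX_TAGS} tags per object")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key length: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return urlencode(dict(tags), quote_via=quote)


if __name__ == "__main__":
    print(encode_s3_tagging({"team": "data eng", "env": "prod"}))
```

Setting the tags in the same request that creates the object is what avoids the second pass over the bucket described in the issue.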



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

2022-06-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552922#comment-17552922
 ] 

Steve Loughran commented on ARROW-16746:


Yes. We use them a bit in the s3a committers, to annotate a zero-byte marker 
file with the length it will finally have when manifested at its destination. 
In HADOOP-17833 that's being exposed in the createFile(path) builder API, 
where apps can set headers at create time. Presumably GCS and Azure could be 
wired up differently; they both have the advantage that you can edit file 
attributes after creation.



[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

2022-06-10 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552857#comment-17552857
 ] 

Antoine Pitrou commented on ARROW-16746:


[~ste...@apache.org] Thanks for the information. What are "user attributes" in 
this context? Are you talking about "User-defined object metadata" as defined 
in [https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html] ?



[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

2022-06-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552769#comment-17552769
 ] 

Steve Loughran commented on ARROW-16746:


Hadoop s3a maps user attributes to the filesystem XAttr APIs, and will very 
soon also let you set them when you create a file.



[jira] [Commented] (ARROW-16746) [C++][Python] S3 tag support on write

2022-06-03 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545816#comment-17545816
 ] 

Antoine Pitrou commented on ARROW-16746:


We should try to do this in a way that's generic enough and can be implemented 
in other filesystem types.

For example I see that GCS supports custom metadata:
https://cloud.google.com/storage/docs/metadata#custom-metadata

Some local filesystems support extended attributes:
https://en.wikipedia.org/wiki/Extended_file_attributes

cc [~coryan]
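
As a rough illustration of what "generic enough" could mean, the sketch below maps one backend-neutral key/value dictionary onto the naming conventions of the three mechanisms mentioned above: S3 user-defined metadata headers (`x-amz-meta-`), GCS custom metadata headers (`x-goog-meta-`), and the Linux `user.` extended-attribute namespace. The function name and the backend labels are hypothetical and do not correspond to any existing Arrow API. It covers user-defined metadata only; S3 object tags are a separate mechanism.

```python
from typing import Dict, Mapping

# Naming convention per backend; the backend labels are made up for this sketch.
_PREFIXES = {
    "s3": "x-amz-meta-",    # S3 user-defined object metadata headers
    "gcs": "x-goog-meta-",  # GCS custom metadata headers (XML API)
    "local": "user.",       # Linux extended attributes, user namespace
}


def to_backend_attributes(metadata: Mapping[str, str], backend: str) -> Dict[str, str]:
    """Hypothetical helper: rename generic metadata keys for one backend."""
    try:
        prefix = _PREFIXES[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}") from None
    return {prefix + key: value for key, value in metadata.items()}


if __name__ == "__main__":
    generic = {"owner": "etl", "stage": "raw"}
    for name in ("s3", "gcs", "local"):
        print(name, to_backend_attributes(generic, name))
```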
