[ 
https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496047#comment-17496047
 ] 

Ahmar Suhail commented on HADOOP-14837:
---------------------------------------

[~ste...@apache.org] I've been looking at this and have a few questions: 
 * For better reporting, do we want to add a new statistic, something like 
`objects_in_glacier`, holding the count of objects currently in glacier?
 * In listings, we could add a new option to filter out glacier files by adding 
something like `!summary.getStorageClass().equals("GLACIER")` to the acceptor 
[here|https://github.com/apache/hadoop/blob/365375412fe5eea82549630ee8c5598502b95caf/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L770]?
 With that in place, `getContentSummary()` would no longer include glacier 
files in the fileCount. 
 * To return StorageType.Archive for a file, I was looking at 
getBlockLocations. It currently returns something like 
`BlockLocation({ "localhost:9866" }, { "localhost" }, 0, file.getLen())`, so 
I'm not sure how we want it to behave when implemented in S3AFS. Would it be 
something like `BlockLocation({ filepath }, { StorageType.Archive.toString() }, 
0, file.getLen())`?
 * Do we want to implement retrieval in open()? If yes, will the behaviour be:
 ** If fs.s3a.open.glacier.retrieve is enabled, check whether the file is in 
glacier and, if so, initiate a restore
 ** If the restore has not completed and .read() is called, throw "cannot read 
yet - retrieval requested"
 ** If the restore has not been initiated (which can happen when 
fs.s3a.open.glacier.retrieve is false) and .read() is called, throw "cannot 
read data in glacier" 
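
To make the questions above concrete, here is a rough sketch of the listing filter, the `objects_in_glacier` count and the open()/read() behaviour in plain Java. Everything in it is an assumption for discussion: `GlacierSketch`, `ObjectInfo`, `RestoreState`, `filterListing`, `objectsInGlacier`, `onOpen` and `checkReadable` are hypothetical names, not existing hadoop-aws code, and `ObjectInfo` only stands in for the S3 object summary that the real acceptor would see. The storage class check is written null-safe as a precaution.

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch only; none of these names exist in hadoop-aws. */
final class GlacierSketch {

  /** Restore status of an archived object (assumed model, not an AWS type). */
  enum RestoreState { NOT_REQUESTED, IN_PROGRESS, COMPLETED }

  /** Minimal stand-in for one entry of an S3 listing. */
  static final class ObjectInfo {
    final String key;
    final String storageClass;
    final RestoreState restoreState;

    ObjectInfo(String key, String storageClass, RestoreState restoreState) {
      this.key = key;
      this.storageClass = storageClass;
      this.restoreState = restoreState;
    }

    /** Null-safe variant of summary.getStorageClass().equals("GLACIER"). */
    boolean isGlacier() {
      return "GLACIER".equals(storageClass);
    }
  }

  /** Acceptor-style filter: drops glacier entries only when the option is on. */
  static List<ObjectInfo> filterListing(List<ObjectInfo> listing, boolean skipGlacier) {
    return listing.stream()
        .filter(o -> !skipGlacier || !o.isGlacier())
        .collect(Collectors.toList());
  }

  /** Value a statistic such as objects_in_glacier could report for a listing. */
  static long objectsInGlacier(List<ObjectInfo> listing) {
    return listing.stream().filter(ObjectInfo::isGlacier).count();
  }

  /**
   * State after open(): a restore is requested only when the retrieve option
   * is enabled and the object is in glacier with no restore requested yet.
   */
  static RestoreState onOpen(ObjectInfo o, boolean retrieveEnabled) {
    if (retrieveEnabled && o.isGlacier()
        && o.restoreState == RestoreState.NOT_REQUESTED) {
      return RestoreState.IN_PROGRESS;
    }
    return o.restoreState;
  }

  /** Check made before read(): fails with the two messages proposed above. */
  static void checkReadable(ObjectInfo o) throws IOException {
    if (!o.isGlacier() || o.restoreState == RestoreState.COMPLETED) {
      return; // readable: not archived, or the restore has finished
    }
    if (o.restoreState == RestoreState.IN_PROGRESS) {
      throw new IOException("cannot read yet - retrieval requested: " + o.key);
    }
    throw new IOException("cannot read data in glacier: " + o.key);
  }
}
```

In this sketch the filter looks only at the storage class, so a restored object would still be filtered out of listings; whether that is what we want is part of the question.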

> Handle S3A "glacier" data
> -------------------------
>
>                 Key: HADOOP-14837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14837
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to glacier, 
> they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have 
> an option to filter them out
> # test & see what happens



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
