[ https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496047#comment-17496047 ]
Ahmar Suhail commented on HADOOP-14837:
---------------------------------------

[~ste...@apache.org] I've been looking at this and had a few questions:
* To report better, do we want to add a new statistic, something like `objects_in_glacier`, which would hold the count of objects currently in glacier?
* In listings, we can add a new option to filter out glacier files by doing something like `!summary.getStorageClass().equals("GLACIER")` in the acceptor [here|https://github.com/apache/hadoop/blob/365375412fe5eea82549630ee8c5598502b95caf/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L770]? Once that is in place, `getContentSummary()` won't include glacier files in the fileCount.
* To return StorageType.ARCHIVE for a file, I was looking at getBlockLocations. It currently returns something like `BlockLocation({ "localhost:9866" }, { "localhost" }, 0, file.getLen())`, so I'm not sure how we want it to behave when implemented in S3AFS. Would it be something like `BlockLocation({ filepath }, { StorageType.ARCHIVE.toString() }, 0, file.getLen())`?
* Do we want to implement retrieval in open()?
If yes, will the behaviour be:
** If fs.s3a.open.glacier.retrieve is enabled, check whether the file is in glacier; if it is, initiate a restore.
** If the restore has not completed and .read() is called, throw "cannot read yet - retrieval requested".
** If no restore has been initiated (which can happen when fs.s3a.open.glacier.retrieve is false) and .read() is called, throw "cannot read data in glacier".

> Handle S3A "glacier" data
> -------------------------
>
>                 Key: HADOOP-14837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14837
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> SPARK-21797 covers how if you have AWS S3 set to copy some files to glacier,
> they appear in the listing but GETs fail, and so does everything else
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have
> an option to filter them out
> # test & see what happens
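
To make the questions in the comment concrete, here is a rough sketch of the listing filter and the proposed open()/read() behaviour. It is only an illustration for discussion: `GlacierSketch`, `RestoreState`, `acceptInListing`, `onOpen` and `checkReadable` are hypothetical names that do not exist in hadoop-aws, the storage class is passed as a plain string rather than read from an S3 object summary, and "initiating a restore" is modelled as a state change rather than a real S3 call.

```java
import java.io.IOException;

/** Hypothetical helper for discussion; not part of hadoop-aws. */
final class GlacierSketch {

  /** Storage class string compared against in the comment. */
  static final String GLACIER = "GLACIER";

  /** Assumed restore states; the real S3 restore status is richer than this. */
  enum RestoreState { NOT_REQUESTED, IN_PROGRESS, COMPLETED }

  private GlacierSketch() {
  }

  /**
   * Listing acceptor check: keep an entry unless it is in glacier.
   * Written constant-first so a null storage class is accepted
   * instead of raising a NullPointerException.
   */
  static boolean acceptInListing(String storageClass) {
    return !GLACIER.equals(storageClass);
  }

  /**
   * open(): if the (proposed) fs.s3a.open.glacier.retrieve option is on
   * and no restore has been requested yet, move to IN_PROGRESS.
   * The state change stands in for issuing the restore request.
   */
  static RestoreState onOpen(RestoreState current, boolean retrieveEnabled) {
    if (retrieveEnabled && current == RestoreState.NOT_REQUESTED) {
      return RestoreState.IN_PROGRESS;
    }
    return current;
  }

  /**
   * read(): succeed for non-glacier or fully restored objects,
   * otherwise fail with the messages proposed in the comment.
   */
  static void checkReadable(String storageClass, RestoreState state)
      throws IOException {
    if (!GLACIER.equals(storageClass) || state == RestoreState.COMPLETED) {
      return;
    }
    if (state == RestoreState.IN_PROGRESS) {
      throw new IOException("cannot read yet - retrieval requested");
    }
    throw new IOException("cannot read data in glacier");
  }
}
```

Keeping the checks as pure functions over a storage class string and a state value means the three cases listed for read() can be unit tested without an S3 endpoint. Whether the real implementation should throw a plain IOException or a more specific subclass is left open here.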