[ https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807799#comment-17807799 ]
ASF GitHub Bot commented on HADOOP-14837:
-----------------------------------------

ahmarsuhail commented on code in PR #6407:
URL: https://github.com/apache/hadoop/pull/6407#discussion_r1455953276


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ObjectStorageClassFilter.java:
##########
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
+import java.util.Set;
+import java.util.function.Function;
+import software.amazon.awssdk.services.s3.model.ObjectStorageClass;
+import software.amazon.awssdk.services.s3.model.S3Object;
+
+
+/**
+ * S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
+ * The config can have 3 values:
+ * READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.

Review Comment:
   Instead of saying "current behaviour", can you specify what that current behaviour is? It errors, right?

##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1486,6 +1486,17 @@ private Constants() {
    */
   public static final int DEFAULT_PREFETCH_MAX_BLOCKS_COUNT = 4;
 
+  /**
+   * Read Restored Glacier objects config.
+   * Value = {@value}
+   */
+  public static final String READ_RESTORED_GLACIER_OBJECTS = "fs.s3a.glacier.read.restored.objects";
+
+  /**
+   * Default value of Read Restored Glacier objects config.
+   */
+  public static final S3ObjectStorageClassFilter DEFAULT_READ_RESTORED_GLACIER_OBJECTS = S3ObjectStorageClassFilter.READ_ALL;

Review Comment:
   I'd update this to just READ_ALL, and then when you do conf.get(), pass in the default there ... conf.get(READ_RESTORED_GLACIER_OBJECTS, DEFAULT_READ_RESTORED_GLACIER_OBJECTS)


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -2466,8 +2474,8 @@ public RemoteIterator<S3ALocatedFileStatus> listFilesAndDirectoryMarkers(
         path,
         true,
         includeSelf
-            ? Listing.ACCEPT_ALL_BUT_S3N
-            : new Listing.AcceptAllButSelfAndS3nDirs(path),
+            ? Lists.newArrayList(new GlacierStatusAcceptor(s3ObjectStorageClassFilter), Listing.ACCEPT_ALL_BUT_S3N)

Review Comment:
   don't think you need an acceptor for this, so just update if clause in the Listing


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ObjectStorageClassFilter.java:
##########
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
+import java.util.Set;
+import java.util.function.Function;
+import software.amazon.awssdk.services.s3.model.ObjectStorageClass;
+import software.amazon.awssdk.services.s3.model.S3Object;
+
+
+/**
+ * S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
+ * The config can have 3 values:
+ * READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.
+ * SKIP_ALL_GLACIER: If this value is set then this will ignore any S3 Objects which are tagged with Glacier storage classes and retrieve the others.
+ * READ_RESTORED_GLACIER_OBJECTS: If this value is set then restored status of the Glacier object will be checked, if restored the objects would be read like normal S3 objects else they will be ignored as the objects would not have been retrieved from the S3 Glacier.
+ */
+public enum S3ObjectStorageClassFilter {
+  READ_ALL(o -> true),
+  SKIP_ALL_GLACIER(S3ObjectStorageClassFilter::isNotGlacierObject),
+  READ_RESTORED_GLACIER_OBJECTS(S3ObjectStorageClassFilter::isCompletedRestoredObject);
+
+  private static final Set<ObjectStorageClass> GLACIER_STORAGE_CLASSES = Sets.newHashSet(ObjectStorageClass.GLACIER, ObjectStorageClass.DEEP_ARCHIVE);

Review Comment:
   what about GLACIER_IR?

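For readers following the thread, here is a minimal, self-contained sketch of the three filter modes and of resolving the mode from a string with a READ_ALL default, as the Constants.java comment suggests. It is not the PR's code. StorageClass, ListedObject, GlacierFilterSketch and fromConfig are hypothetical stand-ins for the SDK's ObjectStorageClass and S3Object and for a Hadoop Configuration lookup. GLACIER_IR is left out of the "needs restore" set on the assumption that Instant Retrieval objects can be read without a restore; the thread itself leaves that question open.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for the SDK's ObjectStorageClass. */
enum StorageClass { STANDARD, GLACIER, DEEP_ARCHIVE, GLACIER_IR }

/** Hypothetical stand-in for a listed S3 object. */
final class ListedObject {
  final String key;
  final StorageClass storageClass;
  /** null: no restore requested; TRUE: restore running; FALSE: restore complete. */
  final Boolean restoreInProgress;

  ListedObject(String key, StorageClass storageClass, Boolean restoreInProgress) {
    this.key = key;
    this.storageClass = storageClass;
    this.restoreInProgress = restoreInProgress;
  }
}

/** Sketch of the three modes described in the Javadoc under review. */
enum GlacierFilterSketch {
  READ_ALL(o -> true),
  SKIP_ALL_GLACIER(GlacierFilterSketch::isNotArchived),
  READ_RESTORED_GLACIER_OBJECTS(GlacierFilterSketch::isReadable);

  // Assumption: only these classes need a restore before a GET succeeds.
  // GLACIER_IR is deliberately excluded here; the review leaves this open.
  private static final Set<StorageClass> NEEDS_RESTORE =
      EnumSet.of(StorageClass.GLACIER, StorageClass.DEEP_ARCHIVE);

  private final Predicate<ListedObject> filter;

  GlacierFilterSketch(Predicate<ListedObject> filter) {
    this.filter = filter;
  }

  /** Returns true if the object should be kept in a listing. */
  boolean accepts(ListedObject o) {
    return filter.test(o);
  }

  /** Hypothetical helper: resolve a config string, defaulting to READ_ALL. */
  static GlacierFilterSketch fromConfig(String value) {
    if (value == null || value.trim().isEmpty()) {
      return READ_ALL;
    }
    // valueOf throws IllegalArgumentException for unknown names.
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  private static boolean isNotArchived(ListedObject o) {
    return !NEEDS_RESTORE.contains(o.storageClass);
  }

  private static boolean isReadable(ListedObject o) {
    // Non-archived objects are always readable; archived ones only once
    // a restore has been requested and is no longer in progress.
    return isNotArchived(o)
        || (o.restoreInProgress != null && !o.restoreInProgress);
  }
}
```

With READ_ALL as the fallback, an unset option keeps every listed object, which matches the intent stated in the Javadoc of not changing behaviour for existing users.
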

##########
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml:
##########
@@ -2191,6 +2191,18 @@
   </description>
 </property>
 
+<property>

Review Comment:
   these don't need to go here, you can remove. This will need documentation though, so add this in index.md



> Handle S3A "glacier" data
> -------------------------
>
>                 Key: HADOOP-14837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14837
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>
> SPARK-21797 covers how if you have AWS S3 set to copy some files to glacier, they appear in the listing but GETs fail, and so does everything else
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could have an option to filter them out
> # test & see what happens



--
This message was sent by Atlassian Jira
(v8.20.10#820010)