[ https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808027#comment-17808027 ]
ASF GitHub Bot commented on HADOOP-14837:
-----------------------------------------

bpahuja commented on code in PR #6407:
URL: https://github.com/apache/hadoop/pull/6407#discussion_r1456958842


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ObjectStorageClassFilter.java:
##########
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
+import java.util.Set;
+import java.util.function.Function;
+import software.amazon.awssdk.services.s3.model.ObjectStorageClass;
+import software.amazon.awssdk.services.s3.model.S3Object;
+
+
+/**
+ * S3ObjectStorageClassFilter filters S3 objects based on the
+ * fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem.
+ * The config can take three values:
+ * READ_ALL: Retains the current default behavior of not taking the storage
+ * classes retrieved from S3 into account, so existing users see no change
+ * in behavior.
+ * SKIP_ALL_GLACIER: Ignores any S3 objects tagged with a Glacier storage
+ * class and retrieves all others.
+ * READ_RESTORED_GLACIER_OBJECTS: Checks the restore status of each Glacier
+ * object; restored objects are read like normal S3 objects, while
+ * unrestored objects are ignored because they cannot yet be retrieved
+ * from S3 Glacier.
+ */
+public enum S3ObjectStorageClassFilter {
+  READ_ALL(o -> true),
+  SKIP_ALL_GLACIER(S3ObjectStorageClassFilter::isNotGlacierObject),
+  READ_RESTORED_GLACIER_OBJECTS(S3ObjectStorageClassFilter::isCompletedRestoredObject);
+
+  private static final Set<ObjectStorageClass> GLACIER_STORAGE_CLASSES =
+      Sets.newHashSet(ObjectStorageClass.GLACIER, ObjectStorageClass.DEEP_ARCHIVE);

Review Comment:
   GLACIER_IR files are instantly available, so no failure is observed; S3A
   will be able to access them.

##########
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml:
##########
@@ -2191,6 +2191,18 @@
   </description>
 </property>
 
+<property>

Review Comment:
   Sure, will do.


> Handle S3A "glacier" data
> -------------------------
>
>                 Key: HADOOP-14837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14837
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to Glacier,
> they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this.
> # report better
> # if listings can identify files which are glaciated then maybe we could
> have an option to filter them out
> # test & see what happens



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
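For reference, the enum-of-predicates pattern in the patch above can be sketched in a self-contained form. This is not the actual Hadoop code: the diff cuts off before the `isNotGlacierObject` and `isCompletedRestoredObject` helpers are shown, so their bodies here are an assumption based on the javadoc, and `StorageClass`, `ListedObject`, `GlacierReadPolicy`, and `restoreCompleted` are hypothetical stand-ins for the AWS SDK's `ObjectStorageClass`, `S3Object`, and restore-status types.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for software.amazon.awssdk...ObjectStorageClass.
enum StorageClass { STANDARD, GLACIER, DEEP_ARCHIVE, GLACIER_IR }

// Hypothetical stand-in for an S3 listing entry plus its restore status.
record ListedObject(StorageClass storageClass, boolean restoreCompleted) {}

// Sketch of the filter enum: each policy carries a predicate applied to
// every listed object.
enum GlacierReadPolicy {
  READ_ALL(o -> true),
  SKIP_ALL_GLACIER(GlacierReadPolicy::isNotGlacierObject),
  READ_RESTORED_GLACIER_OBJECTS(GlacierReadPolicy::isCompletedRestoredObject);

  // GLACIER_IR is deliberately absent: per the review comment, Glacier
  // Instant Retrieval objects are readable without a restore.
  private static final Set<StorageClass> GLACIER_STORAGE_CLASSES =
      EnumSet.of(StorageClass.GLACIER, StorageClass.DEEP_ARCHIVE);

  private final Function<ListedObject, Boolean> filter;

  GlacierReadPolicy(Function<ListedObject, Boolean> filter) {
    this.filter = filter;
  }

  private static boolean isNotGlacierObject(ListedObject o) {
    return !GLACIER_STORAGE_CLASSES.contains(o.storageClass());
  }

  // A non-Glacier object is always readable; a Glacier object is readable
  // only once its restore has completed.
  private static boolean isCompletedRestoredObject(ListedObject o) {
    return isNotGlacierObject(o) || o.restoreCompleted();
  }

  public boolean accept(ListedObject o) {
    return filter.apply(o);
  }
}
```

Keeping each policy as an enum constant wrapping a predicate means the listing code can apply `policy.accept(object)` uniformly, with no switch statement at the call site.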