[ https://issues.apache.org/jira/browse/HADOOP-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772268#comment-17772268 ]
ASF GitHub Bot commented on HADOOP-18908: ----------------------------------------- steveloughran commented on code in PR #6106: URL: https://github.com/apache/hadoop/pull/6106#discussion_r1347608674 ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -229,4 +254,49 @@ private static URI getS3Endpoint(String endpoint, final Configuration conf) { throw new IllegalArgumentException(e); } } + + /** + * Parses the endpoint to get the region. + * If endpoint is the central one, use US_EAST_1. + * + * @param endpoint the configure endpoint. + * @return the S3 region. + */ + private static Region getS3RegionFromEndpoint(String endpoint) { + + if(!endpoint.endsWith(CENTRAL_ENDPOINT)) { + LOG.debug("Endpoint {} is not the default; parsing", endpoint); + return AwsHostNameUtils.parseSigningRegion(endpoint, S3_SERVICE_NAME).orElse(null); Review Comment: weve had so much trouble in the past here with vpce, govcloud, cn that we need a set of unit tests to verify this code does what we actually want -and therea are no regressions ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -138,14 +157,21 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build BuilderT builder, S3ClientCreationParameters parameters, Configuration conf, String bucket) throws IOException { - Region region = parameters.getRegion(); - LOG.debug("Using region {}", region); - URI endpoint = getS3Endpoint(parameters.getEndpoint(), conf); + Region region = null; + if (endpoint != null) { builder.endpointOverride(endpoint); - LOG.debug("Using endpoint {}", endpoint); + region = getS3RegionFromEndpoint(parameters.getEndpoint()); + LOG.debug("Using endpoint {} and region {}", endpoint, region); + } + + if (region != null) { + builder.region(region); + } else { + builder.crossRegionAccessEnabled(true); + builder.region(getS3Region(conf)); Review Comment: also, what about: env vars and sysprops? good or bad? ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -229,4 +254,49 @@ private static URI getS3Endpoint(String endpoint, final Configuration conf) { throw new IllegalArgumentException(e); } } + + /** + * Parses the endpoint to get the region. + * If endpoint is the central one, use US_EAST_1. + * + * @param endpoint the configure endpoint. + * @return the S3 region. Review Comment: javadoc to say returns null if not resolvved ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -229,4 +254,49 @@ private static URI getS3Endpoint(String endpoint, final Configuration conf) { throw new IllegalArgumentException(e); } } + + /** + * Parses the endpoint to get the region. + * If endpoint is the central one, use US_EAST_1. + * + * @param endpoint the configure endpoint. + * @return the S3 region. + */ + private static Region getS3RegionFromEndpoint(String endpoint) { + + if(!endpoint.endsWith(CENTRAL_ENDPOINT)) { + LOG.debug("Endpoint {} is not the default; parsing", endpoint); + return AwsHostNameUtils.parseSigningRegion(endpoint, S3_SERVICE_NAME).orElse(null); + } + + // endpoint is for US_EAST_1; + return Region.US_EAST_1; + } + + /** + * Gets the region from configuration and returns. + * If configured region is an empty string, use + * the default SDK resolution chain. + * + * @param conf Configuration. + * @return the S3 region + */ + private static Region getS3Region(Configuration conf) { + String region = conf.getTrimmed(AWS_REGION, AWS_S3_CENTRAL_REGION); + LOG.debug("fs.s3a.endpoint.region=\"{}\"", region); + if (!region.isEmpty()) { + // there's either an explicit region or we have fallen back + // to the central one. + LOG.debug("Setting region to {}", region); + return Region.of(region); + } else { + // no region. + // allow this if people really want it; it is OK to rely on this + // when deployed in EC2. Review Comment: if it is ok in ec2, then this will be very noisy in ec2 deployments? ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -138,14 +157,21 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build BuilderT builder, S3ClientCreationParameters parameters, Configuration conf, String bucket) throws IOException { - Region region = parameters.getRegion(); - LOG.debug("Using region {}", region); - URI endpoint = getS3Endpoint(parameters.getEndpoint(), conf); + Region region = null; + if (endpoint != null) { builder.endpointOverride(endpoint); - LOG.debug("Using endpoint {}", endpoint); + region = getS3RegionFromEndpoint(parameters.getEndpoint()); + LOG.debug("Using endpoint {} and region {}", endpoint, region); + } + + if (region != null) { + builder.region(region); + } else { + builder.crossRegionAccessEnabled(true); + builder.region(getS3Region(conf)); Review Comment: if a region is set in the config, it must take priority over anything the AWS SDK determines from the endpoint. without that we can never correct for sdk issues ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -138,14 +157,21 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build BuilderT builder, S3ClientCreationParameters parameters, Configuration conf, String bucket) throws IOException { - Region region = parameters.getRegion(); Review Comment: restore the ability to pass in a region > Improve s3a region handling, including determining from endpoint > ---------------------------------------------------------------- > > Key: HADOOP-18908 > URL: https://issues.apache.org/jira/browse/HADOOP-18908 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Affects Versions: 3.4.0 > Reporter: Steve Loughran > Assignee: Ahmar Suhail > Priority: Major > Labels: pull-request-available > > s3a now requires the fs.s3a.endpoint.region to be set; and while it can > determine it from a network call, this takes time and doesn't work for third > party stores. > proposed > * reinstate parsing of the fs.3a.endpoint url to automatically determine > region from well known endoints (and vplink ones) > * don't try to talk to AWS S3 if endpoint isn't an aws one: for that caller > must declare (HADOOP-18673) > * document this in v2 migration, including stack traces of falures -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org