[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559452#comment-16559452
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

aljoscha commented on issue #6405: [FLINK-8439] Add Flink shading to AWS 
credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#issuecomment-408360780
 
 
   Thanks! This looks good now. I'll merge once Travis is green.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.
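> For reference, multiple providers can also be chained by giving a 
> comma-separated list of (shaded) fully qualified class names. An 
> illustrative, unverified example:
> {code:java}
> fs.s3a.aws.credentials.provider: org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider,org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}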



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559449#comment-16559449
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

azagrebin commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205712754
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/AbstractS3FileSystemFactory.java
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.FileSystemFactory;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+
+/** Base class for S3 file system factories. */
+public abstract class AbstractS3FileSystemFactory implements FileSystemFactory {
 
 Review comment:
   I also thought about it; the S3 classes still have something in common and 
no other common package, but it's probably too little in common.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559446#comment-16559446
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

azagrebin commented on issue #6405: [FLINK-8439] Add Flink shading to AWS 
credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#issuecomment-408360092
 
 
   Thanks for the review @aljoscha, I removed S3 specifics from the base class.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559442#comment-16559442
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

azagrebin commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205711773
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopConfigLoader.java
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.runtime.util.HadoopUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nonnull;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/** This class lazily loads the Hadoop configuration from a resettable Flink configuration. */
+public class HadoopConfigLoader {
+   private static final Logger LOG = LoggerFactory.getLogger(HadoopConfigLoader.class);
+
+   private static final Set<String> PACKAGE_PREFIXES_TO_SHADE =
+   new HashSet<>(Collections.singletonList("com.amazonaws."));
+
+   /** The prefixes that Flink adds to the Hadoop fs config. */
+   private final String[] flinkConfigPrefixes;
+
+   /** Keys that are replaced (after prefix replacement), to give a more uniform experience
+    * across different file system implementations. */
+   private final String[][] mirroredConfigKeys;
+
+   /** Hadoop config prefix to replace the Flink prefix. */
+   private final String hadoopConfigPrefix;
+
+   private final Set<String> configKeysToShade;
+   private final String flinkShadingPrefix;
+
+   /** Flink's configuration object. */
+   private Configuration flinkConfig;
+
+   /** Hadoop's configuration for the file systems, lazily initialized. */
+   private org.apache.hadoop.conf.Configuration hadoopConfig;
+
+   public HadoopConfigLoader(
+   @Nonnull String[] flinkConfigPrefixes,
+   @Nonnull String[][] mirroredConfigKeys,
+   @Nonnull String hadoopConfigPrefix,
+   @Nonnull Set<String> configKeysToShade,
+   @Nonnull String flinkShadingPrefix) {
+   this.flinkConfigPrefixes = flinkConfigPrefixes;
+   this.mirroredConfigKeys = mirroredConfigKeys;
+   this.hadoopConfigPrefix = hadoopConfigPrefix;
+   this.configKeysToShade = configKeysToShade;
+   this.flinkShadingPrefix = flinkShadingPrefix;
+   }
+
+   public void setFlinkConfig(Configuration config) {
+   flinkConfig = config;
+   hadoopConfig = null;
+   }
+
+   /** Get the loaded Hadoop config (or fall back to one loaded from the classpath). */
+   public org.apache.hadoop.conf.Configuration getOrLoadHadoopConfig() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = this.hadoopConfig;
+   if (hadoopConfig == null) {
+   if (flinkConfig != null) {
+   hadoopConfig = mirrorCertianHadoopConfig(loadHadoopConfigFromFlink());
+   }
+   else {
+   LOG.warn("The factory has not been configured prior to loading the S3 file system."
+   + " Using Hadoop configuration from the classpath.");
+   hadoopConfig = new org.apache.hadoop.conf.Configuration();
+   }
+   }
+   this.hadoopConfig = hadoopConfig;
+   return hadoopConfig;
+   }
+
+   // add additional config entries from the Flink config to the Hadoop config
+   private org.apache.hadoop.conf.Configuration loadHadoopConfigFromFlink() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = HadoopUtils.getHadoopConfiguration(flinkConfig);
+   for (String key : flinkConfig.keySet()) {
+  

[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558271#comment-16558271
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

aljoscha commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205442551
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopConfigLoader.java
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.runtime.util.HadoopUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nonnull;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/** This class lazily loads the Hadoop configuration from a resettable Flink configuration. */
+public class HadoopConfigLoader {
+   private static final Logger LOG = LoggerFactory.getLogger(HadoopConfigLoader.class);
+
+   private static final Set<String> PACKAGE_PREFIXES_TO_SHADE =
+   new HashSet<>(Collections.singletonList("com.amazonaws."));
+
+   /** The prefixes that Flink adds to the Hadoop fs config. */
+   private final String[] flinkConfigPrefixes;
+
+   /** Keys that are replaced (after prefix replacement), to give a more uniform experience
+    * across different file system implementations. */
+   private final String[][] mirroredConfigKeys;
+
+   /** Hadoop config prefix to replace the Flink prefix. */
+   private final String hadoopConfigPrefix;
+
+   private final Set<String> configKeysToShade;
+   private final String flinkShadingPrefix;
+
+   /** Flink's configuration object. */
+   private Configuration flinkConfig;
+
+   /** Hadoop's configuration for the file systems, lazily initialized. */
+   private org.apache.hadoop.conf.Configuration hadoopConfig;
+
+   public HadoopConfigLoader(
+   @Nonnull String[] flinkConfigPrefixes,
+   @Nonnull String[][] mirroredConfigKeys,
+   @Nonnull String hadoopConfigPrefix,
+   @Nonnull Set<String> configKeysToShade,
+   @Nonnull String flinkShadingPrefix) {
+   this.flinkConfigPrefixes = flinkConfigPrefixes;
+   this.mirroredConfigKeys = mirroredConfigKeys;
+   this.hadoopConfigPrefix = hadoopConfigPrefix;
+   this.configKeysToShade = configKeysToShade;
+   this.flinkShadingPrefix = flinkShadingPrefix;
+   }
+
+   public void setFlinkConfig(Configuration config) {
+   flinkConfig = config;
+   hadoopConfig = null;
+   }
+
+   /** Get the loaded Hadoop config (or fall back to one loaded from the classpath). */
+   public org.apache.hadoop.conf.Configuration getOrLoadHadoopConfig() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = this.hadoopConfig;
+   if (hadoopConfig == null) {
+   if (flinkConfig != null) {
+   hadoopConfig = mirrorCertianHadoopConfig(loadHadoopConfigFromFlink());
+   }
+   else {
+   LOG.warn("The factory has not been configured prior to loading the S3 file system."
+   + " Using Hadoop configuration from the classpath.");
+   hadoopConfig = new org.apache.hadoop.conf.Configuration();
+   }
+   }
+   this.hadoopConfig = hadoopConfig;
+   return hadoopConfig;
+   }
+
+   // add additional config entries from the Flink config to the Hadoop config
+   private org.apache.hadoop.conf.Configuration loadHadoopConfigFromFlink() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = HadoopUtils.getHadoopConfiguration(flinkConfig);
+   for (String key : flinkConfig.keySet()) {
+   

[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558267#comment-16558267
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

aljoscha commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205441433
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopConfigLoader.java
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.runtime.util.HadoopUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nonnull;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/** This class lazily loads the Hadoop configuration from a resettable Flink configuration. */
+public class HadoopConfigLoader {
+   private static final Logger LOG = LoggerFactory.getLogger(HadoopConfigLoader.class);
+
+   private static final Set<String> PACKAGE_PREFIXES_TO_SHADE =
 
 Review comment:
   Same as my other comment, I would like this to not be S3 specific. By making 
this another constructor parameter we could do that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558268#comment-16558268
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

aljoscha commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205441586
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopConfigLoader.java
 ##
 @@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.runtime.util.HadoopUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nonnull;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/** This class lazily loads the Hadoop configuration from a resettable Flink configuration. */
+public class HadoopConfigLoader {
+   private static final Logger LOG = LoggerFactory.getLogger(HadoopConfigLoader.class);
+
+   private static final Set<String> PACKAGE_PREFIXES_TO_SHADE =
+   new HashSet<>(Collections.singletonList("com.amazonaws."));
+
+   /** The prefixes that Flink adds to the Hadoop fs config. */
+   private final String[] flinkConfigPrefixes;
+
+   /** Keys that are replaced (after prefix replacement), to give a more uniform experience
+    * across different file system implementations. */
+   private final String[][] mirroredConfigKeys;
+
+   /** Hadoop config prefix to replace the Flink prefix. */
+   private final String hadoopConfigPrefix;
+
+   private final Set<String> configKeysToShade;
+   private final String flinkShadingPrefix;
+
+   /** Flink's configuration object. */
+   private Configuration flinkConfig;
+
+   /** Hadoop's configuration for the file systems, lazily initialized. */
+   private org.apache.hadoop.conf.Configuration hadoopConfig;
+
+   public HadoopConfigLoader(
+   @Nonnull String[] flinkConfigPrefixes,
+   @Nonnull String[][] mirroredConfigKeys,
+   @Nonnull String hadoopConfigPrefix,
+   @Nonnull Set<String> configKeysToShade,
+   @Nonnull String flinkShadingPrefix) {
+   this.flinkConfigPrefixes = flinkConfigPrefixes;
+   this.mirroredConfigKeys = mirroredConfigKeys;
+   this.hadoopConfigPrefix = hadoopConfigPrefix;
+   this.configKeysToShade = configKeysToShade;
+   this.flinkShadingPrefix = flinkShadingPrefix;
+   }
+
+   public void setFlinkConfig(Configuration config) {
+   flinkConfig = config;
+   hadoopConfig = null;
+   }
+
+   /** Get the loaded Hadoop config (or fall back to one loaded from the classpath). */
+   public org.apache.hadoop.conf.Configuration getOrLoadHadoopConfig() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = this.hadoopConfig;
+   if (hadoopConfig == null) {
+   if (flinkConfig != null) {
+   hadoopConfig = mirrorCertianHadoopConfig(loadHadoopConfigFromFlink());
+   }
+   else {
+   LOG.warn("The factory has not been configured prior to loading the S3 file system."
+   + " Using Hadoop configuration from the classpath.");
+   hadoopConfig = new org.apache.hadoop.conf.Configuration();
+   }
+   }
+   this.hadoopConfig = hadoopConfig;
+   return hadoopConfig;
+   }
+
+   // add additional config entries from the Flink config to the Hadoop config
+   private org.apache.hadoop.conf.Configuration loadHadoopConfigFromFlink() {
+   org.apache.hadoop.conf.Configuration hadoopConfig = HadoopUtils.getHadoopConfiguration(flinkConfig);
+   for (String key : flinkConfig.keySet()) {
+   

[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558266#comment-16558266
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

aljoscha commented on a change in pull request #6405: [FLINK-8439] Add Flink 
shading to AWS credential provider s3 hadoop c…
URL: https://github.com/apache/flink/pull/6405#discussion_r205441268
 
 

 ##
 File path: 
flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/AbstractS3FileSystemFactory.java
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.fs.hdfs;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.FileSystemFactory;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+
+/** Base class for S3 file system factories. */
+public abstract class AbstractS3FileSystemFactory implements FileSystemFactory {
 
 Review comment:
   I don't like putting S3 specifics into the generic Hadoop FS package. We 
could call this one `AbstractHadoopFileSystemFactory`, leave out the 
`getScheme()` implementation, and drop mentions of S3 to make it properly 
independent of S3.
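
   Roughly this shape (a sketch of the suggestion, not code from the PR; the 
class body is illustrative):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystemFactory;

/** Sketch: a generic Hadoop-backed factory base class with no S3 specifics. */
public abstract class AbstractHadoopFileSystemFactory implements FileSystemFactory {

	/** Scheme-specific config loader supplied by the concrete subclass. */
	private final HadoopConfigLoader configLoader;

	protected AbstractHadoopFileSystemFactory(HadoopConfigLoader configLoader) {
		this.configLoader = configLoader;
	}

	@Override
	public void configure(Configuration config) {
		configLoader.setFlinkConfig(config);
	}

	// getScheme() and create(URI) remain abstract, implemented per file system.
}
```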


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554260#comment-16554260
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

GitHub user azagrebin opened a pull request:

https://github.com/apache/flink/pull/6405

[FLINK-8439] Add Flink shading to AWS credential provider s3 hadoop c…

## What is the purpose of the change

This PR refactors the S3 Hadoop and Presto file systems and adds Flink shading 
to the AWS credential provider config.

## Brief change log

  - extract an `AbstractS3FileSystemFactory` base class from the `s3hadoop` and 
`s3presto` `S3FileSystemFactory`s
  - extract the Hadoop configuration logic into `HadoopConfigLoader`, with Flink 
shading of certain Hadoop configs (see the sketch below)
  - add Flink shading to the AWS credential provider config of the 
`S3FileSystemFactory`s
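
A minimal sketch of the shading remap described above (my reading of the 
`HadoopConfigLoader` excerpt quoted in the review comments; the class and 
parameter names here are illustrative, not the PR's code):

```java
import java.util.Set;

/** Sketch: remap unshaded class names in selected config values to Flink's relocated package. */
final class ShadingSketch {

	/** If the value starts with a package prefix to shade, prepend the shading prefix. */
	static String shadeValue(String value, Set<String> packagePrefixesToShade, String flinkShadingPrefix) {
		for (String prefix : packagePrefixesToShade) {
			if (value.startsWith(prefix)) {
				// e.g. "com.amazonaws.auth.ContainerCredentialsProvider" ->
				// "org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider"
				return flinkShadingPrefix + value;
			}
		}
		return value;
	}
}
```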

## Verifying this change

run unit tests

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (yes)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/azagrebin/flink FLINK-8439

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6405


commit 44c6eafb6b0757deb89f4e4a7e9bb237f7336428
Author: Andrey Zagrebin 
Date:   2018-07-23T16:10:55Z

[FLINK-8439] Add Flink shading to AWS credential provider s3 hadoop config




> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554261#comment-16554261
 ] 

ASF GitHub Bot commented on FLINK-8439:
---

Github user azagrebin commented on the issue:

https://github.com/apache/flink/pull/6405
  
cc @GJL @StephanEwen


> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-03-29 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419050#comment-16419050
 ] 

Till Rohrmann commented on FLINK-8439:
--

Unblocking 1.5.0 from this issue since it is a documentation task. However, we 
should update the documentation for 1.5.0 accordingly.

> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Blocker
> Fix For: 1.5.0, 1.4.3
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-01-25 Thread Dyana Rose (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339188#comment-16339188
 ] 

Dyana Rose commented on FLINK-8439:
---

I should add that I believe the issue with permissions also affects the presto 
fs connector


The setting for presto looks like it is:
{code:java}
presto.s3.credentials-provider{code}
 
[https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/S3ConfigurationUpdater.java#L22]
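If the same shading applies there, the shaded value would presumably be 
something like this (unverified; it assumes the presto file system relocates 
the AWS SDK under the org.apache.flink.fs.s3presto.shaded. prefix):
{code:java}
presto.s3.credentials-provider: org.apache.flink.fs.s3presto.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}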
 

> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-01-24 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337897#comment-16337897
 ] 

Stephan Ewen commented on FLINK-8439:
-

Big +1!

> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop

2018-01-24 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337845#comment-16337845
 ] 

Aljoscha Krettek commented on FLINK-8439:
-

We could also add code that automatically remaps config values to the shaded 
package if it detects the s3 path in there.
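For instance (a hypothetical sketch of that idea, not actual Flink code):
{code:java}
/** Remap an unshaded AWS provider class name to the shaded package; return it unchanged otherwise. */
static String remapToShaded(String providerClassName) {
    if (providerClassName.startsWith("com.amazonaws.")) {
        return "org.apache.flink.fs.s3hadoop.shaded." + providerClassName;
    }
    return providerClassName;
}{code}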

> Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
> 
>
> Key: FLINK-8439
> URL: https://issues.apache.org/jira/browse/FLINK-8439
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Dyana Rose
>Priority: Blocker
> Fix For: 1.5.0
>
>
> This came up when using S3 for the file system backend and running under 
> ECS.
> With no credentials in the container, hadoop-aws will default to EC2 
> instance-level credentials when accessing S3. However, when running under 
> ECS, you will generally want to default to the task definition's IAM role.
> In this case you need to set the Hadoop property
> {code:java}
> fs.s3a.aws.credentials.provider{code}
> to one or more fully qualified class names; see the [hadoop-aws 
> docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md].
> This works as expected when you add this setting to flink-conf.yaml, but 
> there is a further gotcha: because the AWS SDK is shaded, the actual fully 
> qualified class name for, in this case, the ContainerCredentialsProvider is
> {code:java}
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> meaning the full setting is:
> {code:java}
> fs.s3a.aws.credentials.provider: 
> org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code}
> If you instead set it to the unshaded class name, you will see a very 
> confusing error stating that the ContainerCredentialsProvider doesn't 
> implement AWSCredentialsProvider (which it most certainly does).
> Adding this information (how to specify alternate credential providers, and 
> the namespace gotcha) to the [AWS deployment 
> docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html]
> would be useful to anyone else using S3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)