[ https://issues.apache.org/jira/browse/SPARK-17869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin B closed SPARK-17869.
---------------------------
    Resolution: Won't Fix

You are right [~srowen]

> Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)
> -------------------------------------------------------------------------
>
>                 Key: SPARK-17869
>                 URL: https://issues.apache.org/jira/browse/SPARK-17869
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 2.0.1
>         Environment: Mac OS X / Ubuntu
> pyspark
> hadoop-aws:2.7.3
> aws-java-sdk:1.11.41
>            Reporter: Robin B
>
> Connection fails with **400 Bad Request** for S3 in the Frankfurt region, where
> version 4 authentication is required to connect.
> This issue is somewhat related to HADOOP-13325, but that solution (including the
> endpoint explicitly) does nothing to ameliorate the problem.
>
> sc._jsc.hadoopConfiguration().set('fs.s3a.impl','org.apache.hadoop.fs.s3native.NativeS3FileSystem')
> sc._jsc.hadoopConfiguration().set('com.amazonaws.services.s3.enableV4','true')
> sc.setSystemProperty('SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY','true')
> sc._jsc.hadoopConfiguration().set('fs.s3a.endpoint','s3.eu-central-1.amazonaws.com')
> sc._jsc.hadoopConfiguration().set('fs.s3a.awsAccessKeyId','ACCESS_KEY')
> sc._jsc.hadoopConfiguration().set('fs.s3a.awsSecretAccessKey','SECRET_KEY')
> df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")
>
> yields:
>
> 16/10/10 18:39:28 WARN DataSource: Error while looking for metadata directory.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/readwriter.py", line 363, in csv
>     return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>   File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>   File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o35.csv.
> : java.io.IOException: s3n://BUCKET-NAME : 400 : Bad Request
> 	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
> 	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
> 	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
> 	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
> 	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
> 	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> 	at scala.collection.immutable.List.flatMap(List.scala:344)
> 	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
> 	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
> 	at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> 	at py4j.Gateway.invoke(Gateway.java:280)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:211)
> 	at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
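For readers who land on this ticket: the `s3n://` prefix in the 400 error shows the request went through the old jets3t-based `NativeS3FileSystem`, because the reporter's snippet points `fs.s3a.impl` at that class; that connector only signs with V2, so Frankfurt rejects it regardless of the other settings. A minimal configuration sketch of a V4-capable S3A setup on Hadoop 2.7.x follows. This is an untested illustration, not a fix shipped by this ticket (it was closed Won't Fix); `ACCESS_KEY`, `SECRET_KEY`, and the bucket are placeholders, and the property names assume hadoop-aws 2.7.

```python
# Sketch (untested): S3A + Signature V4 against eu-central-1 on Hadoop 2.7.x.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-sigv4-sketch")
    # The AWS SDK reads enableV4 as a JVM system property, so it must be set
    # on the driver and executor JVMs; putting it in hadoopConfiguration()
    # (as in the report) has no effect.
    .config("spark.driver.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .config("spark.executor.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .getOrCreate()
)

hconf = spark.sparkContext._jsc.hadoopConfiguration()
# V4-only regions require the regional endpoint, not the global s3.amazonaws.com.
hconf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
# S3A credential property names; fs.s3a.awsAccessKeyId/awsSecretAccessKey
# (used in the report) are s3n-style names that S3A does not read.
hconf.set("fs.s3a.access.key", "ACCESS_KEY")
hconf.set("fs.s3a.secret.key", "SECRET_KEY")
# Deliberately do NOT set fs.s3a.impl: the default S3AFileSystem binding is
# what supports the fs.s3a.* settings above.

df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")
```

The `spark.*.extraJavaOptions` lines matter in cluster deployments; in local shells an equivalent is exporting the property through `SPARK_SUBMIT_OPTS` before the JVM starts.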