[jira] [Closed] (SPARK-17869) Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)

2016-10-11 Thread Robin B (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin B closed SPARK-17869.
---
Resolution: Won't Fix

You are right, [~srowen].
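
For reference, a minimal sketch of how V4 signing is usually enabled for the s3a connector (hedged: this assumes hadoop-aws 2.7.x, and the fix belongs in Hadoop/AWS SDK configuration rather than in Spark itself). The SDK reads com.amazonaws.services.s3.enableV4 as a JVM system property, so it has to reach each JVM before the first S3 client is created, e.g. via extraJavaOptions; in client mode the driver JVM is already running, so the driver option generally belongs in spark-defaults.conf or on the spark-submit command line rather than being set programmatically.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("s3a-sigv4-sketch")
    # Assumption: pass the SDK's V4 flag as a JVM option to driver and executors.
    .config("spark.driver.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .config("spark.executor.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    # s3a should map to S3AFileSystem; NativeS3FileSystem is the s3n
    # implementation, which is why the trace below reports s3n:// URIs.
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # V4-only regions require the regional endpoint to be set explicitly.
    .config("spark.hadoop.fs.s3a.endpoint",
            "s3.eu-central-1.amazonaws.com")
    # s3a reads fs.s3a.access.key / fs.s3a.secret.key, not the
    # awsAccessKeyId-style names used by s3n.
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .getOrCreate())

df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")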

> Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)
> -
>
> Key: SPARK-17869
> URL: https://issues.apache.org/jira/browse/SPARK-17869
> Project: Spark
>  Issue Type: Improvement
>  Affects Versions: 2.0.0, 2.0.1
> Environment: Mac OS X / Ubuntu
> pyspark
> hadoop-aws:2.7.3
> aws-java-sdk:1.11.41
>  Reporter: Robin B
>
> Connection fails with a **400 Bad Request** for S3 in the Frankfurt region, where signature version 4 authentication is required to connect.
> This issue is somewhat related to HADOOP-13325, but the solution there (setting the endpoint explicitly) does nothing to ameliorate the problem.
> sc._jsc.hadoopConfiguration().set('fs.s3a.impl','org.apache.hadoop.fs.s3native.NativeS3FileSystem')
> sc._jsc.hadoopConfiguration().set('com.amazonaws.services.s3.enableV4','true')
> sc.setSystemProperty('SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY','true')
> sc._jsc.hadoopConfiguration().set('fs.s3a.endpoint','s3.eu-central-1.amazonaws.com')
> sc._jsc.hadoopConfiguration().set('fs.s3a.awsAccessKeyId','ACCESS_KEY')
> sc._jsc.hadoopConfiguration().set('fs.s3a.awsSecretAccessKey','SECRET_KEY')
>
> df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")
> yields:
>   16/10/10 18:39:28 WARN DataSource: Error while looking for metadata directory.
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>     File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/readwriter.py", line 363, in csv
>       return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>     File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>     File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
>       return f(*a, **kw)
>     File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
>   py4j.protocol.Py4JJavaError: An error occurred while calling o35.csv.
>   : java.io.IOException: s3n://BUCKET-NAME : 400 : Bad Request
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>     at scala.collection.immutable.List.flatMap(List.scala:344)
>     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
>     at

[jira] [Updated] (SPARK-17869) Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)

2016-10-11 Thread Robin B (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin B updated SPARK-17869:

Description: 
Connection fails with a **400 Bad Request** for S3 in the Frankfurt region, where signature version 4 authentication is required to connect.

This issue is somewhat related to HADOOP-13325, but the solution there (setting the endpoint explicitly) does nothing to ameliorate the problem.


sc._jsc.hadoopConfiguration().set('fs.s3a.impl','org.apache.hadoop.fs.s3native.NativeS3FileSystem')
sc._jsc.hadoopConfiguration().set('com.amazonaws.services.s3.enableV4','true')
sc.setSystemProperty('SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY','true')
sc._jsc.hadoopConfiguration().set('fs.s3a.endpoint','s3.eu-central-1.amazonaws.com')
sc._jsc.hadoopConfiguration().set('fs.s3a.awsAccessKeyId','ACCESS_KEY')
sc._jsc.hadoopConfiguration().set('fs.s3a.awsSecretAccessKey','SECRET_KEY')

df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")

yields:

16/10/10 18:39:28 WARN DataSource: Error while looking for metadata directory.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/readwriter.py", line 363, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.csv.
: java.io.IOException: s3n://BUCKET-NAME : 400 : Bad Request
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at
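
One likely culprit in the snippet above (an inference from the AWS SDK, not something confirmed in this thread): SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY is a Java constant whose value is the property name, so passing the constant's name as a literal string sets a property the SDK never reads. A sketch of the corrected call:

# The constant's value is "com.amazonaws.services.s3.enableV4", so the system
# property must be set under that name, and before the first S3 client is
# constructed on the JVM side.
sc.setSystemProperty('com.amazonaws.services.s3.enableV4', 'true')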

[jira] [Created] (SPARK-17869) Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)

2016-10-11 Thread Robin B (JIRA)
Robin B created SPARK-17869:
---

 Summary: Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)
 Key: SPARK-17869
 URL: https://issues.apache.org/jira/browse/SPARK-17869
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.1, 2.0.0
 Environment: Mac OS X / Ubuntu
pyspark
hadoop-aws:2.7.3
aws-java-sdk:1.11.41
Reporter: Robin B


Connection fails with a **400 Bad Request** for S3 in the Frankfurt region, where signature version 4 authentication is required to connect.

This issue is somewhat related to HADOOP-13325, but the solution there (setting the endpoint explicitly) does nothing to ameliorate the problem.


sc._jsc.hadoopConfiguration().set('fs.s3a.impl','org.apache.hadoop.fs.s3native.NativeS3FileSystem')
sc._jsc.hadoopConfiguration().set('com.amazonaws.services.s3.enableV4','true')
sc.setSystemProperty('SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY','true')
sc._jsc.hadoopConfiguration().set('fs.s3a.endpoint','s3.eu-central-1.amazonaws.com')
sc._jsc.hadoopConfiguration().set('fs.s3a.awsAccessKeyId','ACCESS_KEY')
sc._jsc.hadoopConfiguration().set('fs.s3a.awsSecretAccessKey','SECRET_KEY')

df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")

yields:

16/10/10 18:39:28 WARN DataSource: Error while looking for metadata directory.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/readwriter.py", line 363, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.csv.
: java.io.IOException: s3n://BUCKET-NAME : 400 : Bad Request
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
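
A side note on the environment above (hedged: based on published dependency metadata, not on anything in this thread): hadoop-aws 2.7.3 was built against aws-java-sdk 1.7.4, so pairing it with aws-java-sdk 1.11.41 can fail on its own, independently of the signature-version question. A sketch of launching with matched versions:

# Hypothetical launch sketch: keep hadoop-aws and the aws-java-sdk version it
# declares as a dependency in lockstep (hadoop-aws 2.7.x declares 1.7.4).
import subprocess

subprocess.run([
    "spark-submit",
    "--packages",
    "org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4",
    "--conf",
    "spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true",
    "read_csv_from_s3.py",  # hypothetical application script
])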