Maxim Martynov created HADOOP-18839:
---------------------------------------
Summary: SSLException while accessing S3 bucket is reported only after 15 minutes of waiting
Key: HADOOP-18839
URL: https://issues.apache.org/jira/browse/HADOOP-18839
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.3.4
Reporter: Maxim Martynov
Attachments: host.log, ssl.log
I tried to connect from PySpark to Minio running in Docker.
Installing PySpark and starting Minio:
{code:bash}
pip install pyspark==3.4.1
docker run --rm -d --hostname minio --name minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ACCESS_KEY=access \
  -e MINIO_SECRET_KEY=Eevoh2wo0ui6ech0wu8oy3feiR3eicha \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=iepaegaigi3ofa9TaephieSo1iecaesh \
  bitnami/minio:latest
docker exec minio mc mb test-bucket
{code}
Then create Spark session:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4") \
    .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000") \
    .config("spark.hadoop.fs.s3a.access.key", "access") \
    .config("spark.hadoop.fs.s3a.secret.key", "Eevoh2wo0ui6ech0wu8oy3feiR3eicha") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
    .getOrCreate()
spark.sparkContext.setLogLevel("debug")
spark.sparkContext.setLogLevel("debug")
{code}
And try to access some object in a bucket:
{code:python}
import time

begin = time.perf_counter()
spark.read.format("csv").load("s3a://test-bucket/fake")
end = time.perf_counter()
{code}
The load call fails with:
{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o40.load.
: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message: Unable to execute HTTP request: Unsupported or unrecognized SSL message
...
{code}
[^ssl.log]
{code:python}
>>> print((end-begin)/60)
14.72387898775002
{code}
I waited almost *15 minutes* to get the exception from Spark. The reason was that I tried to connect to the S3 instance with {{{}fs.s3a.connection.ssl.enabled=true{}}} (the default), while Minio is configured to listen for plain HTTP only.
Is there any way to raise the exception immediately if an SSL connection cannot be established?
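For comparison, the TLS handshake failure itself is detected within milliseconds. A minimal Python sketch (independent of Spark/Hadoop; the stub server below is a hypothetical stand-in for an HTTP-only Minio) shows the client aborting the handshake as soon as it receives a plaintext HTTP response, so the 15-minute wait must come from the retry layers, not from the handshake:

```python
import socket
import ssl
import threading
import time

# Stub server standing in for an HTTP-only Minio: it answers the TLS
# ClientHello with a plaintext HTTP response, like any non-TLS endpoint.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def fake_http_server():
    conn, _ = srv.accept()
    conn.recv(65536)  # the TLS ClientHello arrives here
    conn.sendall(b"HTTP/1.1 400 Bad Request\r\n\r\n")
    conn.close()

threading.Thread(target=fake_http_server, daemon=True).start()

err = None
begin = time.perf_counter()
try:
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with ssl.create_default_context().wrap_socket(
            sock, server_hostname="localhost"
        ):
            pass
except ssl.SSLError as e:
    err = e  # e.g. WRONG_VERSION_NUMBER: plaintext where TLS was expected
elapsed = time.perf_counter() - begin

print(f"handshake failed after {elapsed:.3f}s: {err}")
```

The Java-side message ("Unsupported or unrecognized SSL message") is JSSE's wording for the same condition.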
If I pass a wrong endpoint, like {{{}localhos:9000{}}}, I get an exception like this in just 5 seconds:
{code:java}
: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: test-bucket.localhos: Unable to execute HTTP request: test-bucket.localhos
...
{code}
[^host.log]
{code:python}
>>> print((end-begin)/60)
0.09500707178334172
>>> end-begin
5.700424307000503
{code}
I know about options like {{fs.s3a.attempts.maximum}} and {{{}fs.s3a.retry.limit{}}}; setting them to 1 raises the exception almost immediately. But this does not look like the right fix: an SSLException against a plain-HTTP endpoint is not a transient error, so retrying it is pointless, while disabling retries entirely also drops protection against genuinely transient network failures.
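For reference, the workaround in full (a sketch only; the endpoint and package version are taken from the reproduction above, and limiting both retry layers to a single attempt trades away all retry resilience):

```python
from pyspark.sql import SparkSession

# Fail fast: limit both retry layers to a single attempt so the
# SSLException surfaces immediately instead of after ~15 minutes.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000")
    .config("spark.hadoop.fs.s3a.attempts.maximum", "1")  # AWS SDK attempts
    .config("spark.hadoop.fs.s3a.retry.limit", "1")       # S3A retry layer
    .getOrCreate()
)
```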
--
This message was sent by Atlassian Jira
(v8.20.10#820010)