Prabhu Joseph created HADOOP-16347:
--------------------------------------

             Summary: MapReduce job tasks fails on S3
                 Key: HADOOP-16347
                 URL: https://issues.apache.org/jira/browse/HADOOP-16347
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.3.0
            Reporter: Prabhu Joseph


MapReduce job tasks fail. A few tasks fail with the exception below, and a few 
hang and then time out. Listing files on S3 works fine from the hadoop 
client.
 
*Exception from failed task:*
 
{code}
2019-05-30 20:23:05,424 ERROR [IPC Server handler 19 on 35791] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1559246386193_0001_m_000000_0 - exited : org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on qe-cloudstorage-bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed: Unable to execute HTTP request: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:314)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:406)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:310)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:285)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:444)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:350)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3315)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3364)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3332)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:161)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:117)
        at org.apache.hadoop.examples.terasort.TeraOutputFormat.getOutputCommitter(TeraOutputFormat.java:152)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:606)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
        at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:5129)
        at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:5103)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4352)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:445)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
        ... 22 more
Caused by: javax.net.ssl.SSLException: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
        at org.wildfly.openssl.OpenSSLEngine.unwrap(OpenSSLEngine.java:543)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at org.wildfly.openssl.OpenSSLSocket.runHandshake(OpenSSLSocket.java:319)
        at org.wildfly.openssl.OpenSSLSocket.startHandshake(OpenSSLSocket.java:210)
        at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
        at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
        at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
        at com.amazonaws.http.conn.$Proxy17.connect(Unknown Source)
        at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1238)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
        ... 37 more

{code}
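
The trace above nests three exceptions, and the innermost one (the javax.net.ssl.SSLException raised from org.wildfly.openssl) is the one that names the actual failure. As a minimal sketch of how such a chain can be unwrapped in diagnostic code, the helper below follows getCause() to the end. The class RootCause and its method are hypothetical and are not part of Hadoop or the AWS SDK; plain JDK exceptions stand in for AWSClientIOException and SdkClientException so the example runs without those libraries.

```java
import java.io.IOException;
import javax.net.ssl.SSLException;

// Hypothetical helper for illustration only; not part of Hadoop or the AWS SDK.
final class RootCause {

    // Follow getCause() links until the innermost exception is reached.
    static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain, and guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mimic the three-level nesting in the report.
        SSLException ssl = new SSLException("certificate verify failed");
        RuntimeException sdk = new RuntimeException("Unable to execute HTTP request", ssl);
        IOException s3a = new IOException("doesBucketExist on qe-cloudstorage-bucket", sdk);

        Throwable root = of(s3a);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Run against the stand-in chain, this prints the SSLException rather than either of its wrappers, which mirrors reading the last "Caused by" entry in the log.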
 
 
*Thread dump of hanging MapTask:*
 

{code}
"main" #1 prio=5 os_prio=0 tid=0x00007ff424064800 nid=0x15109 waiting on condition [0x00007ff42c3fc000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doPauseBeforeRetry(AmazonHttpClient.java:1679)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.pauseBeforeRetry(AmazonHttpClient.java:1653)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1191)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
        at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:5129)
        at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:5103)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4352)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:445)
        at org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$19/350413251.execute(Unknown Source)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:314)
        at org.apache.hadoop.fs.s3a.Invoker$$Lambda$20/253767021.execute(Unknown Source)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:406)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:310)
        at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:285)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:444)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:350)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3315)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3364)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3332)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:161)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:117)
        at org.apache.hadoop.examples.terasort.TeraOutputFormat.getOutputCommitter(TeraOutputFormat.java:152)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:606)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
{code}
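
In the dump above, the main thread is asleep in the AWS SDK's pauseBeforeRetry, and that call sits inside S3A's Invoker.retry. One possible reading, offered here as an assumption and not as a confirmed diagnosis, is that two retry layers are stacked, so the time spent sleeping could add up to something that looks like a hang. The sketch below only illustrates that arithmetic. The class NestedRetryDelay, its methods, and every number passed to them are hypothetical; they are not the actual retry settings of the AWS SDK or S3A.

```java
// Hypothetical illustration of stacked retry delays; not Hadoop or AWS SDK code.
final class NestedRetryDelay {

    // Sum of exponential backoff sleeps: base, 2*base, 4*base, ... each capped at maxMillis.
    static long totalBackoffMillis(int retries, long baseMillis, long maxMillis) {
        long total = 0;
        long delay = baseMillis;
        for (int i = 0; i < retries; i++) {
            total += Math.min(delay, maxMillis);
            delay = Math.min(delay * 2, maxMillis);
        }
        return total;
    }

    // Total sleep when an outer loop repeats the whole inner backoff sequence,
    // pausing a fixed interval between outer attempts.
    static long nestedTotalMillis(int outerAttempts, long outerPauseMillis,
                                  int innerRetries, long innerBaseMillis, long innerMaxMillis) {
        long inner = totalBackoffMillis(innerRetries, innerBaseMillis, innerMaxMillis);
        return outerAttempts * inner + (outerAttempts - 1) * outerPauseMillis;
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        long millis = nestedTotalMillis(4, 500, 10, 100, 20_000);
        System.out.println("Total time spent sleeping: " + millis + " ms");
    }
}
```

The point of the sketch is only that the outer attempt count multiplies the inner backoff total, so a failure that can never succeed on retry (such as a certificate that does not verify) may keep a task sleeping for a long time before it finally errors out or is killed by a timeout.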



