[ https://issues.apache.org/jira/browse/HADOOP-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171659#comment-17171659 ]

Steve Loughran commented on HADOOP-17180:
-----------------------------------------

DDB's error reporting is pretty painful. I don't see any support for it in 
S3AUtils.isThrottleException in trunk, so upgrading isn't going to make your 
life better.

InMemoryFileIndex is something I've heard bad reviews of in terms of S3 load 
(https://stackoverflow.com/questions/60590925/spark-making-expensive-s3-api-calls); 
it's probably the trigger here.

Anyway, if 500 responses from DDB need to be treated as throttle events, a 
patch against trunk is welcome; we can backport to 3.3.x as well. We've moved 
a long way from the 3.1 line, though.
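A minimal sketch of what widening that classification might look like. The class and method names here are illustrative only, not the actual S3AUtils code; the 503 status code is the one DDB uses for explicit throttling, and the 500/InternalServerError case is the proposed addition:

```java
// Hypothetical sketch, not the real S3AUtils: decide whether a DynamoDB
// error response should be treated as a throttle event and retried with
// backoff. Per the AWS Support explanation in this issue, DDB can return
// 500 InternalServerError when overwhelmed by client requests.
public class ThrottleCheck {
    /** Status code DDB uses for explicit throttling responses. */
    static final int SC_SERVICE_UNAVAILABLE = 503;
    /** Proposed addition: DDB system errors under heavy load. */
    static final int SC_INTERNAL_SERVER_ERROR = 500;

    static boolean isThrottleEvent(int statusCode, String errorCode) {
        if (statusCode == SC_SERVICE_UNAVAILABLE) {
            return true;  // already treated as throttling
        }
        // Proposed: also retry 500 InternalServerError with backoff
        return statusCode == SC_INTERNAL_SERVER_ERROR
            && "InternalServerError".equals(errorCode);
    }
}
```

With this shape, the existing retry policy's exponential-backoff path would pick up the 500s without any other change.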

Looking at the changelog, you need to be on 3.2.x to get the current DDB 
throttle logic from HADOOP-15426.

Can you try that and see if it helps?
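For reference, the exponential-backoff retry the reporter asks for can be sketched like this. This is a hypothetical wrapper, not the actual S3Guard retry policy; the names and parameters are made up for illustration:

```java
import java.util.concurrent.Callable;

// Illustrative exponential-backoff wrapper, not the real S3Guard code.
public class BackoffRetry {
    /**
     * Run op, retrying on failure with a delay that doubles on each
     * attempt: baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...
     */
    static <T> T withBackoff(Callable<T> op, int maxAttempts,
                             long baseDelayMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e;  // retries exhausted, surface the error
                }
                Thread.sleep(baseDelayMs << attempt);  // exponential delay
            }
        }
    }
}
```

A production version would also add jitter and only retry errors the throttle check classifies as retryable, but the doubling delay above is the core of the request.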


> S3Guard: Include 500 DynamoDB system errors in exponential backoff retries
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-17180
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17180
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: David Kats
>            Priority: Major
>         Attachments: image-2020-08-03-09-58-54-102.png
>
>
> We get fatal failures from S3Guard (which in turn fail our Spark jobs) because 
> of internal DynamoDB system errors:
> com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: 
> Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error 
> Code: InternalServerError; Request ID: 
> 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error 
> (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: 
> InternalServerError; Request ID: 
> 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
> DynamoDB keeps a separate statistic for system errors:
> !image-2020-08-03-09-58-54-102.png!
> I contacted AWS Support and was told that these 500 errors are returned to 
> the client once DynamoDB gets overwhelmed with client requests.
> So essentially the traffic should have been throttled, but instead the 
> client received 500 system errors.
> My point is that the client should handle these errors just like throttling 
> exceptions: with exponential backoff retries.
>  
> Here is a more complete exception stack trace:
>  
> org.apache.hadoop.fs.s3a.AWSServiceIOException: get on 
> s3a://rem-spark/persisted_step_data/15/0afb1ccb73854f1fa55517a77ec7cc5e__b67e2221-f0e3-4c89-90ab-f49618ea4557__SDTopology/parquet.all_ranges/topo_id=321: 
> com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException(S3AUtils.java:389)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:181)
>     at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
>     at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.get(DynamoDBMetadataStore.java:438)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2110)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1889)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$9(S3AFileSystem.java:1868)
>     at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1868)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:277)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:207)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:206)
>     at scala.collection.immutable.Stream.map(Stream.scala:418)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:206)
>     at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:204)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>     at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2925)
>     at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2901)
>     at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1640)
>     at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1616)
>     at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.doLoadItem(GetItemImpl.java:77)
>     at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItem(GetItemImpl.java:66)
>     at com.amazonaws.services.dynamodbv2.document.Table.getItem(Table.java:608)
>     at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.getConsistentItem(DynamoDBMetadataStore.java:423)
>     at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerGet(DynamoDBMetadataStore.java:459)
>     at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$get$2(DynamoDBMetadataStore.java:439)
>     at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>     ... 29 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
