Thomas Demoor created HADOOP-12878: -------------------------------------- Summary: Impersonate hosts in s3a for better data locality handling Key: HADOOP-12878 URL: https://issues.apache.org/jira/browse/HADOOP-12878 Project: Hadoop Common Issue Type: Sub-task Reporter: Thomas Demoor Assignee: Thomas Demoor
Currently, {{localhost}} is passed as locality for each block, causing all blocks involved in job to initially target the same node (RM), before being moved by the scheduler (to a rack-local node). This reduces parallelism for jobs (with short-lived mappers). We should mimic Azures implementation: a config setting {{fs.s3a.block.location.impersonatedhost}} where the user can enter the list of hostnames in the cluster to return to {{getFileBlockLocations}}. Possible optimization: for larger systems, it might be better to return N (5?) random hostnames to prevent passing a huge array (the downstream code assumes size = O(3)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)