[ https://issues.apache.org/jira/browse/HADOOP-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365275#comment-15365275 ]
Ryan Blue edited comment on HADOOP-12878 at 7/6/16 10:55 PM:
-------------------------------------------------------------

FileInputFormat works slightly differently. First, the [split size|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L445] is calculated from the file's reported block size and the current min and max split sizes. Then, [the file is broken into N splits|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L410-416] of that size, where {{N = Math.ceil(fileLength / splitSize)}}. The block locations are then used to determine [where each split is located|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L448], based on the split's starting offset.

The result is that {{getFileBlockLocations}} can return a single location for the entire file and you'll still end up with N roughly block-sized splits. This is what enables you to get more parallelism by setting smaller split sizes, even if the resulting splits don't correspond to different blocks. In our environment, we use a 64MB S3 block size and don't see a bottleneck from one input split per file.

> Impersonate hosts in s3a for better data locality handling
> -----------------------------------------------------------
>
>                 Key: HADOOP-12878
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12878
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>
> Currently, {{localhost}} is passed as the locality for each block, causing all blocks involved in a job to initially target the same node (the RM) before being moved by the scheduler (to a rack-local node). This reduces parallelism for jobs (with short-lived mappers).
> We should mimic Azure's implementation: a config setting {{fs.s3a.block.location.impersonatedhost}} where the user can enter the list of hostnames in the cluster to return from {{getFileBlockLocations}}.
> Possible optimization: for larger systems, it might be better to return N (5?) random hostnames to prevent passing a huge array (the downstream code assumes size = O(3)).
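A minimal, self-contained sketch of the split computation described in the comment above. The class and method names ({{SplitSketch}}, {{getSplits}}) are illustrative, not Hadoop's actual code; the clamp in {{computeSplitSize}} mirrors the formula in FileInputFormat, but the real {{getSplits}} also merges a small tail into the last split (the SPLIT_SLOP factor), which this sketch omits:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

  static class Split {
    final long start;
    final long length;
    Split(long start, long length) {
      this.start = start;
      this.length = length;
    }
  }

  // Clamp the file's reported block size between the configured min and
  // max split sizes (same formula as FileInputFormat.computeSplitSize).
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  // Break a file into N = Math.ceil(fileLength / splitSize) splits.
  // Block locations only influence where each split is scheduled,
  // not how many splits are produced.
  static List<Split> getSplits(long fileLength, long splitSize) {
    List<Split> splits = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += splitSize) {
      splits.add(new Split(offset, Math.min(splitSize, fileLength - offset)));
    }
    return splits;
  }

  public static void main(String[] args) {
    // Example: a 256MB file with a 64MB reported block size and wide-open
    // min/max split sizes -> 4 splits, no matter how many block locations
    // getFileBlockLocations reported for the file.
    long splitSize = computeSplitSize(64L << 20, 1L, Long.MAX_VALUE);
    List<Split> splits = getSplits(256L << 20, splitSize);
    System.out.println(splits.size() + " splits of " + splitSize + " bytes");
  }
}
{code}

Lowering the max split size in the example above yields proportionally more splits from the same single-location file, which is the extra parallelism the comment describes.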
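And a hedged sketch of the proposal in the issue description: return a small random subset of user-configured hostnames instead of {{localhost}}. The config key {{fs.s3a.block.location.impersonatedhost}} and the random-subset optimization come from the description itself; the class and helper names here are hypothetical, and this is not an existing s3a feature:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImpersonatedHosts {

  // configuredHosts is assumed to be the parsed value of the proposed
  // fs.s3a.block.location.impersonatedhost setting (a user-supplied
  // list of cluster hostnames).
  // Return at most n random hosts so the array handed back through
  // getFileBlockLocations stays small; per the description, downstream
  // code assumes a size of roughly 3.
  static String[] pickHosts(String[] configuredHosts, int n) {
    List<String> hosts = new ArrayList<>();
    Collections.addAll(hosts, configuredHosts);
    Collections.shuffle(hosts);
    return hosts.subList(0, Math.min(n, hosts.size())).toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] cluster = {"node1", "node2", "node3", "node4", "node5", "node6"};
    // Each block would report a different random trio of hosts, spreading
    // the initial scheduling targets across the cluster instead of
    // funneling every block to the RM node.
    System.out.println(String.join(",", pickHosts(cluster, 3)));
  }
}
{code}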