[jira] [Created] (HADOOP-14943) S3A to implement getFileBlockLocations() for mapred partitioning

Steve Loughran (JIRA) Wed, 11 Oct 2017 07:28:30 -0700

Steve Loughran created HADOOP-14943:
---------------------------------------


             Summary: S3A to implement getFileBlockLocations() for mapred partitioning
                 Key: HADOOP-14943
                 URL: https://issues.apache.org/jira/browse/HADOOP-14943
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 2.8.1
            Reporter: Steve Loughran
            Priority: Critical


It looks suspiciously like S3A isn't providing the partitioning data in 
{{listLocatedStatus}} and {{getFileBlockLocations()}} that is needed to break 
up a file by the block size. This will stop tools using the MRv1 APIs from 
partitioning properly if the input format isn't doing its own split logic.
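
To make the ask concrete, here is a minimal sketch of how an object store with no real blocks could still report block-sized ranges for a file. It is not the S3A code and does not use the Hadoop classes: {{Range}} is a hypothetical stand-in for {{BlockLocation}}, {{blocksFor()}} is a made-up helper, and the "localhost" host name is an assumption about what a store without locality would report.

```java
import java.util.ArrayList;
import java.util.List;

class SyntheticBlocks {

    /** Hypothetical stand-in for Hadoop's BlockLocation: offset, length, host. */
    static final class Range {
        final long offset;
        final long length;
        final String host;

        Range(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }

        @Override
        public String toString() {
            return "[" + offset + "+" + length + "@" + host + "]";
        }
    }

    /**
     * Return the block-sized ranges that overlap [start, start + len) in a
     * file of fileLength bytes, pretending the file is laid out in blocks
     * of blockSize bytes.
     */
    static List<Range> blocksFor(long start, long len, long fileLength, long blockSize) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("negative start or len");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        List<Range> out = new ArrayList<>();
        // Clamp the end of the requested range to the file length
        // (written this way to avoid overflow in start + len).
        long end = (len > fileLength - start) ? fileLength : start + len;
        if (start >= end) {
            return out; // empty request, or start is past the end of the file
        }
        // Begin at the block that contains 'start'.
        long blockStart = (start / blockSize) * blockSize;
        while (blockStart < end) {
            long blockLen = Math.min(blockSize, fileLength - blockStart);
            out.add(new Range(blockStart, blockLen, "localhost"));
            blockStart += blockSize;
        }
        return out;
    }

    public static void main(String[] args) {
        // A 100-byte file with a 32-byte "block size": three full blocks and a 4-byte tail.
        System.out.println(blocksFor(0, 100, 100, 32));
    }
}
```

With ranges like these, an input format that relies on the filesystem's block data has something to split on, even though every range names the same placeholder host.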

FileInputFormat in MRv2 is a bit more configurable about input split 
calculation and will split up large files. Otherwise, the partitioning is 
being driven more by the default values of the executing engine than by any 
config data from the filesystem about what its "block size" is.
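
For illustration, this is roughly how I understand the MRv2 split calculation to combine the reported block size with the configured minimum and maximum split sizes. Treat it as a sketch from memory rather than a copy of the Hadoop source: the class name, {{countSplits()}} and the sizes in {{main()}} are made up for the example, and the 1.1 slop factor is an assumption about the current behaviour.

```java
class SplitSizeSketch {

    /** Assumed slop: the last split may be up to 10% larger than the split size. */
    static final double SPLIT_SLOP = 1.1;

    /** Clamp the filesystem's reported block size between the min and max split sizes. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Count the splits a single splittable file of fileLength bytes would produce. */
    static long countSplits(long fileLength, long splitSize) {
        long splits = 0;
        long bytesRemaining = fileLength;
        // Carve off full-sized splits while more than SPLIT_SLOP splits' worth remains.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (bytesRemaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileLength = 1024 * mib; // a 1 GiB file
        // With no min/max constraints, the reported block size decides the split count.
        for (long blockSize : new long[] {32 * mib, 128 * mib}) {
            long splitSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
            System.out.println((blockSize / mib) + " MiB block size: "
                + countSplits(fileLength, splitSize) + " splits");
        }
    }
}
```

The point of the example is that the block size the filesystem reports feeds straight into the split count unless the min/max settings override it, so a store that reports nothing useful leaves the decision to engine defaults.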

NativeAzureFS does a better job; maybe that could be factored out to 
hadoop-common and reused?


