Steve Loughran created HADOOP-14943:
---------------------------------------
Summary: S3A to implement getFileBlockLocations() for mapred
partitioning
Key: HADOOP-14943
URL: https://issues.apache.org/jira/browse/HADOOP-14943
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 2.8.1
Reporter: Steve Loughran
Priority: Critical
It looks suspiciously like S3A isn't providing the partitioning data in
{{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a file
by the block size. This will stop tools using the MRv1 APIs from doing the
partitioning properly if the input format isn't doing its own split logic.
FileInputFormat in MRv2 is a bit more configurable about input split
calculation and will split up large files, but otherwise the partitioning is
being done more by the default values of the executing engine than by any
config data from the filesystem about what its "block size" is.
NativeAzureFS does a better job; maybe that could be factored out to
hadoop-common and reused?
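
To make "break up a file by the block size" concrete, here is a minimal sketch of the arithmetic only. It is not the S3A or NativeAzureFS code: the class {{BlockRangeSketch}} and the method {{blockRanges}} are hypothetical names, and the sketch uses plain {offset, length} pairs instead of Hadoop's BlockLocation so that it has no Hadoop dependency. It assumes the store has no real blocks and simply reports block-size-aligned ranges that overlap the requested byte range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: models block-sized ranges for an object store. */
final class BlockRangeSketch {

    /**
     * Returns {offset, length} pairs for every block-size-aligned range that
     * overlaps the requested range [start, start + len) of a file.
     * The last range is shortened so it never runs past the end of the file.
     */
    static List<long[]> blockRanges(long fileLen, long start, long len, long blockSize) {
        if (start < 0 || len < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid range or block size");
        }
        List<long[]> ranges = new ArrayList<>();
        // Clip the requested range to the file length.
        long end = Math.min(fileLen, start + len);
        // Begin at the block boundary that contains the start offset.
        long offset = (start / blockSize) * blockSize;
        while (offset < end) {
            long blockLen = Math.min(blockSize, fileLen - offset);
            ranges.add(new long[] {offset, blockLen});
            offset += blockSize;
        }
        return ranges;
    }
}
```

Under these assumptions, a call such as {{BlockRangeSketch.blockRanges(300, 0, 300, 128)}} yields three ranges (two full blocks and a shorter tail), which is the kind of data an input format could use to create one split per block rather than one split per file.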