[ https://issues.apache.org/jira/browse/HADOOP-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329721#comment-14329721 ]
Lei (Eddy) Xu commented on HADOOP-11584:
----------------------------------------

Tests passed on the AWS US Standard region. Non-binding +1.

Just one small question: why do {{TestS3ABlocksize}} and {{TestS3AFileSystemContract}} require different s3a filesystem name keys? {{TestS3ABlocksize}} asks for {{fs.contract.test.fs.s3a}}, while {{TestS3AFileSystemContract}} asks for {{test.fs.s3a.name}}.

> s3a file block size set to 0 in getFileStatus
> ---------------------------------------------
>
>                 Key: HADOOP-11584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11584
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Dan Hecht
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: HADOOP-10584-003.patch, HADOOP-111584.patch, HADOOP-11584-002.patch
>
>
> The consequence is that mapreduce probably is not splitting s3a files in the expected way. This is similar to HADOOP-5861 (which was for s3n, though s3n was passing 5G rather than 0 for block size).
> FileInputFormat.getSplits() relies on the FileStatus block size being set:
> {code}
> if (isSplitable(job, path)) {
>   long blockSize = file.getBlockSize();
>   long splitSize = computeSplitSize(blockSize, minSize, maxSize);
> {code}
> However, S3AFileSystem does not set the FileStatus block size field. From S3AFileStatus.java:
> {code}
> // Files
> public S3AFileStatus(long length, long modification_time, Path path) {
>   super(length, false, 1, 0, modification_time, path);
>   isEmptyDirectory = false;
> }
> {code}
> I think it should use S3AFileSystem.getDefaultBlockSize() for each file's block size (where it's currently passing 0).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
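As a side note on why a 0 block size matters for splitting: FileInputFormat computes each split size as max(minSize, min(maxSize, blockSize)), so a block size of 0 makes the split size collapse to minSize. The standalone sketch below reproduces just that arithmetic outside Hadoop; the minSize/maxSize defaults and the 32 MB block size are illustrative assumptions, not values read from any Hadoop config.

```java
// Standalone sketch of FileInputFormat-style split sizing (not Hadoop source).
public class SplitSizeDemo {
    // Mirrors the formula used by FileInputFormat.computeSplitSize().
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 1L;                        // illustrative split minsize
        long maxSize = Long.MAX_VALUE;            // illustrative split maxsize
        long defaultBlockSize = 32L * 1024 * 1024; // assumed fs.s3a.block.size

        // With the buggy S3AFileStatus (blockSize == 0), splits collapse to minSize:
        System.out.println(computeSplitSize(0L, minSize, maxSize));                // 1
        // With getDefaultBlockSize() passed instead, splits track the block size:
        System.out.println(computeSplitSize(defaultBlockSize, minSize, maxSize));  // 33554432
    }
}
```

With blockSize = 0 every file would be carved at the minimum split size (or treated pathologically by downstream logic), which is why passing getDefaultBlockSize() into the S3AFileStatus constructor restores the expected MapReduce splitting behavior.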