FYI, HADOOP-16458 : LocatedFileStatusFetcher.getFileStatuses failing intermittently with s3
This is inevitably something up with S3A, but I'm going to be making changes to the LocatedFileStatusFetcher code as well as o.a.h.fs.Globber to help diagnose it, so I'm stepping into MAPREDUCE land.

Two questions.

- There are no explicit unit tests of LocatedFileStatusFetcher doing scans of object stores or filesystems. Is there anything I've not seen?

- The FileSystem globber has code which, if a listStatus(path) call returns a single entry, calls getFileStatus to get some more information, which the docs say is "needed to handle symlinks". I don't know where we are with symlinks right now: they aren't in any object store, and they are disabled for HDFS. What do people think about me actually removing that secondary check? I may play some subclassing games and just remove it for S3A, so it's lower risk, while improving perf slightly. ABFS could copy.

Any thoughts?
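
To make the second question concrete, here is a minimal, self-contained Java sketch of the pattern under discussion. It is not the Globber source and does not use the Hadoop FileSystem API: MiniFs, CountingFs, children() and the /data paths are all hypothetical names invented for illustration. It only shows the trade-off: the secondary check costs one extra status call whenever a listing returns a single entry, and without it a listing of a file is indistinguishable from a listing of a one-entry directory.

```java
import java.util.ArrayList;
import java.util.List;

class GlobberSketch {

    /** Hypothetical, simplified filesystem; NOT the Hadoop FileSystem API. */
    interface MiniFs {
        List<String> list(String path);      // stands in for listStatus(path)
        boolean isDirectory(String path);    // stands in for getFileStatus(path).isDirectory()
    }

    /** Toy store holding one directory, /data, with one file in it. Counts calls. */
    static class CountingFs implements MiniFs {
        int listCalls;
        int statusCalls;

        public List<String> list(String path) {
            listCalls++;
            List<String> out = new ArrayList<>();
            if (path.equals("/data")) {
                out.add("/data/part-0000");          // directory with a single child
            } else if (path.equals("/data/part-0000")) {
                out.add("/data/part-0000");          // listing a file returns the file itself
            }
            return out;
        }

        public boolean isDirectory(String path) {
            statusCalls++;
            return path.equals("/data");
        }
    }

    /**
     * Returns the children of path. With secondaryCheck enabled, a
     * single-entry listing triggers one extra status call to decide whether
     * the path was a directory or a file; a file has no children.
     */
    static List<String> children(MiniFs fs, String path, boolean secondaryCheck) {
        List<String> listed = fs.list(path);
        if (secondaryCheck && listed.size() == 1 && !fs.isDirectory(path)) {
            return new ArrayList<>();
        }
        return listed;
    }

    public static void main(String[] args) {
        CountingFs withCheck = new CountingFs();
        children(withCheck, "/data", true);
        System.out.println("with check:    list=" + withCheck.listCalls
            + " status=" + withCheck.statusCalls);

        CountingFs withoutCheck = new CountingFs();
        children(withoutCheck, "/data", false);
        System.out.println("without check: list=" + withoutCheck.listCalls
            + " status=" + withoutCheck.statusCalls);
    }
}
```

On an object store each of those extra status calls is a separate remote request, which is where the slight perf gain would come from. Any store-specific override would still need its own way to tell a file from a one-entry directory.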