steveloughran commented on issue #1442: HADOOP-16570. S3A committers encounter scale issues
URL: https://github.com/apache/hadoop/pull/1442#issuecomment-535658748

Latest test run: S3 Ireland.

There's a new unit test which, with the current values, takes 1 minute. I plan to cut the numbers back; I'm just leaving them as is for now to be confident that there are no scale problems with these values. I think I'll declare many more blocks per file.

The slow parts of the test are actually:

* the non-serialized creation of all the pendingset files. That can be massively sped up.
* the actual listing of files to commit. That's a sequential operation at the start of the commit; I will look at it a bit to see if there are some easy opportunities for speedups, as that would matter in production. Maybe moving off fancy Java 8 stuff to simple loops will help there.

As that list process is the one for the staging committers, it only lists the consistent cluster FS (i.e. HDFS), so S3 performance won't matter. In real jobs the time to POST commits will dominate, and with that patch every pendingset file is loaded and processed in parallel.
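
To illustrate the "loaded and processed in parallel" point, here is a minimal sketch of fanning file loads out over a thread pool. It is not the code from the patch: the class `ParallelPendingsetLoad`, the methods `loadPendingset` and `loadAll`, and the file names are all hypothetical, and the "load" is a stand-in that does no I/O. It only uses the standard `java.util.concurrent` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not the S3A committer implementation. */
public class ParallelPendingsetLoad {

  /**
   * Stand-in for loading one pendingset file.
   * Assumption: real code would read and deserialize the file from the
   * cluster filesystem; here the "result" is just the path length.
   */
  static int loadPendingset(String path) {
    return path.length();
  }

  /** Load every file on a fixed-size pool and sum the per-file results. */
  static int loadAll(List<String> paths, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per file, built with a plain loop.
      List<Callable<Integer>> tasks = new ArrayList<>();
      for (String p : paths) {
        tasks.add(() -> loadPendingset(p));
      }
      int total = 0;
      // invokeAll blocks until every submitted task has completed.
      for (Future<Integer> f : pool.invokeAll(tasks)) {
        total += f.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      paths.add("task_" + i + ".pendingset");
    }
    System.out.println("total: " + loadAll(paths, 8));
  }
}
```

With this shape the wall-clock cost of the load phase is bounded by the slowest batch of files rather than the sum over all files, which is why the sequential listing step stands out by comparison.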