steveloughran commented on issue #1442: HADOOP-16570. S3A committers encounter 
scale issues
URL: https://github.com/apache/hadoop/pull/1442#issuecomment-535658748
 
 
   Latest test run: S3 Ireland. There's a new unit test which, with the current 
values, takes 1 min; I plan to cut the numbers back, and am only leaving them 
as is to be confident that there are no scale problems with these values. I 
think I'll declare many more blocks per file. 
   
   The slow parts of the test are actually 
   * the non-serialized creation of all the pendingset files. That can be 
massively sped up.
   * the actual listing of files to commit. That's a sequential operation at 
the start of the commit; I will look at it a bit to see if there are some easy 
opportunities for speedups, as that would matter in production. Maybe moving 
off fancy Java 8 stuff to simple loops will help there.
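
   To make the "simple loops" idea concrete, here is a minimal sketch that contrasts a Java 8 stream pipeline with an equivalent plain loop. It is not the committer's code: the class `ListingSketch`, both method names and the suffix filter are hypothetical, and whether the loop is actually faster would need measuring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ListingSketch {
    // Assumption for illustration only: files of interest end with this suffix.
    static final String SUFFIX = ".pendingset";

    // Stream-based version: concise, but builds a pipeline for every call.
    static List<String> filterWithStream(List<String> names) {
        return names.stream()
            .filter(n -> n.endsWith(SUFFIX))
            .collect(Collectors.toList());
    }

    // Plain-loop version: same result, no stream machinery.
    static List<String> filterWithLoop(List<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String n : names) {
            if (n.endsWith(SUFFIX)) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "task_0001.pendingset", "_SUCCESS", "task_0002.pendingset");
        System.out.println(filterWithStream(names));
        System.out.println(filterWithLoop(names));
    }
}
```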
   
   As that list process is the one for the staging committers, it is only 
listing the consistent cluster FS (i.e. HDFS), so S3 perf won't matter. In real 
jobs the time to POST commits will dominate, and with this patch every 
pendingset file is loaded and processed in parallel.
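
   The "loaded and processed in parallel" pattern can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch itself: `ParallelLoadSketch`, `loadAll` and the `loader` function are hypothetical names, and the real committer works against the Hadoop filesystem API with its own thread pools rather than a generic function over path strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelLoadSketch {

    // Submits one load task per path to a fixed-size pool and collects the
    // results in the same order as the input list.
    static <T> List<T> loadAll(List<String> paths,
                               Function<String, T> loader,
                               int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>(paths.size());
            for (String p : paths) {
                Callable<T> task = () -> loader.apply(p);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>(paths.size());
            for (Future<T> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in loader: a real one would read and parse the file at the path.
        List<Integer> sizes = loadAll(
            Arrays.asList("a.pendingset", "bb.pendingset"), String::length, 4);
        System.out.println(sizes);
    }
}
```

   Collecting the futures in submission order keeps the results aligned with the input list, at the cost of waiting on slower tasks in turn.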
