Bumping this thread up, because I'm also curious if anybody has any thoughts on Adam's questions.

On Mon, Aug 15, 2016 at 1:49 PM Adam Fuchs <afu...@apache.org> wrote:
> I've been looking through the bulk load code lately related to some
> performance issues a customer of ours is experiencing, and I'm perplexed by
> a couple of things. Between o.a.a.master.tableOps.LoadFiles and
> o.a.a.server.client.BulkImporter we have 4 thread pools that are used in
> bulk load. It seems like only the master thread pool gets any parallelism
> because we always send one file at a time to the tservers (LoadFiles:154).
> Are the three thread pools in the tserver vestigial? Did we used to send
> bigger batches to the tservers and find that one at a time was more
> optimal?
>
> Seems like we could greatly simplify the tserver portion of the bulk load.
> Can anybody think of why that might not be a good idea?
>
> Also, has anybody optimized the pool sizes for multiple concurrent large
> bulk loads, and do you have suggestions on what settings to use (i.e.
> master.fate.threadpool.size and master.bulk.threadpool.size)?
>
> Thanks,
> Adam
>