Hi,

We have a Hive connector library we're having some issues with. Given N input splits, N mappers are spawned, but the aggregate write requests these mappers make end up exceeding the peak allowed write throughput. Is it possible to dynamically merge some input splits across mappers to throttle throughput so it stays under a given peak?
I could perhaps use CombineFileInputFormat to control the size of splits, but I don't know in advance the size of the data set that needs to be written. Ideally we'd be able to delegate the throttling logic to the library: it observes the throughput achieved with a given number of mappers, and could then either kill some mappers and move their input splits to others, or queue some inputs to run later. Is this possible?

Thanks,
Anmol
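P.S. For reference, the static split-size capping I mentioned would look something like the sketch below (a plain MapReduce driver using the standard Hadoop API; the class name, job name, and 256 MB cap are my own placeholder choices, not values from our connector):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class ThrottledJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "throttled-write-job");

        // Pack many small input splits into combined splits of at most
        // 256 MB each; fewer splits -> fewer concurrent mappers -> lower
        // aggregate write throughput against the downstream store.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

        // Mapper, output format, and input/output paths set as usual...
    }
}
```

But as noted above, this only fixes the mapper count up front; it can't react to the throughput actually observed at runtime, which is the part I'd like the library to handle.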