We hit a bottleneck during the shuffle phase, even though very little data is shuffled across the network: less than 10 MB in total, because the combiner aggregates most of the data.
Are there any parameters or other settings we can tune to improve shuffle performance?

Thanks,
-Songting