Sounds like a YARN configuration problem.
Parallelizing is good :), but not all maps/reduces are executed at the same time.
Check some configurations like:
- yarn.nodemanager.resource.memory-mb (per node)
- yarn.nodemanager.resource.cpu-vcores (per node)
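As a sketch, these two properties go in yarn-site.xml on each NodeManager; the values below are placeholders, not recommendations, and should match the actual RAM/cores you want YARN to manage on that node:

```xml
<!-- yarn-site.xml: per-NodeManager resource caps (example values, adjust to your hardware) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value> <!-- total MB of RAM YARN may allocate on this node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value> <!-- total vcores YARN may allocate on this node -->
</property>
```

Capping these per node bounds how many map/reduce containers can run concurrently, so a burst of 500+ tasks queues up instead of exhausting the cluster.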
This can help you get started: https://www.cl
My model has 7 tables and the cube has 15 dimensions. In the “Convert Cuboid
Data to HFile” step, too many maps and reduces are started (500+ maps, 1.4k+
reduces). This step consumes all the resources of the small cluster.
I set these parameters in the cluster:
dfs.block.size=256M
hive.exec.reducers.bytes.per.r
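To see why these parameters drive the task counts: MapReduce typically launches roughly one map task per input split (split size defaults to the HDFS block size), and Hive sizes its reducer count from bytes-per-reducer. A minimal sketch of that arithmetic, with made-up input sizes for illustration:

```python
import math

def estimate_map_tasks(input_mb: int, block_mb: int) -> int:
    """Rough mapper count: about one map task per HDFS block of input."""
    return math.ceil(input_mb / block_mb)

def estimate_reduce_tasks(input_mb: int, mb_per_reducer: int) -> int:
    """Rough Hive reducer count: input size divided by bytes-per-reducer."""
    return math.ceil(input_mb / mb_per_reducer)

# ~128 GB of cuboid data with 256 MB blocks -> ~512 map tasks
print(estimate_map_tasks(128 * 1024, 256))    # 512
# Same input with 1 GB per reducer -> ~128 reduce tasks
print(estimate_reduce_tasks(128 * 1024, 1024))  # 128
```

So raising dfs.block.size and the per-reducer byte threshold shrinks the task counts, while the YARN per-node caps above limit how many of those tasks run at once.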