Hello!

Test result:
When I load CSV data into a CarbonData table 3 times, the executors are distributed unevenly. My goal is one task per node, but in practice some nodes get 2 tasks while others get none.
See the attached load data 1.png, load data 2.png, and load data 3.png.
The attached carbondata data.PNG shows the data layout in Hadoop.
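
For reference, the load is issued from the spark-shell roughly as follows; this is a minimal sketch, and the store path, CSV path, and table name are placeholders rather than my real ones:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Create a CarbonSession backed by the HDFS store path (placeholder path).
val carbon = SparkSession.builder()
  .config(sc.getConf)
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// One load; this is repeated 3 times in the test above.
carbon.sql("LOAD DATA INPATH 'hdfs://namenode:8020/user/carbon/input/data.csv' INTO TABLE test_table")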


Loading 400,000,000 records into the CarbonData table takes 2629 seconds (roughly 152,000 rows per second), which is too long.


Question:
How can I make the executors distribute evenly across the nodes?


The environment:

Spark 2.1 + CarbonData 1.1, with 7 datanodes.


./bin/spark-shell \
--master yarn \
--deploy-mode client \
--num-executors n \
--executor-cores 10 \
--executor-memory 40G \
--driver-memory 8G

(n was 7 the first time, giving load data 1.png; 6 the second time, giving load data 2.png; and 8 the third time, giving load data 3.png)
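
For example, the first run expands to the command below. Adding spark.dynamicAllocation.enabled=false is my own assumption of a way to keep the executor count pinned at 7; the original runs did not necessarily set it:

./bin/spark-shell \
--master yarn \
--deploy-mode client \
--num-executors 7 \
--executor-cores 10 \
--executor-memory 40G \
--driver-memory 8G \
--conf spark.dynamicAllocation.enabled=false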


carbon.properties
######## DataLoading Configuration ########
carbon.sort.file.buffer.size=20
carbon.graph.rowset.size=10000
carbon.number.of.cores.while.loading=10
carbon.sort.size=50000
carbon.number.of.cores.while.compacting=10
carbon.number.of.cores=10
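
These load settings can also be applied from the spark-shell with the CarbonProperties API instead of carbon.properties; a minimal sketch, mirroring the values above:

import org.apache.carbondata.core.util.CarbonProperties

// Apply the same data-loading knobs programmatically
// (same values as in the carbon.properties file above).
val props = CarbonProperties.getInstance()
props.addProperty("carbon.number.of.cores.while.loading", "10")
props.addProperty("carbon.sort.size", "50000")
props.addProperty("carbon.sort.file.buffer.size", "20")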


Best regards!




