Hello!

0. The failure
When I insert into a carbon table, I hit a failure. The failure is as follows:
Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
Reason: Slave lost
Driver stacktrace:
The failed stage: (screenshot not included in this mail)

Steps:
1. Start spark-shell
./bin/spark-shell \
  --master yarn-client \
  --num-executors 5 \
  --executor-cores 5 \
  --executor-memory 20G \
  --driver-memory 8G \
  --queue root.default \
  --jars /xxx.jar

// I tried setting --num-executors from 10 to 20, but the second job still has only 5 tasks.

// In spark-defaults.conf: spark.default.parallelism=320

import org.apache.spark.sql.CarbonContext 
val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore") 
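
As a sanity check (plain Spark APIs, nothing carbon-specific), the settings the shell actually picked up can be printed:

    // print the effective parallelism and executor settings
    println(sc.defaultParallelism)
    println(sc.getConf.get("spark.default.parallelism", "not set"))
    println(sc.getConf.get("spark.executor.instances", "not set"))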

2. Create the table
cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst 
String,plat String,sty String,is_pay String,is_vip String,is_mpack String,scene 
String,status String,nw String,isc String,area String,spttag String,province 
String,isp String,city String,tv String,hwm String,pip String,fo String,sh 
String,mid String,user_id String,play_pv Int,spt_cnt Int,prg_spt_cnt Int) row 
format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES 
('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
 

// Note: the "fo" column is set as BUCKETCOLUMNS so the table can later be joined with another table on it.
// The distinct-value counts of the columns are: (table not included in this mail)
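
For context, the join I plan to run looks roughly like this (a sketch; "other_table" and its column "val" are placeholder names, assumed to be bucketed on "fo" the same way):

    // hypothetical join on the bucket column "fo"
    cc.sql("select a.user_id, a.play_pv, b.val from xxxx_table a join other_table b on a.fo = b.fo")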


3. Insert into the table (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
cc.sql("insert into xxxx_table select 
dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id
 ,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
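
To see where the task counts come from, the input partitions of the source scan can be counted first (plain Spark 1.x API; this should match the 994 tasks of the first job):

    // count the partitions of the ORC scan over the source table
    val src = cc.sql("select * from xxxx_table_tmp where dt='2017-01-01'")
    println(src.rdd.partitions.length)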

4. Spark split the SQL into two jobs; the first finished successfully, but the second failed: (screenshot not included in this mail)
5. The second job's stage: (screenshot not included in this mail)
Questions:
1. Why does the second job have only 5 tasks, while the first job has 994 tasks? (Note: my Hadoop cluster has 5 datanodes.)
     I guess this is what caused the failure.
2. In the source code I found DataLoadPartitionCoalescer.class. Does it mean that each datanode gets only one partition, so only one task runs per datanode?
3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as follows, but I cannot find "carbon.table.split.partition.enable" anywhere else in the project.
     I set "carbon.table.split.partition.enable" to true, but the second job still has only 5 tasks. How should this property be used?
     ExampleUtils:
    // whether to use table split partition
    // true  -> use table split partition, supports loading multiple partitions
    // false -> use node split partition, supports data load by host partition
    CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
4. The insert into the carbon table ran for 3 hours but eventually failed. How can I speed it up? (The tuning I plan to try is sketched below.)
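
For reference, this is the kind of load tuning I plan to try next (a sketch; the property names are taken from the CarbonData configuration template and are my assumption for this version):

    // assumed data-loading knobs; values are guesses, to be validated
    CarbonProperties.getInstance().addProperty("carbon.number.of.cores.while.loading", "5")
    CarbonProperties.getInstance().addProperty("carbon.sort.size", "500000")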
5. In spark-shell I tried setting --num-executors from 10 to 20, but the second job still has only 5 tasks, so each task has to load about 400 million rows (2,000,000,000 / 5).
     Is the other setting, executor-memory = 20G, enough?

I need your help! Thank you very much!

ww...@163.com


