TEST SQL (a minimal spark-shell sketch for running these queries follows the last one):
High-cardinality random query
select * from carbon_table where dt='2017-01-01' and user_id='XXXX' limit 100;


High-cardinality random LIKE query
select * from carbon_table where dt='2017-01-01' and fo like '%YYYY%' limit 100;


Low-cardinality random query
select * from carbon_table where dt='2017-01-01' and plat='android' and
tv='8400' limit 100;


1-dimension query
select province, sum(play_pv) play_pv, sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province;


2-dimension query
select province, city, sum(play_pv) play_pv, sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province, city;


3-dimension query
select province, city, isp, sum(play_pv) play_pv, sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province, city, isp;


Multi-dimension query
select sty, isc, status, nw, tv, area, province, city, isp, sum(play_pv) play_pv_sum,
sum(spt_cnt) spt_cnt_sum
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by sty, isc, status, nw, tv, area, province, city, isp;


Distinct count on a single column
select tv, count(distinct user_id)
from carbon_table where dt='2017-01-01' and sty='AAAA' and fo like '%YYYY%'
group by tv;


Distinct count on multiple columns
select count(distinct user_id), count(distinct mid), count(distinct case when
sty='AAAA' then mid end)
from carbon_table where dt='2017-01-01' and sty='AAAA';


Order-by (top-N) query
select user_id, sum(play_pv) play_pv_sum
from carbon_table
group by user_id
order by play_pv_sum desc limit 100;


Simple join query
select b.fo_level1, b.fo_level2, sum(a.play_pv) play_pv_sum from carbon_table a
left join dim_carbon_table b
on a.fo = b.fo and a.dt = b.dt where a.dt = '2017-01-01' group by
b.fo_level1, b.fo_level2;
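
For reference, a minimal spark-shell sketch for running one of the queries above. This is a sketch only: it assumes a running spark-shell where `sc` is available, and the store path is a placeholder mirroring the setup quoted later in this thread.

import org.apache.spark.sql.CarbonContext

// create the CarbonContext against the CarbonData store (path is a placeholder)
val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")

// example: the 1-dimension query from the list above
cc.sql(
  """select province, sum(play_pv) play_pv, sum(spt_cnt) spt_cnt
    |from carbon_table
    |where dt='2017-01-01' and sty='AAAA'
    |group by province""".stripMargin).show(100)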

At 2017-03-27 04:10:04, "a" <ww...@163.com> wrote:
>I downloaded the newest source code (master), compiled it, and generated the jar 
>carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
>Then I tested again with Spark 2.1. The error logs are as follows:
>
>
> Container log :
>17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch 
>worker-9 Data Loading failed for table carbon_table
>java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch 
>worker-9 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 
>(TID 538)
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>
>Spark log:
>
>ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
>ERROR 27-03 02:27:21,419 - main load data frame failed
>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
>stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 
>538, hd25): 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>Driver stacktrace:
>        at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>        at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>        at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at scala.Option.foreach(Option.scala:236)
>        at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>        at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>        at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>Caused by: 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>ERROR 27-03 02:27:21,422 - main 
>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
>stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 
>538, hd25): 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>Driver stacktrace:
>        at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>        at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>        at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at scala.Option.foreach(Option.scala:236)
>        at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>        at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>        at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>        at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>Caused by: 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: 
>Data Loading failed for table carbon_table
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for 
>default.carbon_table
>ERROR 27-03 02:27:21,453 - main 
>java.lang.Exception: DataLoad failure: Data Loading failed for table 
>carbon_table
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>        at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at 
> $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for 
>default.carbon_table. Please check the logs
>java.lang.Exception: DataLoad failure: Data Loading failed for table 
>carbon_table
>        at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>        at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $iwC$$iwC$$iwC.<init>(<console>:44)
>        at $iwC$$iwC.<init>(<console>:46)
>        at $iwC.<init>(<console>:48)
>        at <init>(<console>:50)
>        at .<init>(<console>:54)
>        at .<clinit>(<console>)
>        at .<init>(<console>:7)
>        at .<clinit>(<console>)
>        at $print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>
> 
>
> Container log: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 
> 15: SIGTERM.
> Spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: 
> Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB 
> physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>The test sql
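
As a sketch only, the memoryOverhead suggestion from the YARN log above could be applied when launching the shell, alongside the flags already used earlier in this thread. The overhead value (in MB) is purely illustrative, not taken from the thread:

./bin/spark-shell \
--master yarn-client \
--num-executors 5 \
--executor-cores 5 \
--executor-memory 48G \
--driver-memory 8G \
--conf spark.yarn.executor.memoryOverhead=4096 \
--queue root.default \
--jars /xxx.jar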
>
>
>
>
>
>
>
>At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>>
>>
>>I have set the parameters as follows (one way of applying them is sketched after this list):
>>1. fs.hdfs.impl.disable.cache=true
>>2. dfs.socket.timeout=1800000  (for the exception "Caused by: java.io.IOException: 
>>Filesystem closed")
>>3. dfs.datanode.socket.write.timeout=3600000
>>4. set the CarbonData property enable.unsafe.sort=true
>>5. removed the BUCKETCOLUMNS property from the CREATE TABLE statement
>>6. set the Spark job parameter executor-memory=48G (up from 20G)
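
A minimal sketch of one way these settings can be applied. Assumption (not quoted from this thread): the HDFS client settings in items 1-3 are forwarded through Spark's standard spark.hadoop.* configuration prefix; the values come from the list above.

# carbon.properties on the nodes performing the load (item 4)
enable.unsafe.sort=true

# shell launch carrying the Hadoop client settings (items 1-3) and item 6
./bin/spark-shell \
  --master yarn-client \
  --executor-memory 48G \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  --conf spark.hadoop.dfs.socket.timeout=1800000 \
  --conf spark.hadoop.dfs.datanode.socket.write.timeout=3600000 \
  --jars /xxx.jar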
>>
>>
>>But it still failed; the error is "executor.CoarseGrainedExecutorBackend: 
>>RECEIVED SIGNAL 15: SIGTERM."
>>
>>
>>Then I tried to insert 400,000,000 records into the CarbonData table, and it 
>>succeeded.
>>
>>
>>How can I insert 2,000,000,000 records into CarbonData?
>>Should I just set executor-memory big enough? Or should I generate a CSV file 
>>from the Hive table first and then load the CSV file into the carbon table?
>>Can anybody give me some help?
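
If the CSV route is taken, the load could look roughly like the sketch below. This is a sketch only: CarbonData's LOAD DATA INPATH ... OPTIONS(...) syntax is assumed rather than quoted from this thread, the path is a placeholder, and the delimiter and column list simply mirror the CREATE TABLE statement quoted later in the thread; cc is the CarbonContext from those steps.

cc.sql(
  """LOAD DATA INPATH 'hdfs://xxxx/tmp/xxxx_table_2017-01-01.csv'
    |INTO TABLE xxxx_table
    |OPTIONS('DELIMITER'='|',
    |  'FILEHEADER'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt')""".stripMargin)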
>>
>>
>>Regards
>>fish
>>
>>
>>
>>
>>
>>
>>
>>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>>Thank you, Ravindra!
>>>Versions:
>>>My CarbonData version is 1.0, Spark version is 1.6.3, Hadoop version is 
>>>2.7.1, and Hive version is 1.1.0.
>>>Log from one of the containers:
>>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
>>>SIGNAL 15: SIGTERM
>>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory 
>>>/data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
>>>java.io.IOException: Error reading file: 
>>>hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>>>        at 
>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>>>        at 
>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>>>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>>        at 
>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at 
>>> org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>>>        at 
>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>>>        at 
>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>>>        at 
>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>        at java.lang.Thread.run(Thread.java:745)
>>>Caused by: java.io.IOException: Filesystem closed
>>>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>>>        at 
>>> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>>>        at 
>>> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>>>        at 
>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>>>        ... 26 more
>>>I will try setting enable.unsafe.sort=true and removing the BUCKETCOLUMNS property, 
>>>and then try again.
>>>
>>>
>>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ravi.pes...@gmail.com> wrote:
>>>>Hi,
>>>>
>>>>CarbonData launches one job per node to sort the data at node level and
>>>>avoid shuffling. Internally it uses threads for parallel loading. Please
>>>>use the carbon.number.of.cores.while.loading property in the carbon.properties
>>>>file to set the number of cores it should use per machine while loading.
>>>>CarbonData sorts the data at each node level to maintain the B-tree for
>>>>each node per segment. This improves query performance, because filtering
>>>>is faster with a B-tree at node level instead of at each block level.
>>>>
>>>>1. Which version of CarbonData are you using?
>>>>2. There are memory issues in the CarbonData 1.0 version; they are fixed in the
>>>>current master.
>>>>3. You can also improve performance by enabling enable.unsafe.sort=true in the
>>>>carbon.properties file (see the sketch after this list). But it is not supported
>>>>if bucketing of columns is enabled. We are planning to support unsafe sort load
>>>>for bucketing as well in the next version.
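
A minimal carbon.properties sketch combining the two properties mentioned above (the core count is purely illustrative, not a recommendation from this thread):

# carbon.properties
carbon.number.of.cores.while.loading=6
enable.unsafe.sort=true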
>>>>
>>>>Please send the executor log so we can see the error you are facing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Regards,
>>>>Ravindra
>>>>
>>>>On 25 March 2017 at 16:18, ww...@163.com <ww...@163.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> *0. The failure*
>>>>> When I insert into the carbon table, I encounter a failure. The failure is as
>>>>> follows:
>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, 
>>>>> most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): 
>>>>> ExecutorLostFailure (executor 1 exited caused by one of the running 
>>>>> tasks) Reason: Slave lost
>>>>> Driver stacktrace:
>>>>>
>>>>> The stage (screenshot not included in the plain-text archive):
>>>>>
>>>>> *Steps:*
>>>>> *1. Start spark-shell*
>>>>> ./bin/spark-shell \
>>>>> --master yarn-client \
>>>>> --num-executors 5 \  (I tried to set this parameter in the range 10 to
>>>>> 20, but the second job still has only 5 tasks)
>>>>> --executor-cores 5 \
>>>>> --executor-memory 20G \
>>>>> --driver-memory 8G \
>>>>> --queue root.default \
>>>>> --jars /xxx.jar
>>>>>
>>>>> // spark-defaults.conf: spark.default.parallelism=320
>>>>>
>>>>> import org.apache.spark.sql.CarbonContext
>>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>>
>>>>> *2. Create the table*
>>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>>>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>>>> String,scene String,status String,nw String,isc String,area String,spttag
>>>>> String,province String,isp String,city String,tv String,hwm String,pip
>>>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>>>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>>>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>>
>>>>> // note: the "fo" column is set as BUCKETCOLUMNS so this table can be joined with another table
>>>>> // the column's distinct values are as follows (screenshot not included in plain text):
>>>>>
>>>>>
>>>>> *3. Insert into the table* (xxxx_table_tmp is a Hive external ORC table with
>>>>> 2,000,000,000 records)
>>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>>>> xxxx_table_tmp where dt='2017-01-01'")
>>>>>
>>>>> *4. Spark split the SQL into two jobs; the first finished successfully, but the
>>>>> second failed:*
>>>>>
>>>>>
>>>>> *5. The second job's stage (screenshot not included):*
>>>>>
>>>>>
>>>>>
>>>>> *Question:*
>>>>> 1. Why does the second job have only five tasks, while the first job has 994
>>>>> tasks? (note: my Hadoop cluster has 5 datanodes)
>>>>>       I guess this caused the failure.
>>>>> 2. In the sources I found DataLoadPartitionCoalescer.class. Does it mean that
>>>>> "one datanode has only one partition, and therefore only one task runs on that
>>>>> datanode"?
>>>>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set
>>>>> as shown below, but I cannot find "carbon.table.split.partition.enable" in
>>>>> other parts of the project.
>>>>>      I set "carbon.table.split.partition.enable" to true, but the second
>>>>> job still has only five tasks. How should this property be used?
>>>>>      ExampleUtils:
>>>>>     // whether to use table split partition
>>>>>     // true  -> use table split partition, supports multiple partition
>>>>> loading
>>>>>     // false -> use node split partition, supports data load by host
>>>>> partition
>>>>>     
>>>>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>>>> "false")
>>>>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can
>>>>> I speed it up?
>>>>> 5. In spark-shell I tried to set the num-executors parameter in the range 10 to
>>>>> 20, but the second job still has only 5 tasks.
>>>>>      Is the other parameter, executor-memory = 20G, enough?
>>>>>
>>>>> I need your help! Thank you very much!
>>>>>
>>>>> ww...@163.com
>>>>>
>>>>> ------------------------------
>>>>> ww...@163.com
>>>>>
>>>>
>>>>
>>>>
>>>>-- 
>>>>Thanks & Regards,
>>>>Ravi
