Hi, please enable the vector reader; it might help the limit queries.
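As a minimal sketch (assuming "cc" is the CarbonContext created later in this thread and carbon_table is the quoted test table), the high-cardinality limit query can be re-run in the same session after the property below is set, to check whether the vector reader actually helps:

// Sketch only: time the quoted limit query after the vector reader is enabled.
// 'cc' is assumed to be the Carbon SQL context created earlier in this thread.
val start = System.nanoTime()
cc.sql("select * From carbon_table where dt='2017-01-01' and user_id='XXXX' limit 100").show(100)
println(s"elapsed: ${(System.nanoTime() - start) / 1e9} s")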
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")

Regards
Liang

a wrote:
> TEST SQL:
>
> High-cardinality random query
> select * From carbon_table where dt='2017-01-01' and user_id='XXXX' limit 100;
>
> High-cardinality random query with LIKE
> select * From carbon_table where dt='2017-01-01' and fo like '%YYYY%' limit 100;
>
> Low-cardinality random query
> select * From carbon_table where dt='2017-01-01' and plat='android' and tv='8400' limit 100
>
> One-dimension query
> select province,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province
>
> Two-dimension query
> select province,city,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province,city
>
> Three-dimension query
> select province,city,isp,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province,city,isp
>
> Multi-dimension query
> select sty,isc,status,nw,tv,area,province,city,isp,sum(play_pv) play_pv_sum ,sum(spt_cnt) spt_cnt_sum
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by sty,isc,status,nw,tv,area,province,city,isp
>
> DISTINCT on a single column
> select tv, count(distinct user_id)
> from carbon_table where dt='2017-01-01' and sty='AAAA' and fo like '%YYYY%' group by tv
>
> DISTINCT on multiple columns
> select count(distinct user_id) ,count(distinct mid),count(distinct case when sty='AAAA' then mid end)
> from carbon_table where dt='2017-01-01' and sty='AAAA'
>
> Sorting query
> select user_id,sum(play_pv) play_pv_sum
> from carbon_table
> group by user_id
> order by play_pv_sum desc limit 100
>
> Simple join query
> select b.fo_level1,b.fo_level2,sum(a.play_pv) play_pv_sum From carbon_table a
> left join dim_carbon_table b
> on a.fo=b.fo and a.dt = b.dt where a.dt = '2017-01-01'
> group by b.fo_level1,b.fo_level2
>
> At 2017-03-27 04:10:04, "a" <wwyxg@> wrote:
>> I downloaded the newest source code (master), compiled it, and generated the jar carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
>> Then I tested again with Spark 2.1. The error logs are as follows:
>>
>> Container log:
>> 17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch worker-9 Data Loading failed for table carbon_table
>> java.lang.NullPointerException
>>   at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>   at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>   at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>   at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.
> <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure >>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. > <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >>17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch worker-9 >>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. 
> <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >>17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 (TID 538) >>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. > <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >> >> >> >>Spark log: >> >>ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job >>ERROR 27-03 02:27:21,419 - main load data frame failed >>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. 
> <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >> >> >>Driver stacktrace: >> at >> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) >> at >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >> at >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >> at >> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) >> at scala.Option.foreach(Option.scala:236) >> at >> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) >> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) >> at >> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) >> at >> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) >> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) >> at org.apache.spark.rdd.RDD.collect(RDD.scala:926) >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665) >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794) >> at >> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) >> at >> 
org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) >> at >> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:145) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:130) >> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :31) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :36) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :38) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :40) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :42) >> at $line23.$read$$iwC$$iwC$$iwC. > <init> > ( > <console> > :44) >> at $line23.$read$$iwC$$iwC. > <init> > ( > <console> > :46) >> at $line23.$read$$iwC. > <init> > ( > <console> > :48) >> at $line23.$read. > <init> > ( > <console> > :50) >> at $line23.$read$. > <init> > ( > <console> > :54) >> at $line23.$read$. > <clinit> > ( > <console> > ) >> at $line23.$eval$. > <init> > ( > <console> > :7) >> at $line23.$eval$. 
> <clinit> > ( > <console> > ) >> at $line23.$eval.$print( > <console> > ) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >> at >> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) >> at >> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >> at >> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >> at >> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >> at >> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) >> at >> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >> at org.apache.spark.repl.Main$.main(Main.scala:31) >> at org.apache.spark.repl.Main.main(Main.scala) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) >> at >> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) >> at >> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) >> at >> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. 
> <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >>ERROR 27-03 02:27:21,422 - main >>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. > <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 
10 more >> >> >>Driver stacktrace: >> at >> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) >> at >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >> at >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >> at >> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) >> at scala.Option.foreach(Option.scala:236) >> at >> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) >> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) >> at >> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) >> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) >> at >> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) >> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) >> at org.apache.spark.rdd.RDD.collect(RDD.scala:926) >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665) >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794) >> at >> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) >> at >> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) >> at >> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:145) >> at org.apache.spark.sql.DataFrame. 
> <init> > (DataFrame.scala:130) >> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :31) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :36) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :38) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :40) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :42) >> at $line23.$read$$iwC$$iwC$$iwC. > <init> > ( > <console> > :44) >> at $line23.$read$$iwC$$iwC. > <init> > ( > <console> > :46) >> at $line23.$read$$iwC. > <init> > ( > <console> > :48) >> at $line23.$read. > <init> > ( > <console> > :50) >> at $line23.$read$. > <init> > ( > <console> > :54) >> at $line23.$read$. > <clinit> > ( > <console> > ) >> at $line23.$eval$. > <init> > ( > <console> > :7) >> at $line23.$eval$. > <clinit> > ( > <console> > ) >> at $line23.$eval.$print( > <console> > ) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >> at >> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) >> at >> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >> at >> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >> at >> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >> at >> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) >> at >> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >> at org.apache.spark.repl.Main$.main(Main.scala:31) >> at org.apache.spark.repl.Main.main(Main.scala) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) >> at >> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) >> at >> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) >> at >> 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) >> at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2. > <init> > (NewCarbonDataLoadRDD.scala:365) >> at >> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) >> at >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) >> at org.apache.spark.scheduler.Task.run(Task.scala:89) >> at >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >>Caused by: java.lang.NullPointerException >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) >> at >> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) >> at >> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) >> ... 10 more >>AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for default.carbon_table >>ERROR 27-03 02:27:21,453 - main >>java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937) >> at >> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) >> at >> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) >> at >> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:145) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:130) >> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :31) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :36) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :38) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :40) >> at $line23.$read$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :42) >> at $line23.$read$$iwC$$iwC$$iwC. 
> <init> > ( > <console> > :44) >> at $line23.$read$$iwC$$iwC. > <init> > ( > <console> > :46) >> at $line23.$read$$iwC. > <init> > ( > <console> > :48) >> at $line23.$read. > <init> > ( > <console> > :50) >> at $line23.$read$. > <init> > ( > <console> > :54) >> at $line23.$read$. > <clinit> > ( > <console> > ) >> at $line23.$eval$. > <init> > ( > <console> > :7) >> at $line23.$eval$. > <clinit> > ( > <console> > ) >> at $line23.$eval.$print( > <console> > ) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >> at >> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) >> at >> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >> at >> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >> at >> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >> at >> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) >> at >> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >> at org.apache.spark.repl.Main$.main(Main.scala:31) >> at org.apache.spark.repl.Main.main(Main.scala) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) >> at >> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) >> at >> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) >> at >> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for default.carbon_table. 
Please check the logs >>java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table >> at >> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937) >> at >> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) >> at >> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) >> at >> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) >> at >> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) >> at >> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) >> at >> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) >> at >> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) >> at >> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:145) >> at org.apache.spark.sql.DataFrame. > <init> > (DataFrame.scala:130) >> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) >> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :31) >> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :36) >> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :38) >> at $iwC$$iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :40) >> at $iwC$$iwC$$iwC$$iwC. > <init> > ( > <console> > :42) >> at $iwC$$iwC$$iwC. > <init> > ( > <console> > :44) >> at $iwC$$iwC. > <init> > ( > <console> > :46) >> at $iwC. > <init> > ( > <console> > :48) >> at > <init> > ( > <console> > :50) >> at . > <init> > ( > <console> > :54) >> at . > <clinit> > ( > <console> > ) >> at . > <init> > ( > <console> > :7) >> at . 
> <clinit> > ( > <console> > ) >> at $print( > <console> > ) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >> at >> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) >> at >> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >> at >> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >> at >> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >> at >> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >> at >> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) >> at >> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >> at >> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >> at >> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >> at org.apache.spark.repl.Main$.main(Main.scala:31) >> at org.apache.spark.repl.Main.main(Main.scala) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) >> at >> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) >> at >> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) >> at >> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >> >>At 2017-03-27 00:42:28, "a" < > wwyxg@ > > wrote: >> >> >> >> Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED >> SIGNAL 15: SIGTERM。 >> spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on >> hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 >> GB physical memory used. Consider boosting >> spark.yarn.executor.memoryOverhead. 
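For context, the 49 GB in that message is the executor heap plus YARN's memory overhead allowance; the overhead can only be raised when the shell is launched (for example by passing a larger spark.yarn.executor.memoryOverhead to the spark-shell command quoted further down), not afterwards. A minimal sketch, assuming only the sc available in the spark-shell session, to inspect what the current session was actually launched with:

// Sketch: print the executor memory settings the running session was launched with.
// In Spark 1.6, spark.yarn.executor.memoryOverhead defaults to
// max(384 MB, 10% of spark.executor.memory) when it is not set explicitly.
println("spark.executor.memory              = " + sc.getConf.get("spark.executor.memory", "not set"))
println("spark.yarn.executor.memoryOverhead = " + sc.getConf.get("spark.yarn.executor.memoryOverhead", "not set (default applies)"))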
>>The test sql >> >> >> >> >> >> >> >>At 2017-03-26 23:34:36, "a" < > wwyxg@ > > wrote: >>> >>> >>>I have set the parameters as follow: >>>1、fs.hdfs.impl.disable.cache=true >>>2、dfs.socket.timeout=1800000 (Exception:aused by: java.io.IOException: Filesystem closed) >>>3、dfs.datanode.socket.write.timeout=3600000 >>>4、set carbondata property enable.unsafe.sort=true >>>5、remove BUCKETCOLUMNS property from the create table sql >>>6、set spark job parameter executor-memory=48G (from 20G to 48G) >>> >>> >>>But it still failed, the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。" >>> >>> >>>Then i try to insert 40000 0000 records into carbondata table ,it works success. >>> >>> >>>How can i insert 20 0000 0000 records into carbondata? >>>Should me set executor-memory big enough? Or Should me generate the csv file from the hive table first ,then load the csv file into carbon table? >>>Any body give me same help? >>> >>> >>>Regards >>>fish >>> >>> >>> >>> >>> >>> >>> >>>At 2017-03-26 00:34:18, "a" < > wwyxg@ > > wrote: >>>>Thank you Ravindra! >>>>Version: >>>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0 >>>>one of the containers log: >>>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM >>>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called >>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called >>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109 >>>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 >>>>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0 >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136) >>>> at >>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249) >>>> at >>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211) >>>> at >>>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) >>>> at >>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) >>>> at >>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>>> at >>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>>> at >>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>>> at >>>> org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412) >>>> at >>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163) >>>> at >>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221) >>>> at 
>>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183) >>>> at >>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117) >>>> at >>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80) >>>> at >>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73) >>>> at >>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196) >>>> at >>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:745) >>>>Caused by: java.io.IOException: Filesystem closed >>>> at >>>> org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808) >>>> at >>>> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868) >>>> at >>>> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) >>>> at java.io.DataInputStream.readFully(DataInputStream.java:195) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042) >>>> ... 26 more >>>>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again. >>>> >>>> >>>>At 2017-03-25 20:55:03, "Ravindra Pesala" < > ravi.pesala@ > > wrote: >>>>>Hi, >>>>> >>>>>Carbodata launches one job per each node to sort the data at node level and >>>>>avoid shuffling. Internally it uses threads to use parallel load. Please >>>>>use carbon.number.of.cores.while.loading property in carbon.properties file >>>>>and set the number of cores it should use per machine while loading. >>>>>Carbondata sorts the data at each node level to maintain the Btree for >>>>>each node per segment. It improves the query performance by filtering >>>>>faster if we have Btree at node level instead of each block level. >>>>> >>>>>1.Which version of Carbondata are you using? >>>>>2.There are memory issues in Carbondata-1.0 version and are fixed current >>>>>master. >>>>>3.And you can improve the performance by enabling enable.unsafe.sort=true in >>>>>carbon.properties file. But it is not supported if bucketing of columns are >>>>>enabled. We are planning to support unsafe sort load for bucketing also in >>>>>next version. >>>>> >>>>>Please send the executor log to know about the error you are facing. 
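To make the two suggestions above concrete: both keys are meant for the carbon.properties file, but they can also be set programmatically in the spark-shell session, the same way other properties are set elsewhere in this thread. A minimal sketch with placeholder values (the core count of 4 is only an example, not a recommendation from this thread):

// Sketch: apply the two load-tuning properties mentioned above before running the INSERT.
// Equivalent carbon.properties entries:
//   carbon.number.of.cores.while.loading=4
//   enable.unsafe.sort=true
import org.apache.carbondata.core.util.CarbonProperties
CarbonProperties.getInstance().addProperty("carbon.number.of.cores.while.loading", "4")  // cores used per node while loading
CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")                 // not supported together with bucketed columns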
>>>>>
>>>>>Regards,
>>>>>Ravindra
>>>>>
>>>>>On 25 March 2017 at 16:18, wwyxg@ <wwyxg@> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> *0. The failure*
>>>>>> When I insert into the carbon table, I encounter a failure. The failure is as follows:
>>>>>>
>>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times,
>>>>>> most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>>>>> Driver stacktrace:
>>>>>>
>>>>>> The stage: [screenshot not preserved in the archive]
>>>>>>
>>>>>> *Steps:*
>>>>>> *1. Start spark-shell*
>>>>>> ./bin/spark-shell \
>>>>>>   --master yarn-client \
>>>>>>   --num-executors 5 \   (I tried values from 10 to 20 for this parameter, but the second job still has only 5 tasks)
>>>>>>   --executor-cores 5 \
>>>>>>   --executor-memory 20G \
>>>>>>   --driver-memory 8G \
>>>>>>   --queue root.default \
>>>>>>   --jars /xxx.jar
>>>>>>
>>>>>> // spark-defaults.conf: spark.default.parallelism=320
>>>>>>
>>>>>> import org.apache.spark.sql.CarbonContext
>>>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>>>
>>>>>> *2. Create the table*
>>>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String, pt String, lst String, plat String, sty String, is_pay String, is_vip String, is_mpack String, scene String, status String, nw String, isc String, area String, spttag String, province String, isp String, city String, tv String, hwm String, pip String, fo String, sh String, mid String, user_id String, play_pv Int, spt_cnt Int, prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id', 'DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm', 'NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid', 'BUCKETNUMBER'='10', 'BUCKETCOLUMNS'='fo')")
>>>>>>
>>>>>> // Note: the "fo" column is set as BUCKETCOLUMNS in order to join another table.
>>>>>> // The distinct values of the columns are as follows: [screenshot not preserved in the archive]
>>>>>>
>>>>>> *3. Insert into the table* (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
>>>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
>>>>>>
>>>>>> *4. Spark splits the SQL into two jobs; the first finished successfully, but the second failed:* [screenshot not preserved in the archive]
>>>>>>
>>>>>> *5. The second job's stage:* [screenshot not preserved in the archive]
>>>>>>
>>>>>> *Questions:*
>>>>>> 1. Why does the second job have only 5 tasks, while the first job has 994 tasks?
>>>>>>    (Note: my Hadoop cluster has 5 datanodes.) I guess this is what caused the failure.
>>>>>> 2. In the sources I found DataLoadPartitionCoalescer.class. Does it mean that
>>>>>>    each datanode gets only one partition, so only one task runs per datanode?
>>>>>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as shown below, but I cannot find
>>>>>>    "carbon.table.split.partition.enable" anywhere else in the project.
>>>>>>    I set "carbon.table.split.partition.enable" to true, but the second job still has only five tasks. How should this property be used?
>>>>>>    ExampleUtils:
>>>>>>    // whether to use table split partition
>>>>>>    // true  -> use table split partition, supports multiple partition loading
>>>>>>    // false -> use node split partition, supports data load by host partition
>>>>>>    CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
>>>>>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can I speed it up?
>>>>>> 5. In the spark-shell I tried setting this parameter to values from 10 to 20, but the second job still has only 5 tasks.
>>>>>>    Is the other parameter, executor-memory = 20G, enough?
>>>>>>
>>>>>> I need your help! Thank you very much!
>>>>>>
>>>>>> wwyxg@
>>>>>
>>>>>--
>>>>>Thanks & Regards,
>>>>>Ravi
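One possible workaround for the 2,000,000,000-row load discussed above, sketched here with hypothetical values rather than taken from the thread: split the single INSERT into several smaller loads by also filtering the source table on another partition column (the ORC path in the executor log suggests xxxx_table_tmp is partitioned by pt as well as dt), so each load job handles fewer rows per executor. Here cc is the CarbonContext from step 1 of the quoted message.

// Sketch only: chunk the one-day insert by the source table's 'pt' partition.
// The pt values below are hypothetical placeholders.
val ptValues = Seq("ios", "android")
ptValues.foreach { p =>
  cc.sql(s"insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01' and pt='$p'")
}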