TEST SQL:

High-cardinality point query:
select * from carbon_table where dt='2017-01-01' and user_id='XXXX' limit 100;

High-cardinality LIKE query:
select * from carbon_table where dt='2017-01-01' and fo like '%YYYY%' limit 100;

Low-cardinality filter query:
select * from carbon_table where dt='2017-01-01' and plat='android' and tv='8400' limit 100;

1-dimension aggregation:
select province,sum(play_pv) play_pv,sum(spt_cnt) spt_cnt from carbon_table where dt='2017-01-01' and sty='AAAA' group by province;

2-dimension aggregation:
select province,city,sum(play_pv) play_pv,sum(spt_cnt) spt_cnt from carbon_table where dt='2017-01-01' and sty='AAAA' group by province,city;

3-dimension aggregation:
select province,city,isp,sum(play_pv) play_pv,sum(spt_cnt) spt_cnt from carbon_table where dt='2017-01-01' and sty='AAAA' group by province,city,isp;

Multi-dimension aggregation:
select sty,isc,status,nw,tv,area,province,city,isp,sum(play_pv) play_pv_sum,sum(spt_cnt) spt_cnt_sum from carbon_table where dt='2017-01-01' and sty='AAAA' group by sty,isc,status,nw,tv,area,province,city,isp;

Single-column distinct:
select tv, count(distinct user_id) from carbon_table where dt='2017-01-01' and sty='AAAA' and fo like '%YYYY%' group by tv;

Multi-column distinct:
select count(distinct user_id), count(distinct mid), count(distinct case when sty='AAAA' then mid end) from carbon_table where dt='2017-01-01' and sty='AAAA';

Ordering (top-N) query:
select user_id,sum(play_pv) play_pv_sum from carbon_table group by user_id order by play_pv_sum desc limit 100;

Simple join query:
select b.fo_level1,b.fo_level2,sum(a.play_pv) play_pv_sum from carbon_table a left join dim_carbon_table b on a.fo=b.fo and a.dt=b.dt where a.dt='2017-01-01' group by b.fo_level1,b.fo_level2;

At 2017-03-27 04:10:04, "a" <ww...@163.com> wrote:
>I downloaded the newest source code (master) and compiled it, generating the jar
>carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
>Then I tested again with Spark 2.1. The error logs are as follows:
>
>
> Container log :
>17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch >worker-9 Data Loading failed for table carbon_table >java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 10 more >17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch >worker-9 >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 
10 more >17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 >(TID 538) >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 10 more > > > >Spark log: > >ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job >ERROR 27-03 02:27:21,419 - main load data frame failed >org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in >stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID >538, hd25): >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 
10 more > > >Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) > at org.apache.spark.rdd.RDD.collect(RDD.scala:926) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) > at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) > at > $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) > at > 
$line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) > at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42) > at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44) > at $line23.$read$$iwC$$iwC.<init>(<console>:46) > at $line23.$read$$iwC.<init>(<console>:48) > at $line23.$read.<init>(<console>:50) > at $line23.$read$.<init>(<console>:54) > at $line23.$read$.<clinit>(<console>) > at $line23.$eval$.<init>(<console>:7) > at $line23.$eval$.<clinit>(<console>) > at $line23.$eval.$print(<console>) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >Caused by: >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > 
org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 10 more >ERROR 27-03 02:27:21,422 - main >org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in >stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID >538, hd25): >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 
10 more > > >Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) > at org.apache.spark.rdd.RDD.collect(RDD.scala:926) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) > at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) > at > $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) > at > 
$line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) > at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42) > at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44) > at $line23.$read$$iwC$$iwC.<init>(<console>:46) > at $line23.$read$$iwC.<init>(<console>:48) > at $line23.$read.<init>(<console>:50) > at $line23.$read$.<init>(<console>:54) > at $line23.$read$.<clinit>(<console>) > at $line23.$eval$.<init>(<console>:7) > at $line23.$eval$.<clinit>(<console>) > at $line23.$eval.$print(<console>) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >Caused by: >org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: >Data Loading failed for table carbon_table > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54) > at > 
org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365) > at > org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) >Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158) > at > org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60) > at > org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43) > ... 10 more >AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for >default.carbon_table >ERROR 27-03 02:27:21,453 - main >java.lang.Exception: DataLoad failure: Data Loading failed for table >carbon_table > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) > at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) > at > $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) > at > $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) > at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) > at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42) > at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44) > at $line23.$read$$iwC$$iwC.<init>(<console>:46) > at $line23.$read$$iwC.<init>(<console>:48) > at $line23.$read.<init>(<console>:50) > at $line23.$read$.<init>(<console>:54) > at $line23.$read$.<clinit>(<console>) > at $line23.$eval$.<init>(<console>:7) > at $line23.$eval$.<clinit>(<console>) > at $line23.$eval.$print(<console>) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for >default.carbon_table. 
Please check the logs >java.lang.Exception: DataLoad failure: Data Loading failed for table >carbon_table > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145) > at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130) > at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) > at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) > at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42) > at $iwC$$iwC$$iwC.<init>(<console>:44) > at $iwC$$iwC.<init>(<console>:46) > at $iwC.<init>(<console>:48) > at <init>(<console>:50) > at .<init>(<console>:54) > at .<clinit>(<console>) > at .<init>(<console>:7) > at .<clinit>(<console>) > at $print(<console>) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>
> Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM.
> Spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>The test sql
>
>
>At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>>
>>
>>I have set the parameters as follows:
>>1. fs.hdfs.impl.disable.cache=true
>>2. dfs.socket.timeout=1800000 (Exception: Caused by: java.io.IOException: Filesystem closed)
>>3. dfs.datanode.socket.write.timeout=3600000
>>4. set the carbondata property enable.unsafe.sort=true
>>5. remove the BUCKETCOLUMNS property from the create table sql
>>6. set the spark job parameter executor-memory=48G (from 20G to 48G)
>>
>>
>>But it still failed; the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM."
>>
>>
>>Then I tried to insert 400,000,000 records into the carbondata table, and it succeeded.
>>
>>
>>How can I insert 2,000,000,000 records into carbondata?
>>Should I set executor-memory big enough? Or should I generate a csv file from the hive table first, then load the csv file into the carbon table?
>>Can anybody give me some help?
>>
>>
>>Regards
>>fish
>>
>>
>>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>>Thank you Ravindra!
>>>Version: >>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is >>>2.7.1,hive version is 1.1.0 >>>one of the containers log: >>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED >>>SIGNAL 15: SIGTERM >>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called >>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called >>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory >>>/data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109 >>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 >>>java.io.IOException: Error reading file: >>>hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0 >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150) >>> at >>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136) >>> at >>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249) >>> at >>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211) >>> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) >>> at >>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) >>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) >>> at >>> org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412) >>> at >>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163) >>> at >>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221) >>> at >>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183) >>> at >>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117) >>> at >>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80) >>> at >>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73) >>> at >>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196) >>> at >>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177) >>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>> at 
java.lang.Thread.run(Thread.java:745) >>>Caused by: java.io.IOException: Filesystem closed >>> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808) >>> at >>> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868) >>> at >>> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) >>> at java.io.DataInputStream.readFully(DataInputStream.java:195) >>> at >>> org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019) >>> at >>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042) >>> ... 26 more
>>>I will try to set enable.unsafe.sort=true and remove the BUCKETCOLUMNS property, and try again.
>>>
>>>
>>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ravi.pes...@gmail.com> wrote:
>>>>Hi,
>>>>
>>>>Carbondata launches one job per node to sort the data at node level and avoid shuffling. Internally it uses threads to load in parallel. Please use the carbon.number.of.cores.while.loading property in the carbon.properties file to set the number of cores it should use per machine while loading.
>>>>Carbondata sorts the data at each node level to maintain the Btree for each node per segment. This improves query performance by filtering faster when there is a Btree at node level instead of at each block level.
>>>>
>>>>1. Which version of Carbondata are you using?
>>>>2. There are memory issues in the Carbondata 1.0 version; they are fixed in the current master.
>>>>3. You can also improve the performance by enabling enable.unsafe.sort=true in the carbon.properties file. But it is not supported if bucketing of columns is enabled. We are planning to support unsafe sort load for bucketing as well in the next version.
>>>>
>>>>Please send the executor log so we can know about the error you are facing.
>>>>
>>>>
>>>>Regards,
>>>>Ravindra
>>>>
>>>>On 25 March 2017 at 16:18, ww...@163.com <ww...@163.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> *0. The failure*
>>>>> When I insert into the carbon table, I encounter a failure. The failure is as follows:
>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost+details
>>>>>
>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>>>> Driver stacktrace:
>>>>>
>>>>> the stage:
>>>>>
>>>>> *Step:*
>>>>> *1. Start spark-shell*
>>>>> ./bin/spark-shell \
>>>>> --master yarn-client \
>>>>> --num-executors 5 \ (I tried to set this parameter in the range from 10 to 20, but the second job still has only 5 tasks)
>>>>> --executor-cores 5 \
>>>>> --executor-memory 20G \
>>>>> --driver-memory 8G \
>>>>> --queue root.default \
>>>>> --jars /xxx.jar
>>>>>
>>>>> //spark-default.conf spark.default.parallelism=320
>>>>>
>>>>> import org.apache.spark.sql.CarbonContext
>>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>>
>>>>> *2. Create table*
>>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst String,plat String,sty String,is_pay String,is_vip String,is_mpack String,scene String,status String,nw String,isc String,area String,spttag String,province String,isp String,city String,tv String,hwm String,pip String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>>
>>>>> //note: the "fo" column is set as BUCKETCOLUMNS in order to join another table
>>>>> //the column distinct values are as follows:
>>>>>
>>>>>
>>>>> *3. Insert into table* (xxxx_table_tmp is a hive external orc table with 2,000,000,000 records)
>>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
>>>>>
>>>>> *4. Spark split the SQL into two jobs; the first finished successfully, but the second failed:*
>>>>>
>>>>>
>>>>> *5. The second job stage:*
>>>>>
>>>>>
>>>>>
>>>>> *Question:*
>>>>> 1. Why does the second job have only five tasks, while the first job has 994? (note: my hadoop cluster has 5 datanodes)
>>>>> I guess this caused the failure.
>>>>> 2. In the sources I find DataLoadPartitionCoalescer.class; does it mean that "one datanode has only one partition, and there is only one task on that datanode"?
>>>>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as follows, but I cannot find "carbon.table.split.partition.enable" in other parts of the project.
>>>>> I set "carbon.table.split.partition.enable" to true, but the second job still has only five tasks. How do I use this property?
>>>>> ExampleUtils :
>>>>> // whether to use table split partition
>>>>> // true -> use table split partition, supports multiple partition loading
>>>>> // false -> use node split partition, supports data load by host partition
>>>>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
>>>>> 4. Inserting into the carbon table takes 3 hours but eventually fails. How can I speed it up?
>>>>> 5. In the spark-shell, I tried to set this parameter in the range from 10 to 20, but the second job has only 5 tasks.
>>>>> Is the other parameter, executor-memory = 20G, enough?
>>>>>
>>>>> I need your help! Thank you very much!
>>>>>
>>>>> ww...@163.com
>>>>>
>>>>> ------------------------------
>>>>> ww...@163.com
>>>>
>>>>
>>>>
>>>>--
>>>>Thanks & Regards,
>>>>Ravi
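
For readers following this thread, here is a minimal spark-shell sketch of the tuning Ravindra suggests above: enabling unsafe sort and raising the per-node load parallelism before running the insert. The property names are taken from the thread itself; the core count and store path are placeholders, and the CarbonProperties import path is assumed for CarbonData 1.x builds and may differ between versions.

  // Assumed import path for CarbonProperties; adjust to your CarbonData build if it differs.
  import org.apache.carbondata.core.util.CarbonProperties
  import org.apache.spark.sql.CarbonContext

  // Suggested in the thread: unsafe sort speeds up the load-time sort,
  // but it is not supported together with BUCKETCOLUMNS in 1.0.
  CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")
  // Cores used per machine while loading; "6" is only an example value.
  CarbonProperties.getInstance().addProperty("carbon.number.of.cores.while.loading", "6")

  val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
  // Then run the same insert ... select statement as in step 3 above.

The same two properties can instead be placed in the carbon.properties file on each node, as Ravindra describes; setting them programmatically in the shell is just a convenient way to experiment per session.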