Hi,

This is a little strange; I tried to reproduce the issue but could not. Can you make sure the latest jar is deployed on all the datanodes and on the driver? It is possible that an old jar is still being picked up by either the driver or one of the datanodes.
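One quick way to verify this (a rough sketch, nothing CarbonData-specific is required) is to print, from the spark-shell, which jar the class named in the stack trace is actually loaded from, both on the driver and inside an executor task:

    // Driver side: which jar provides the class that throws the NPE?
    val c = Class.forName("org.apache.carbondata.processing.newflow.DataLoadProcessBuilder")
    println(c.getProtectionDomain.getCodeSource.getLocation)

    // Executor side: run the same check in tasks and collect the distinct (host, jar) answers
    sc.parallelize(1 to 100, 100).map { _ =>
      val cls = Class.forName("org.apache.carbondata.processing.newflow.DataLoadProcessBuilder")
      (java.net.InetAddress.getLocalHost.getHostName,
       cls.getProtectionDomain.getCodeSource.getLocation.toString)
    }.distinct().collect().foreach(println)

If any host reports a path other than the freshly built carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar, an old copy is still on that node's classpath.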
Regards,
Ravindra

On 27 March 2017 at 01:40, a <ww...@163.com> wrote:
> I downloaded the newest source code (master), compiled it, and generated the jar
> carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
> Then I tested again with Spark 2.1. The error logs are as follows:
>
> Container log:
> 17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch worker-9 Data Loading failed for table carbon_table
> java.lang.NullPointerException
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>     at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>     at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>     at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>     at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>     at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>     at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>     ... 10 more
> 17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch worker-9
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     (stack trace and NullPointerException cause identical to the entry above)
> 17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 (TID 538)
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     (stack trace and NullPointerException cause identical to the entry above)
>
> Spark log:
>
> ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
> ERROR 27-03 02:27:21,419 - main load data frame failed
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     (remote task stack trace and NullPointerException cause identical to the container log above)
>
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>     at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>     at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>     at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>     at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>     at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>     at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>     at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>     at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>     at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>     at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>     at $line23.$read$$iwC.<init>(<console>:48)
>     at $line23.$read.<init>(<console>:50)
>     at $line23.$read$.<init>(<console>:54)
>     at $line23.$read$.<clinit>(<console>)
>     at $line23.$eval$.<init>(<console>:7)
>     at $line23.$eval$.<clinit>(<console>)
>     at $line23.$eval.$print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>     at org.apache.spark.repl.Main.main(Main.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     (stack trace identical to the container log above)
> Caused by: java.lang.NullPointerException
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>     at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>     at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>     ... 10 more
> ERROR 27-03 02:27:21,422 - main
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>     (remote task stack trace, driver stacktrace, and causes identical to the exception logged at 02:27:21,419 above)
> AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for default.carbon_table
> ERROR 27-03 02:27:21,453 - main
> java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
>     at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>     at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>     (remaining spark-shell REPL and SparkSubmit frames identical to the driver stacktrace above)
> AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for default.carbon_table. Please check the logs
> java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
>     (stack trace identical to the exception logged at 02:27:21,453 above)
>
> At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>
> Container log: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM.
> Spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
> The test sql:
>
> At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
> >
> >I have set the parameters as follows:
> >1. fs.hdfs.impl.disable.cache=true
> >2. dfs.socket.timeout=1800000 (for the exception "Caused by: java.io.IOException: Filesystem closed")
> >3. dfs.datanode.socket.write.timeout=3600000
> >4. set the carbondata property enable.unsafe.sort=true
> >5. remove the BUCKETCOLUMNS property from the create table sql
> >6. set the spark job parameter executor-memory=48G (up from 20G)
> >
> >But it still failed; the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM".
> >
> >Then I tried to insert 400,000,000 records into the carbondata table, and that succeeded.
> >
> >How can I insert 2,000,000,000 records into carbondata?
> >Should I just set executor-memory big enough? Or should I generate a csv file from the hive table first and then load the csv file into the carbon table?
> >Can anybody give me some help?
> >
> >Regards
> >fish
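(For reference, a short spark-shell check, using only standard SparkConf calls, of what the running application actually received for the settings being tuned above; spark.yarn.executor.memoryOverhead is the key named in the YARN message and is raised separately from executor-memory:)

    // Print the effective memory/executor settings of the running application
    for (key <- Seq("spark.executor.memory",
                    "spark.yarn.executor.memoryOverhead",
                    "spark.executor.instances",
                    "spark.executor.cores")) {
      println(key + " = " + sc.getConf.getOption(key).getOrElse("<not set>"))
    }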
> >At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
> >>Thank you Ravindra!
> >>Version:
> >>My carbondata version is 1.0, spark version is 1.6.3, hadoop version is 2.7.1, and hive version is 1.1.0.
> >>One of the container logs:
> >>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
> >>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
> >>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2
> >>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
> >>    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
> >>    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
> >>    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
> >>    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> >>    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> >>    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>    at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
> >>    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
> >>    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
> >>    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
> >>    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
> >>    at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
> >>    at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
> >>    at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
> >>    at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
> >>    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>    at java.lang.Thread.run(Thread.java:745)
> >>Caused by: java.io.IOException: Filesystem closed
> >>    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
> >>    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
> >>    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> >>    at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >>    at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> >>    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
> >>    ... 26 more
> >>I will try to set enable.unsafe.sort=true, remove the BUCKETCOLUMNS property, and try again.
> >>
> >>At 2017-03-25 20:55:03, "Ravindra Pesala" <ravi.pes...@gmail.com> wrote:
> >>>Hi,
> >>>
> >>>Carbondata launches one job per node to sort the data at node level and avoid shuffling. Internally it uses threads for parallel loading. Please use the carbon.number.of.cores.while.loading property in the carbon.properties file to set the number of cores it should use per machine while loading.
> >>>Carbondata sorts the data at each node level to maintain the Btree for each node per segment. This improves query performance by filtering faster when there is a Btree at node level instead of at each block level.
> >>>
> >>>1. Which version of Carbondata are you using?
> >>>2. There are memory issues in the Carbondata-1.0 version; they are fixed in the current master.
> >>>3. You can also improve performance by enabling enable.unsafe.sort=true in the carbon.properties file. But it is not supported if bucketing of columns is enabled. We are planning to support unsafe sort load for bucketing as well in the next version.
> >>>
> >>>Please send the executor log so we know more about the error you are facing.
> >>>
> >>>Regards,
> >>>Ravindra
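(To illustrate the two properties mentioned above, a minimal sketch with example values only: the keys can be put into carbon.properties on every node, or set from the spark-shell session before the load using the same CarbonProperties call that ExampleUtils uses further below. The import path is the one used in the 1.x source tree, so please verify it against your build.)

    import org.apache.carbondata.core.util.CarbonProperties

    // Example values only; tune per machine. Cores used on each node while loading:
    CarbonProperties.getInstance().addProperty("carbon.number.of.cores.while.loading", "6")
    // Unsafe sort for loading; as noted above, not supported together with BUCKETCOLUMNS:
    CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")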
> >>>On 25 March 2017 at 16:18, ww...@163.com <ww...@163.com> wrote:
> >>>
> >>>> Hello!
> >>>>
> >>>> 0. The failure
> >>>> When I insert into the carbon table, I encounter a failure. The failure is as follows:
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
> >>>> Driver stacktrace:
> >>>>
> >>>> The stage:
> >>>>
> >>>> Steps:
> >>>> 1. Start spark-shell
> >>>> ./bin/spark-shell \
> >>>> --master yarn-client \
> >>>> --num-executors 5 \  (I tried to set this parameter in the range 10 to 20, but the second job still has only 5 tasks)
> >>>> --executor-cores 5 \
> >>>> --executor-memory 20G \
> >>>> --driver-memory 8G \
> >>>> --queue root.default \
> >>>> --jars /xxx.jar
> >>>>
> >>>> // spark-default.conf: spark.default.parallelism=320
> >>>>
> >>>> import org.apache.spark.sql.CarbonContext
> >>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
> >>>>
> >>>> 2. Create table
> >>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String, pt String, lst String, plat String, sty String, is_pay String, is_vip String, is_mpack String, scene String, status String, nw String, isc String, area String, spttag String, province String, isp String, city String, tv String, hwm String, pip String, fo String, sh String, mid String, user_id String, play_pv Int, spt_cnt Int, prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id', 'DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm', 'NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid', 'BUCKETNUMBER'='10', 'BUCKETCOLUMNS'='fo')")
> >>>>
> >>>> // Note: the "fo" column is made a BUCKETCOLUMNS column in order to join with another table.
> >>>> // The distinct values of the columns are as follows:
> >>>>
> >>>> 3. Insert into table (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
> >>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
> >>>>
> >>>> 4. Spark splits the SQL into two jobs; the first finished successfully, but the second failed:
> >>>>
> >>>> 5. The second job's stage:
> >>>>
> >>>> Questions:
> >>>> 1. Why does the second job have only five tasks, while the first job has 994? (Note: my hadoop cluster has 5 datanodes.) I guess this caused the failure.
> >>>> 2. In the sources I found DataLoadPartitionCoalescer.class; does it mean that one datanode has only one partition, and therefore only one task runs on each datanode?
> >>>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as follows, but I cannot find "carbon.table.split.partition.enable" in other parts of the project.
> >>>> I set "carbon.table.split.partition.enable" to true, but the second job still has only five tasks. How should this property be used?
> >>>> ExampleUtils:
> >>>> // whether use table split partition
> >>>> // true  -> use table split partition, support multiple partition loading
> >>>> // false -> use node split partition, support data load by host partition
> >>>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
> >>>> 4. Inserting into the carbon table takes 3 hours and eventually fails. How can I speed it up?
> >>>> 5. In spark-shell I tried to set num-executors in the range 10 to 20, but the second job still has only 5 tasks. Is the other parameter, executor-memory = 20G, enough?
> >>>>
> >>>> I need your help! Thank you very much!
> >>>>
> >>>> ww...@163.com
> >>>>
> >>>> ------------------------------
> >>>> ww...@163.com
> >>>
> >>>
> >>>--
> >>>Thanks & Regards,
> >>>Ravi

--
Thanks & Regards,
Ravi