Looks like the Apache mail server filtered out the log attachment again, so I'm pasting the relevant part of the log inline below.
INFO 13-12 17:16:39,940 - main Query [SELECT VIN, COUNT(*) FROM DEFAULT.MYCARBON_00001 WHERE VIN='LSJW26765GS056837' GROUP BY VIN]
INFO 13-12 17:16:39,945 - Parsing command: select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin
INFO 13-12 17:16:39,946 - Parse Completed
INFO 13-12 17:16:39,948 - Parsing command: select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin
INFO 13-12 17:16:39,949 - Parse Completed
INFO 13-12 17:16:39,951 - 0: get_table : db=default tbl=mycarbon_00001
INFO 13-12 17:16:39,951 - ugi=lucao ip=unknown-ip-addr cmd=get_table : db=default tbl=mycarbon_00001

res10: org.apache.spark.sql.DataFrame = [vin: string, _c1: bigint]

scala> res10.show

INFO 13-12 17:16:44,840 - main Starting to optimize plan
INFO 13-12 17:16:44,863 - Cleaned accumulator 20
INFO 13-12 17:16:44,864 - Removed broadcast_14_piece0 on localhost:59141 in memory (size: 10.2 KB, free: 143.2 MB)
INFO 13-12 17:16:44,865 - Cleaned accumulator 32
INFO 13-12 17:16:44,866 - Cleaned shuffle 2
INFO 13-12 17:16:44,866 - Cleaned accumulator 28
INFO 13-12 17:16:44,866 - Cleaned accumulator 27
INFO 13-12 17:16:44,866 - Cleaned accumulator 26
INFO 13-12 17:16:44,866 - Cleaned accumulator 25
INFO 13-12 17:16:44,866 - Cleaned accumulator 24
INFO 13-12 17:16:44,866 - Cleaned accumulator 23
INFO 13-12 17:16:44,866 - Cleaned accumulator 22
INFO 13-12 17:16:44,866 - Cleaned accumulator 21
INFO 13-12 17:16:44,910 - main ************************Total Number Rows In BTREE: 1
INFO 13-12 17:16:44,911 - main Total Time in retrieving the data reference node after scanning the btree 0 Total number of data reference node for executing filter(s) 1
INFO 13-12 17:16:44,912 - main Total Time taken to ensure the required executors : 1
INFO 13-12 17:16:44,912 - main Time elapsed to allocate the required executors : 0
INFO 13-12 17:16:44,912 - main No.Of Blocks before Blocklet distribution: 1
INFO 13-12 17:16:44,912 - main No.Of Blocks after Blocklet distribution: 1
INFO 13-12 17:16:45,030 - Identified no.of.Blocks: 1, parallelism: 8, no.of.nodes: 1, no.of.tasks: 1
INFO 13-12 17:16:45,030 - Node : localhost, No.Of Blocks : 1
INFO 13-12 17:16:45,048 - Starting job: show at <console>:42
INFO 13-12 17:16:45,048 - Registering RDD 44 (show at <console>:42)
INFO 13-12 17:16:45,049 - Got job 9 (show at <console>:42) with 1 output partitions
INFO 13-12 17:16:45,049 - Final stage: ResultStage 15 (show at <console>:42)
INFO 13-12 17:16:45,049 - Parents of final stage: List(ShuffleMapStage 14)
INFO 13-12 17:16:45,049 - Missing parents: List(ShuffleMapStage 14)
INFO 13-12 17:16:45,049 - Submitting ShuffleMapStage 14 (MapPartitionsRDD[44] at show at <console>:42), which has no missing parents
INFO 13-12 17:16:45,051 - Block broadcast_15 stored as values in memory (estimated size 18.3 KB, free 55.3 KB)
INFO 13-12 17:16:45,052 - Block broadcast_15_piece0 stored as bytes in memory (estimated size 8.8 KB, free 64.1 KB)
INFO 13-12 17:16:45,052 - Added broadcast_15_piece0 in memory on localhost:59141 (size: 8.8 KB, free: 143.2 MB)
INFO 13-12 17:16:45,052 - Created broadcast 15 from broadcast at DAGScheduler.scala:1006
INFO 13-12 17:16:45,052 - Submitting 1 missing tasks from ShuffleMapStage 14 (MapPartitionsRDD[44] at show at <console>:42)
INFO 13-12 17:16:45,052 - Adding task set 14.0 with 1 tasks
INFO 13-12 17:16:45,053 - Starting task 0.0 in stage 14.0 (TID 212, localhost, partition 0,ANY, 4677 bytes)
INFO 13-12 17:16:45,054 - Running task 0.0 in stage 14.0 (TID 212)
INFO 13-12 17:16:45,056 - *************************/Users/lucao/MyDev/spark-1.6.0-bin-hadoop2.6/conf/carbon.properties
INFO 13-12 17:16:45,056 - [Executor task launch worker-11][partitionID:00001;queryID:340277307449972_0] Query will be executed on table: mycarbon_00001
ERROR 13-12 17:16:45,059 - [Executor task launch worker-11][partitionID:00001;queryID:340277307449972_0]
java.lang.NullPointerException
    at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:117)
    at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:107)
    at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:43)
    at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:216)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
ERROR 13-12 17:16:45,060 - Exception in task 0.0 in stage 14.0 (TID 212)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
WARN 13-12 17:16:45,062 - Lost task 0.0 in stage 14.0 (TID 212, localhost): java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
    [stack trace identical to the ERROR above]
ERROR 13-12 17:16:45,062 - Task 0 in stage 14.0 failed 1 times; aborting job
INFO 13-12 17:16:45,062 - Removed TaskSet 14.0, whose tasks have all completed, from pool
INFO 13-12 17:16:45,063 - Cancelling stage 14
INFO 13-12 17:16:45,063 - ShuffleMapStage 14 (show at <console>:42) failed in 0.010 s
INFO 13-12 17:16:45,063 - Job 9 failed: show at <console>:42, took 0.015582 s

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 212, localhost): java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
    [stack trace identical to the ERROR above]
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1538)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1538)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2125)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1537)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1544)
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1414)
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2138)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
    at $iwC$$iwC$$iwC.<init>(<console>:59)
    at $iwC$$iwC.<init>(<console>:61)
    at $iwC.<init>(<console>:63)
    at <init>(<console>:65)
    at .<init>(<console>:69)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
    [stack trace identical to the ERROR above]
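From the trace, the NPE is thrown inside org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos (AbstractDetailQueryResultIterator.java:117) while CarbonScanRDD.compute (CarbonScanRDD.scala:216) is building the result iterator for the filter query, so it looks like one of the block/blocklet infos handed to the iterator is null for the affected blocks.

Since the mail server keeps stripping the sample file, here is a rough Scala sketch (my own helper, not part of CarbonData) that generates a CSV in the same shape as my data set: vin String, data_date String, work_model Double, with VINs following the LSJW2676xyS###### pattern seen in the output further down. The file name, VIN pool size, and row count are just placeholders; adjust as needed.

import java.io.PrintWriter
import scala.util.Random

// Hypothetical generator: emits 1M rows shaped like my test set,
// reusing a small pool of VINs so group-by/filter queries hit
// repeated values, as in the real data.
object GenTestCsv {
  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    // Pool of 20 VINs matching LSJW2676<digit><E|G>S<6 digits>.
    val vins = Seq.fill(20) {
      val plant = if (rnd.nextBoolean()) "E" else "G"
      f"LSJW2676${rnd.nextInt(10)}%d${plant}S${rnd.nextInt(100000)}%06d"
    }
    val out = new PrintWriter("test2.csv")
    try {
      out.println("vin,data_date,work_model")   // header assumed to match the table DDL
      for (_ <- 1 to 1000000) {
        val vin  = vins(rnd.nextInt(vins.length))
        val date = f"2016-${1 + rnd.nextInt(12)}%02d-${1 + rnd.nextInt(28)}%02d"
        out.println(s"$vin,$date,${rnd.nextDouble()}")
      }
    } finally out.close()
  }
}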
On Wed, Dec 14, 2016 at 10:18 AM, Lu Cao <whuca...@gmail.com> wrote:
> Hi,
> I just uploaded the data file to Baidu:
> Link: https://pan.baidu.com/s/1slERWL3
> Password: m7kj
>
> Thanks,
> Lionel
>
> On Wed, Dec 14, 2016 at 10:12 AM, Lu Cao <whuca...@gmail.com> wrote:
>
>> Hi Dev team,
>> As discussed this afternoon, I've switched back to the 0.2.0 version for this round of testing. Please ignore my earlier email about "error when save DF to carbondata file"; that one was on the master branch.
>>
>> Spark version: 1.6.0
>> System: Mac OS X El Capitan (10.11.6)
>>
>> [lucao]$ spark-shell --master local[*] --total-executor-cores 2 --executor-memory 1g --num-executors 2 --jars ~/MyDev/hive-1.1.1/lib/mysql-connector-java-5.1.40-bin.jar
>>
>> In 0.2.0 I can successfully create the table and load data into the CarbonData table:
>>
>> scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")
>>
>> scala> cc.sql("load data inpath 'test2.csv' into table default.mycarbon_00001")
>>
>> I can also successfully run the query below:
>>
>> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show
>>
>> INFO 13-12 17:13:42,215 - Job 5 finished: show at <console>:42, took 0.732793 s
>>
>> +-----------------+---+
>> |              vin|_c1|
>> +-----------------+---+
>> |LSJW26760ES065247|464|
>> |LSJW26760GS018559|135|
>> |LSJW26761ES064611|104|
>> |LSJW26761FS090787| 45|
>> |LSJW26762ES051513| 40|
>> |LSJW26762FS075036|434|
>> |LSJW26763ES052363| 32|
>> |LSJW26763FS088491|305|
>> |LSJW26764ES064859|186|
>> |LSJW26764FS078696| 40|
>> |LSJW26765ES058651|171|
>> |LSJW26765FS072633|191|
>> |LSJW26765GS056837|467|
>> |LSJW26766FS070308| 79|
>> |LSJW26766GS050853|300|
>> |LSJW26767FS069913|  8|
>> |LSJW26767GS053454|286|
>> |LSJW26768FS062811| 16|
>> |LSJW26768GS051146| 97|
>> |LSJW26769FS062722|424|
>> +-----------------+---+
>> only showing top 20 rows
>>
>> The error occurs when I add the "vin" column to the where clause:
>>
>> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26760ES065247' group by vin").show
>>
>> +-----------------+---+
>> |              vin|_c1|
>> +-----------------+---+
>> |LSJW26760ES065247|464|
>> +-----------------+---+
>>
>> >>> This one is OK... Actually, as I tested, the *first two values* in the top 20 rows usually succeed, but most of the others return the error.
>>
>> For example:
>>
>> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show
>>
>> >>> Log is coming:
>>
>> <carbontest_lucao_20161213.log>
>>
>> It is the same error I met on Dec. 6th. As I said in the WeChat group before:
>>
>> When the data set is 1,000 rows, the error does not occur.
>> When the data set is 1M rows, some queries return the error and some don't.
>> When the data set is 1.9 billion rows, all tests return the error.
>>
>> *### Attached the sample data set (1M rows) for your reference.*
>>
>> <<........ I sent this email yesterday afternoon, but it was rejected by the Apache mail server for being larger than 1,000,000 bytes, so I removed the sample data file from the attachment. If you need it, please reply with your personal email address. ........>>
>>
>> Looking forward to your response.
>>
>> Thanks & Best Regards,
>>
>> Lionel
>
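P.S. To save everyone digging through the quoted thread, the whole repro in one block (CarbonData 0.2.0, Spark 1.6.0, local mode; cc is the CarbonContext from the carbon spark-shell, and test2.csv is the sample file or the output of the generator sketch above):

scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")

scala> cc.sql("load data inpath 'test2.csv' into table default.mycarbon_00001")

The plain group-by works:

scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show

Adding vin to the where clause fails with the NPE above for most values:

scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show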