Re: select return error when filter string column in where clause

2016-12-05 Thread Lu Cao
Attached executor log:

# query succeeded
cc.sql("select count(*) from default.carbontest_003 where id = 'LSJW26761FS090787'").show

# 'LSJW26760GS018559' and 'LSJW26761ES064611' also succeeded

# query returns error
cc.sql("select count(*) from default.carbontest_003 where id = 'LSJW26763FS088491'").show


===

INFO  06-12 12:20:54,821 - Got assigned task 651
INFO  06-12 12:20:54,822 - Running task 0.0 in stage 51.0 (TID 651)
INFO  06-12 12:20:54,830 - Updating epoch to 12 and clearing cache
INFO  06-12 12:20:54,830 - Started reading broadcast variable 43
INFO  06-12 12:20:54,837 - Block broadcast_43_piece0 stored as bytes in memory (estimated size 8.6 KB, free 8.6 KB)
INFO  06-12 12:20:54,840 - Reading broadcast variable 43 took 10 ms
INFO  06-12 12:20:54,853 - Block broadcast_43 stored as values in memory (estimated size 17.7 KB, free 26.3 KB)
INFO  06-12 12:20:54,868 - *carbon.properties
INFO  06-12 12:20:54,869 - [Executor task launch worker-2][partitionID:003;queryID:4305874916432320_0] Query will be executed on table: carbontest_003
ERROR 06-12 12:20:54,869 - [Executor task launch worker-2][partitionID:003;queryID:4305874916432320_0]
java.lang.NullPointerException
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:117)
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:107)
at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:43)
at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:216)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ERROR 06-12 12:20:54,871 - Exception in task 0.0 in stage 51.0 (TID 651)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at

Re: select return error when filter string column in where clause

2016-12-05 Thread Ravindra Pesala
Hi,

Please provide the table schema, load command, and sample data to reproduce
this issue; you may also create a JIRA for it.
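
For reference, a minimal repro usually looks something like the sketch below
(the schema, CSV path, and load options here are hypothetical placeholders,
not your actual setup):

// Hypothetical minimal schema for default.carbontest_003; replace the
// column list with the real one.
cc.sql("""
  CREATE TABLE IF NOT EXISTS default.carbontest_003 (
    id STRING,
    data_date TIMESTAMP
  )
  STORED BY 'carbondata'
""")

// Hypothetical load command; point INPATH at a small CSV that still
// reproduces the failing filter values.
cc.sql("""
  LOAD DATA INPATH 'hdfs:///tmp/carbontest_sample.csv'
  INTO TABLE default.carbontest_003
  OPTIONS('DELIMITER'=',', 'FILEHEADER'='id,data_date')
""")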

Regards,
Ravi

On 6 December 2016 at 07:05, Lu Cao  wrote:

> Hi Dev team,
> I have loaded some data into a CarbonData table, but when I put the id
> column (String type) in the where clause it always returns an error as below:
>
> cc.sql("select to_date(data_date),count(*) from default.carbontest_001
> where id='LSJW26762FS044062' group by to_date(data_date)").show
>
> [stack trace snipped; see the full trace in the original message below]

select return error when filter string column in where clause

2016-12-05 Thread Lu Cao
Hi Dev team,
I have loaded some data into a CarbonData table, but when I put the id
column (String type) in the where clause it always returns an error as below:

cc.sql("select to_date(data_date),count(*) from default.carbontest_001
where id='LSJW26762FS044062' group by to_date(data_date)").show
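
A sketch for narrowing this down (assuming the same session): run the bare
filter without the group by, and the group by without the filter, so the two
code paths are separated.

// Hypothetical narrowing steps:
// 1) bare string filter, no group by
cc.sql("select count(*) from default.carbontest_001 " +
       "where id = 'LSJW26762FS044062'").show()
// 2) group by without the filter
cc.sql("select to_date(data_date), count(*) from default.carbontest_001 " +
       "group by to_date(data_date)").show()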



===
WARN  06-12 09:02:13,763 - Lost task 5.0 in stage 44.0 (TID 687,
.com): java.lang.RuntimeException: Exception occurred in query
execution.Please check logs.
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ERROR 06-12 09:02:14,091 - Task 1 in stage 44.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 1 in stage 44.0 failed 4 times, most recent failure: Lost task
1.3 in stage 44.0 (TID 694, scsp00258.saicdt.com):
java.lang.RuntimeException: Exception occurred in query
execution.Please check logs.
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at