Re: Query throws 'NullPointerException' but works fine after the table is cached in memory
Hi, could you please share the executor log?

Regards,
Kumar Vishal

On Thu, Jan 12, 2017 at 1:59 PM, Li Peng wrote:
> Hello,
>
> Using carbondata 0.2.0, the problem is as follows:
>
> Only one column, 'store_id', throws a NullPointerException when queried,
> but the query works fine once the value or the table is cached in memory.
>
> store_id's type is int, its cardinality is 200 thousand, and it is
> configured with a dictionary and an inverted index.
>
> [snip: query and stack trace, reproduced in full in the original message below]
Query throws 'NullPointerException' but works fine after the table is cached in memory
Hello,

Using carbondata 0.2.0, the problem is as follows:

Only one column, 'store_id', throws a NullPointerException when queried, but the query works fine once the value or the table is cached in memory.

store_id's type is int, its cardinality is 200 thousand, and it is configured with a dictionary and an inverted index.

sql:

select order_code, saletype, checkout_date, cashier_code, item_cont,
       invoice_price, giveamt, saleamt
from store.sale where store_id=28

error:

ERROR 12-01 10:40:16,861 - [Executor task launch worker-0][partitionID:sale;queryID:1438806645368420_0]
java.lang.NullPointerException
    at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:117)
    at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:107)
    at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:43)
    at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:216)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

ERROR 12-01 10:40:16,874 - Exception in task 0.1 in stage 0.0 (TID 1)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The same 'NullPointerException' occurs with the following sql:

select * from store.sale where store_id=10
select * from store.sale where store_id=11
select * from store.sale where store_id=12
select * from store.sale where store_id=16
select * from store.sale where store_id=10
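For what it's worth, the caching behaviour described above can be exercised with plain Spark SQL. A minimal sketch, assuming only the table and query from the report (CACHE TABLE is standard Spark SQL; that it avoids the NPE is the reporter's observation, not a confirmed fix):

-- Cache the table in memory, then re-run the failing filter query.
CACHE TABLE store.sale;

select order_code, saletype, checkout_date, cashier_code, item_cont,
       invoice_price, giveamt, saleamt
from store.sale where store_id=28;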
[jira] [Created] (CARBONDATA-628) Issue when measure selection without table order gives wrong result with vectorized reader enabled
Ravindra Pesala created CARBONDATA-628:
---------------------------------------

             Summary: Issue when measure selection without table order gives wrong result with vectorized reader enabled
                 Key: CARBONDATA-628
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-628
             Project: CarbonData
          Issue Type: Bug
            Reporter: Ravindra Pesala
            Assignee: Ravindra Pesala
            Priority: Minor

If the table is created with measures in the order m1, m2 and the user selects the measures as m2, m1, the query returns a wrong result when the vectorized reader is enabled.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
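The issue gives only the column order, so here is a hedged repro sketch under that description (the table name t, the dimension d, and the column types are illustrative, not from the report):

-- Table created with measures in the order m1, m2.
create table t (d string, m1 int, m2 int) stored by 'carbondata';

-- Selecting measures in table order returns correct results.
select m1, m2 from t;

-- Selecting them out of table order (m2, m1) is what reportedly
-- returns wrong results when the vectorized reader is enabled.
select m2, m1 from t;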
[jira] [Created] (CARBONDATA-627) Fix Union unit test case for spark2
QiangCai created CARBONDATA-627:
--------------------------------

             Summary: Fix Union unit test case for spark2
                 Key: CARBONDATA-627
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-627
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.0.0-incubating
            Reporter: QiangCai
            Assignee: QiangCai
            Priority: Minor
             Fix For: 1.0.0-incubating

UnionTestCase fails in spark2; we should fix it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
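The issue does not include the failing test body, so the following is only an illustrative sketch of the query shape a Union test case covers (table and column names t1, t2, c1 are assumptions, not from the issue):

-- Two small tables with a compatible schema.
create table t1 (c1 string) stored by 'carbondata';
create table t2 (c1 string) stored by 'carbondata';

-- A union across carbon tables; this is the kind of query such a
-- test would assert on under spark2.
select c1 from t1
union
select c1 from t2;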
[jira] [Created] (CARBONDATA-626) [Dataload] Data loading is not working with delimiter set as "|"
SOURYAKANTA DWIVEDY created CARBONDATA-626:
-------------------------------------------

             Summary: [Dataload] Data loading is not working with delimiter set as "|"
                 Key: CARBONDATA-626
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-626
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
    Affects Versions: 1.0.0-incubating
         Environment: 3 node cluster
            Reporter: SOURYAKANTA DWIVEDY

Description: Data loading fails with the delimiter set as "|".

Steps:
1. Create table
2. Load data into table

Log:

create table DIM_TERMINAL (
  ID int,
  TAC String,
  TER_BRAND_NAME String,
  TER_MODEL_NAME String,
  TER_MODENAME String,
  TER_TYPE_ID String,
  TER_TYPE_NAME_EN String,
  TER_TYPE_NAME_CHN String,
  TER_OSTYPE String,
  TER_OS_TYPE_NAME String,
  HSPASPEED String,
  LTESPEED String,
  VOLTE_FLAG String,
  flag String
) stored by 'org.apache.carbondata.format'
TBLPROPERTIES ('DICTIONARY_INCLUDE'='TAC,TER_BRAND_NAME,TER_MODEL_NAME,TER_MODENAME,TER_TYPE_ID,TER_TYPE_NAME_EN,TER_TYPE_NAME_CHN,TER_OSTYPE,TER_OS_TYPE_NAME,HSPASPEED,LTESPEED,VOLTE_FLAG,flag');

0: jdbc:hive2://172.168.100.212:23040> LOAD DATA inpath 'hdfs://hacluster/SEQIQ/IQ_DIM_TERMINAL.csv' INTO table DIM_TERMINAL1 OPTIONS('DELIMITER'='|','USE_KETTLE'='false','QUOTECHAR'='','FILEHEADER'='ID,TAC,TER_BRAND_NAME,TER_MODEL_NAME,TER_MODENAME,TER_TYPE_ID,TER_TYPE_NAME_EN,TER_TYPE_NAME_CHN,TER_OSTYPE,TER_OS_TYPE_NAME,HSPASPEED,LTESPEED,VOLTE_FLAG,flag');
Error: java.lang.RuntimeException: Data loading failed. table not found: default.dim_terminal1 (state=,code=0)

0: jdbc:hive2://172.168.100.212:23040> LOAD DATA inpath 'hdfs://hacluster/SEQIQ/IQ_DIM_TERMINAL1.csv' INTO table DIM_TERMINAL OPTIONS('DELIMITER'='|','USE_KETTLE'='false','QUOTECHAR'='','FILEHEADER'='ID,TAC,TER_BRAND_NAME,TER_MODEL_NAME,TER_MODENAME,TER_TYPE_ID,TER_TYPE_NAME_EN,TER_TYPE_NAME_CHN,TER_OSTYPE,TER_OS_TYPE_NAME,HSPASPEED,LTESPEED,VOLTE_FLAG,flag');
Error: org.apache.spark.sql.AnalysisException: Reference 'D' is ambiguous, could be: D#4893, D#4907, D#4920, D#4935, D#4952, D#5025, D#5034.; (state=,code=0)

csv raw details:
103880|99000537|MI|2S H1SC 3C|2G/3G|0|SmartPhone|SmartPhone|4|Android|||1|

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [jira] [Created] (CARBONDATA-624) Complete CarbonData documentation to be present in git and synced with carbondata.apache.org for further updates
OK, thank you for starting this work. One thing to note: please put only .md files in github; we don't suggest adding other kinds of files to github, such as pdf, text, and so on.

Regards,
Liang
[jira] [Created] (CARBONDATA-625) Abnormal behaviour of Int datatype
Geetika Gupta created CARBONDATA-625:
-------------------------------------

             Summary: Abnormal behaviour of Int datatype
                 Key: CARBONDATA-625
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-625
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
    Affects Versions: 1.0.0-incubating
         Environment: Spark: 1.6 and hadoop: 2.6.5
            Reporter: Geetika Gupta
            Priority: Minor
         Attachments: Screenshot from 2017-01-11 18-36-24.png, testMaxValueForBigInt.csv

I created a table with an int column and loaded data into it. Data loading completed successfully, but when I viewed the table's data, some of it was wrong. I was loading BigInt data into the int column, and every row of the int column was loaded with the first value from the csv. Below are the details of the queries:

Create table query:
create table xyz(a int, b string) stored by 'carbondata';

Data load query:
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/testMaxValueForBigInt.csv' into table xyz OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='a,b');

Select query:
select * from xyz;

PFA the screenshot of the output and the csv file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
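Given that the report loads BigInt-range data into an int column, a hedged sketch of the variant expected to avoid the truncation is to declare the column as bigint (the table name xyz_big is illustrative; the csv path is from the report):

-- int holds values only up to 2147483647; bigint holds up to
-- 9223372036854775807, which covers BigInt-range inputs.
create table xyz_big(a bigint, b string) stored by 'carbondata';

LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/testMaxValueForBigInt.csv'
into table xyz_big OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='a,b');

select * from xyz_big;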
[jira] [Created] (CARBONDATA-624) Complete CarbonData documentation to be present in git and synced with carbondata.apache.org for further updates
Gururaj Shetty created CARBONDATA-624:
--------------------------------------

             Summary: Complete CarbonData documentation to be present in git and synced with carbondata.apache.org for further updates
                 Key: CARBONDATA-624
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-624
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Gururaj Shetty

Information about CarbonData currently exists in both git and cwiki. We have to merge all of it and create markdown files for each CarbonData topic. These markdown files will carry the complete information about CarbonData: Overview, Installation, Configuration, DDL, DML, Use cases, and so on. The markdown content will also be synced to the website documentation at carbondata.apache.org.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CARBONDATA-623) If we drop table after this condition ---(Firstly we load data in table with single pass true and use kettle false and then in same table load data 2nd time with single pass true and use kettle false), it is throwing Error: java.lang.NullPointerException
Payal created CARBONDATA-623:
-----------------------------

             Summary: If we drop table after this condition ---(Firstly we load data in table with single pass true and use kettle false and then in same table load data 2nd time with single pass true and use kettle false), it is throwing Error: java.lang.NullPointerException
                 Key: CARBONDATA-623
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-623
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
            Reporter: Payal

1. First we load data into the table with single pass true and use kettle false; the load succeeds and the result set is correct.
2. Then we load data into the same table again with single pass true and use kettle false; the load succeeds and the result set is correct.
3. But if we then drop the table, it throws a NullPointerException.

Queries:

0: jdbc:hive2://hadoop-master:1> CREATE TABLE uniqdata_INCLUDEDICTIONARY (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.13 seconds)

0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv' into table uniqdata_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='false','USE_KETTLE'='false');
+---------+
| Result  |
+---------+
+---------+
No rows selected (22.814 seconds)

0: jdbc:hive2://hadoop-master:1> select count(distinct CUST_NAME) from uniqdata_INCLUDEDICTIONARY;
+-------+
|  _c0  |
+-------+
| 7002  |
+-------+
1 row selected (3.055 seconds)

0: jdbc:hive2://hadoop-master:1> select count(CUST_NAME) from uniqdata_INCLUDEDICTIONARY;
+-------+
|  _c0  |
+-------+
| 7013  |
+-------+
1 row selected (0.366 seconds)

0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv' into table uniqdata_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='true','USE_KETTLE'='false');
+---------+
| Result  |
+---------+
+---------+
No rows selected (4.837 seconds)

0: jdbc:hive2://hadoop-master:1> select count(CUST_NAME) from uniqdata_INCLUDEDICTIONARY;
+--------+
|  _c0   |
+--------+
| 14026  |
+--------+
1 row selected (0.458 seconds)

0: jdbc:hive2://hadoop-master:1> select count(distinct CUST_NAME) from uniqdata_INCLUDEDICTIONARY;
+-------+
|  _c0  |
+-------+
| 7002  |
+-------+
1 row selected (3.173 seconds)

0: jdbc:hive2://hadoop-master:1> drop table uniqdata_includedictionary;
Error: java.lang.NullPointerException (state=,code=0)

Logs:

WARN 11-01 12:56:52,722 - Lost task 0.0 in stage 61.0 (TID 1740, hadoop-slave-2): FetchFailed(BlockManagerId(0, hadoop-slave-3, 45331), shuffleId=22, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to hadoop-slave-3:45331
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:504)
    at org.apache.spark.sql.execution.aggregate.Tungs