[jira] [Commented] (CARBONDATA-910) Implement Partition feature
[ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965524#comment-15965524 ] Cao, Lionel commented on CARBONDATA-910: Sure, I will enhance the document and then send it to the mailing list. > Implement Partition feature > --- > > Key: CARBONDATA-910 > URL: https://issues.apache.org/jira/browse/CARBONDATA-910 > Project: CarbonData > Issue Type: New Feature > Components: core, data-load, data-query > Reporter: Cao, Lionel > Assignee: Cao, Lionel > > Why we need partition tables > Partition tables provide an option to divide a table into smaller pieces. > With partition tables: > 1. Data can be better managed, organized and stored. > 2. We can avoid full table scans in some scenarios and improve query > performance. (partition column in the filter, > multiple partition tables joined on the same partition column, etc.) > Partitioning design > Range Partitioning > Range partitioning maps data to partitions according to the range of > partition column values; the operator '<' defines the non-inclusive upper bound of > the current partition. > List Partitioning > List partitioning allows you to map data to partitions with a specific > value list. > Hash Partitioning > Hash partitioning maps data to partitions with a hash algorithm and puts > them into the given number of partitions. > Composite Partitioning (2 levels at most for now) > Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, > Hash-Range, Hash-List, Hash-Hash > DDL-Create > Create table sales( > itemid long, > logdate datetime, > customerid int > ... > ...) 
> [partition by range logdate(...)] > [subpartition by list area(...)] > Stored By 'carbondata' > [tblproperties(...)]; > range partition: > partition by range logdate(< '2016-01-01', < '2017-01-01', < > '2017-02-01', < '2017-03-01', < '2099-01-01') > list partition: > partition by list area('Asia', 'Europe', 'North America', 'Africa', > 'Oceania') > hash partition: > partition by hash(itemid, 9) > composite partition: > partition by range logdate(< '2016-01-01', < '2017-01-01', < > '2017-02-01', < '2017-03-01', < '2099-01-01') > subpartition by list area('Asia', 'Europe', 'North America', 'Africa', > 'Oceania') > DDL-Rebuild, Add > Alter table sales rebuild partition by (range|list|hash)(...); > Alter table sales add partition (< '2018-01-01'); # only supported for range > partitioning and list partitioning > Alter table sales add partition ('South America'); > # Note: there is no delete operation for partitions; please use rebuild. > If you need to delete data, use a delete statement, but the partition > definition will not be deleted. > Partition Table Data Store > [Option One] > Use the current design, keeping partition folders outside of segments > Fact >|___Part0 >| |___Segment_0 >| |___ ***-[bucketId]-.carbondata >| |___ ***-[bucketId]-.carbondata >| |___Segment_1 >| ... >|___Part1 >| |___Segment_0 >| |___Segment_1 >|... > [Option Two] > Remove the partition folder, add the partition id into the file name and build the btree on the > driver side. > Fact >|___Segment_0 >| |___ ***-[bucketId]-[partitionId].carbondata >| |___ ***-[bucketId]-[partitionId].carbondata >|___Segment_1 >|___Segment_2 >... > Pros & Cons: > Option one would be faster at locating target files > Option two needs to store more folder metadata > Partition Table MetaData Store > Partition info should be stored in the file footer/index file and loaded into > memory before user queries. > Relationship with Bucket > Buckets should be at a lower level than partitions. 
> Partition Table Query > Example: > Select * from sales > where logdate <= date '2016-12-01'; > Users should remember to add a partition filter when writing SQL on a partition > table. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
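The three routing rules described in the design above (range with non-inclusive '<' upper bounds, list with an explicit value list, hash with a fixed partition count) can be sketched in Python. This is a minimal illustration using the boundary values from the DDL examples; the function names are invented for the sketch and are not CarbonData's implementation.

```python
import bisect

# Non-inclusive upper bounds, as written in the range-partition DDL above.
range_bounds = ["2016-01-01", "2017-01-01", "2017-02-01", "2017-03-01", "2099-01-01"]

def range_partition(logdate: str) -> int:
    # bisect_right returns the index of the first bound strictly greater
    # than logdate, i.e. the first partition whose upper bound still holds.
    return bisect.bisect_right(range_bounds, logdate)

# List partitioning: explicit value -> partition id mapping.
list_values = ["Asia", "Europe", "North America", "Africa", "Oceania"]
list_map = {v: i for i, v in enumerate(list_values)}

def list_partition(area: str) -> int:
    return list_map[area]

def hash_partition(itemid: int, num_partitions: int = 9) -> int:
    # Hash partitioning: stable hash modulo the declared partition count,
    # matching "partition by hash(itemid, 9)".
    return hash(itemid) % num_partitions
```

Note that with non-inclusive upper bounds, a row with logdate exactly '2016-01-01' lands in the second range partition, not the first.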
[jira] [Commented] (CARBONDATA-910) Implement Partition feature
[ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965484#comment-15965484 ] Cao, Lionel commented on CARBONDATA-910: 2017-04-12 notes: 1. list partitioning should support value groups; partition by list area((China, India), (England, France), (America, Canada)) 2. support add and delete, maybe rebuild in the future; deleting a partition will also delete its data; 3. for the data store, option 2 is preferred, using the partitionId as the taskId; 4. single-level partitioning for the first version, no composite partitioning -- This message was sent by Atlassian JIRA (v6.3.15#6346)
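The value-group form of list partitioning proposed in note 1 (several values mapping to one partition) can be sketched as a flattened lookup table. The helper name and group contents mirror the example in the notes and are illustrative only, not CarbonData code.

```python
# Hypothetical spec mirroring:
# partition by list area((China, India), (England, France), (America, Canada))
value_groups = [("China", "India"), ("England", "France"), ("America", "Canada")]

# Flatten the groups so every member value routes to its group's partition id.
group_map = {v: pid for pid, group in enumerate(value_groups) for v in group}

def list_group_partition(area: str) -> int:
    return group_map[area]
```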
[jira] [Created] (CARBONDATA-840) Limit query performance optimization [Group By]
Cao, Lionel created CARBONDATA-840: -- Summary: Limit query performance optimization [Group By] Key: CARBONDATA-840 URL: https://issues.apache.org/jira/browse/CARBONDATA-840 Project: CarbonData Issue Type: Improvement Components: data-query Reporter: Cao, Lionel Assignee: Cao, Lionel Currently a limit query will still scan all data first and apply the limit in the last step. In Carbon we can convert the limit into filters with the dictionary's distinct value list... -- This message was sent by Atlassian JIRA (v6.3.15#6346)
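The idea in this issue (rewriting a limit over distinct values into a pushed-down filter on the first n dictionary values, so the scan can skip non-matching data) could look roughly like the sketch below. The dictionary contents and the helper name are invented for illustration; this is not CarbonData's actual rewrite.

```python
# Stand-in for a dictionary-encoded column's distinct value list.
dictionary = ["apple", "banana", "cherry", "date", "elderberry"]

def rewrite_limit_as_filter(limit: int) -> set:
    # Instead of scanning every row and truncating at the very end,
    # take the first `limit` distinct dictionary values and push them
    # down as an IN-style filter that the scan can evaluate early.
    return set(dictionary[:limit])

predicate = rewrite_limit_as_filter(2)
rows = ["banana", "date", "apple", "apple", "cherry"]
filtered = [r for r in rows if r in predicate]
```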
[jira] [Assigned] (CARBONDATA-762) modify all schemaName->databaseName, cubeName->tableName
[ https://issues.apache.org/jira/browse/CARBONDATA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-762: -- Assignee: Cao, Lionel (was: QiangCai) > modify all schemaName->databaseName, cubeName->tableName > > > Key: CARBONDATA-762 > URL: https://issues.apache.org/jira/browse/CARBONDATA-762 > Project: CarbonData > Issue Type: Bug >Reporter: QiangCai >Assignee: Cao, Lionel >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > modify all schemaName->databaseName, cubeName->tableName -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CARBONDATA-760) Should to avoid ERROR log for successful select query
[ https://issues.apache.org/jira/browse/CARBONDATA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925648#comment-15925648 ] Cao, Lionel commented on CARBONDATA-760: Hi [~QiangCai], Could you provide some environment info so that I can reproduce the error? Currently I can't get it from CarbonExample. thanks, Lionel > Should to avoid ERROR log for successful select query > - > > Key: CARBONDATA-760 > URL: https://issues.apache.org/jira/browse/CARBONDATA-760 > Project: CarbonData > Issue Type: Bug > Components: data-query > Reporter: QiangCai > Assignee: Cao, Lionel > Priority: Minor > > Some tables without delete or update operations may not have delta files. > Select queries shouldn't record error logs. > Code: > SegmentUpdateStatusManager.getDeltaFiles > Log detail: > ERROR 06-03 19:21:37,531 - pool-475-thread-1 Invalid tuple id > arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/0 > ERROR 06-03 19:21:37,948 - pool-475-thread-1 Invalid tuple id > arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/1 > ERROR 06-03 19:21:38,517 - pool-475-thread-1 Invalid tuple id > arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/2 > ERROR 06-03 19:21:38,909 - pool-475-thread-1 Invalid tuple id > arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/3 > ERROR 06-03 19:21:39,292 - pool-475-thread-1 Invalid tuple id > arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/4 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CARBONDATA-760) Should to avoid ERROR log for successful select query
[ https://issues.apache.org/jira/browse/CARBONDATA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-760: -- Assignee: Cao, Lionel (was: QiangCai) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
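One possible shape of the fix for CARBONDATA-760 is to treat a missing delta file as a normal state and log at DEBUG rather than ERROR. The sketch below is an assumption about the fix direction only: the `.deletedelta` suffix and the function signature are invented for the illustration and are not CarbonData's actual `getDeltaFiles`.

```python
import logging

logger = logging.getLogger("SegmentUpdateStatusManager")

def get_delta_files(segment_files, tuple_id):
    # A table that never had an update/delete simply has no delta files;
    # that is a normal state for a select query, so note it at DEBUG
    # instead of emitting an ERROR for every tuple id.
    deltas = [f for f in segment_files if f.endswith(".deletedelta")]
    if not deltas:
        logger.debug("No delta files for tuple id %s", tuple_id)
    return deltas
```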
[jira] [Assigned] (CARBONDATA-741) Remove the unnecessary classes from carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-741: -- Assignee: Cao, Lionel (was: Liang Chen) > Remove the unnecessary classes from carbondata > -- > > Key: CARBONDATA-741 > URL: https://issues.apache.org/jira/browse/CARBONDATA-741 > Project: CarbonData > Issue Type: Bug > Reporter: Ravindra Pesala > Assignee: Cao, Lionel > Priority: Trivial > Time Spent: 20m > Remaining Estimate: 0h > > Please remove the following classes as they are not used now: > VectorChunkRowIterator > CarbonColumnVectorImpl -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CARBONDATA-739) Avoid creating multiple instances of DirectDictionary in DictionaryBasedResultCollector
[ https://issues.apache.org/jira/browse/CARBONDATA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-739: -- Assignee: Cao, Lionel (was: Liang Chen) > Avoid creating multiple instances of DirectDictionary in > DictionaryBasedResultCollector > --- > > Key: CARBONDATA-739 > URL: https://issues.apache.org/jira/browse/CARBONDATA-739 > Project: CarbonData > Issue Type: Bug > Components: core > Reporter: Ravindra Pesala > Assignee: Cao, Lionel > Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Avoid creating multiple instances of DirectDictionary in > DictionaryBasedResultCollector. > For every row, a direct dictionary is created inside the > DictionaryBasedResultCollector.collectData method. > Please create a single instance per column and reuse it -- This message was sent by Atlassian JIRA (v6.3.15#6346)
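The suggested fix (one generator instance per column, created once before the row loop and reused for every row) can be sketched as below. `DirectDictionaryGenerator` here is a minimal stand-in class to make the instance-count behavior visible, not the real CarbonData class.

```python
class DirectDictionaryGenerator:
    """Stand-in for a per-column direct-dictionary generator (illustrative)."""
    instances_created = 0

    def __init__(self, column):
        self.column = column
        DirectDictionaryGenerator.instances_created += 1

    def decode(self, key):
        return f"{self.column}:{key}"

def collect_data(rows, columns):
    # The fix: build one generator per column up front, outside the row
    # loop, instead of constructing a new one for every row processed.
    generators = {c: DirectDictionaryGenerator(c) for c in columns}
    return [[generators[c].decode(row[i]) for i, c in enumerate(columns)]
            for row in rows]
```

However many rows are collected, only one generator per column is ever constructed.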
[jira] [Assigned] (CARBONDATA-740) Add logger for rows processed while closing in AbstractDataLoadProcessorStep
[ https://issues.apache.org/jira/browse/CARBONDATA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-740: -- Assignee: Cao, Lionel (was: Liang Chen) > Add logger for rows processed while closing in AbstractDataLoadProcessorStep > > > Key: CARBONDATA-740 > URL: https://issues.apache.org/jira/browse/CARBONDATA-740 > Project: CarbonData > Issue Type: Bug > Reporter: Ravindra Pesala > Assignee: Cao, Lionel > Priority: Trivial > Time Spent: 40m > Remaining Estimate: 0h > > Add logger for rows processed while closing in AbstractDataLoadProcessorStep. > It is good to print the total number of records processed when closing the step, so > please log the rows processed in AbstractDataLoadProcessorStep -- This message was sent by Atlassian JIRA (v6.3.15#6346)
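A minimal sketch of the requested behavior: count rows as they pass through a step and log the total when the step closes. `DataLoadStep` is a hypothetical stand-in for AbstractDataLoadProcessorStep, not the actual class.

```python
import logging

logger = logging.getLogger("DataLoadStep")

class DataLoadStep:
    """Stand-in for AbstractDataLoadProcessorStep (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.rows_processed = 0

    def process(self, rows):
        # Count every row the step handles as it streams through.
        for row in rows:
            self.rows_processed += 1
            yield row

    def close(self):
        # Log the total rows this step handled before shutting down.
        logger.info("Step %s processed %d rows", self.name, self.rows_processed)
```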
[jira] [Assigned] (CARBONDATA-743) Remove the abundant class CarbonFilters.scala
[ https://issues.apache.org/jira/browse/CARBONDATA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel reassigned CARBONDATA-743: -- Assignee: Cao, Lionel (was: Liang Chen) > Remove the abundant class CarbonFilters.scala > - > > Key: CARBONDATA-743 > URL: https://issues.apache.org/jira/browse/CARBONDATA-743 > Project: CarbonData > Issue Type: Bug > Reporter: Ravindra Pesala > Assignee: Cao, Lionel > Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > Remove the redundant class CarbonFilters.scala from the spark2 package. > Right now there are two classes named CarbonFilters in carbondata. > 1. Delete the CarbonFilters scala file from the spark-common package > 2. Move the CarbonFilters scala from the spark2 package to the spark-common package. > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (CARBONDATA-514) Select string type columns will return error.
[ https://issues.apache.org/jira/browse/CARBONDATA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel closed CARBONDATA-514. -- Resolution: Fixed > Select string type columns will return error. > - > > Key: CARBONDATA-514 > URL: https://issues.apache.org/jira/browse/CARBONDATA-514 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.0.0-incubating >Reporter: Cao, Lionel > Attachments: Screenshot.png > > > The data successfully loaded and count(*) is OK, but when I tried to query > the detail data, it returns below error: > scala> cc.sql("desc carbontest_002").show > +-+-+---+ > | col_name|data_type|comment| > +-+-+---+ > | vin| string| | > |data_date| string| | > +-+-+---+ > scala> cc.sql("load data inpath > 'hdfs://nameservice2/user/appuser/lucao/mydata4.csv' into table > default.carbontest_002 OPTIONS('DELIMITER'=',')") > WARN 07-12 16:30:30,241 - main skip empty input file: > hdfs://nameservice2/user/appuser/lucao/mydata4.csv/_SUCCESS > AUDIT 07-12 16:30:34,338 - [*.com][appuser][Thread-1]Data load request has > been received for table default.carbontest_002 > AUDIT 07-12 16:30:38,410 - [*.com][appuser][Thread-1]Data load is successful > for default.carbontest_002 > res12: org.apache.spark.sql.DataFrame = [] > scala> cc.sql("select count(*) from carbontest_002") > res14: org.apache.spark.sql.DataFrame = [_c0: bigint] > scala> res14.show > +---+ > |_c0| > +---+ > |100| > +---+ > scala> cc.sql("select vin, count(*) as cnt from carbontest_002 group by > vin").show > WARN 07-12 16:32:04,250 - Lost task 1.0 in stage 20.0 (TID 40, *.com): > java.lang.ClassCastException: java.lang.String cannot be cast to > java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getInt(rows.scala:41) > > at > org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getInt(rows.scala:248) > > at > 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:155) > at > org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:149) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.(TungstenAggregationIterator.scala:686) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86) > > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) > > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > ERROR 07-12 16:32:04,516 - Task 1 in stage 20.0 failed 4 times; aborting job > WARN 07-12 
16:32:04,600 - Lost task 0.1 in stage 20.0 (TID 45, *): > TaskKilled (killed intentionally) > ERROR 07-12 16:32:04,604 - Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at >
[jira] [Commented] (CARBONDATA-514) Select string type columns will return error.
[ https://issues.apache.org/jira/browse/CARBONDATA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824886#comment-15824886 ] Cao, Lionel commented on CARBONDATA-514: Hi Ravi, Tested the current master branch, all succeeded. Thanks, Lionel
[jira] [Closed] (CARBONDATA-559) Job failed at last step
[ https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel closed CARBONDATA-559. -- > Job failed at last step > --- > > Key: CARBONDATA-559 > URL: https://issues.apache.org/jira/browse/CARBONDATA-559 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 0.2.0-incubating > Environment: carbon version: branch-0.2 > hadoop 2.4.0 > spark 1.6.0 > OS centOS > Reporter: Cao, Lionel > Attachments: test001.log.zip > > > Hi team, > My job always failed at the last step: > it said the 'yarn' user doesn't have write access to the target data > path (storeLocation). > But I tested twice with 1 row of data; both succeeded. Could you help look > into the log? Please refer to the attachment. > Search 'access=WRITE' to see the exception. > Search 'Exception' for other exceptions. > thanks, > Lionel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-559) Job failed at last step
[ https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel resolved CARBONDATA-559. Resolution: Done -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-559) Job failed at last step
[ https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776831#comment-15776831 ] Cao, Lionel commented on CARBONDATA-559: Thank you Babu! It works! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-559) Job failed at last step
[ https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772502#comment-15772502 ] Cao, Lionel commented on CARBONDATA-559: The privilege of the data path is appuser:appuser drwxr-xr-x. And it looks like carbondata will use both [yarn, appuser] to write the data file/dictionary file/index file. It is strange that this error occurred, because sometimes it doesn't return an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-559) Job failed at last step
[ https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cao, Lionel updated CARBONDATA-559: --- Attachment: test001.log.zip -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-559) Job failed at last step
Cao, Lionel created CARBONDATA-559: -- Summary: Job failed at last step Key: CARBONDATA-559 URL: https://issues.apache.org/jira/browse/CARBONDATA-559 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 0.2.0-incubating Environment: carbon version: branch-0.2 hadoop 2.4.0 spark 1.6.0 OS centOS Reporter: Cao, Lionel Hi team, My job always failed at the last step: it said the 'yarn' user doesn't have write access to the target data path (storeLocation). But I tested twice with 1 row of data; both succeeded. Could you help look into the log? Please refer to the attachment. Search 'access=WRITE' to see the exception. Search 'Exception' for other exceptions. thanks, Lionel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-514) Select string type columns will return error.
Cao, Lionel created CARBONDATA-514: -- Summary: Select string type columns will return error. Key: CARBONDATA-514 URL: https://issues.apache.org/jira/browse/CARBONDATA-514 Project: CarbonData Issue Type: Bug Components: sql Affects Versions: 1.0.0-incubating Reporter: Cao, Lionel The data successfully loaded and count(*) is OK, but when I tried to query the detail data, it returns below error: scala> cc.sql("desc carbontest_002").show +-+-+---+ | col_name|data_type|comment| +-+-+---+ | vin| string| | |data_date| string| | +-+-+---+ scala> cc.sql("load data inpath 'hdfs://nameservice2/user/appuser/lucao/mydata4.csv' into table default.carbontest_002 OPTIONS('DELIMITER'=',')") WARN 07-12 16:30:30,241 - main skip empty input file: hdfs://nameservice2/user/appuser/lucao/mydata4.csv/_SUCCESS AUDIT 07-12 16:30:34,338 - [*.com][appuser][Thread-1]Data load request has been received for table default.carbontest_002 AUDIT 07-12 16:30:38,410 - [*.com][appuser][Thread-1]Data load is successful for default.carbontest_002 res12: org.apache.spark.sql.DataFrame = [] scala> cc.sql("select count(*) from carbontest_002") res14: org.apache.spark.sql.DataFrame = [_c0: bigint] scala> res14.show +---+ |_c0| +---+ |100| +---+ scala> cc.sql("select vin, count(*) as cnt from carbontest_002 group by vin").show WARN 07-12 16:32:04,250 - Lost task 1.0 in stage 20.0 (TID 40, *.com): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getInt(rows.scala:41) at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getInt(rows.scala:248) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:155) at org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:149) at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512) at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.(TungstenAggregationIterator.scala:686) at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95) at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ERROR 07-12 16:32:04,516 - Task 1 in stage 20.0 failed 4 times; aborting job WARN 07-12 16:32:04,600 - Lost task 0.1 in stage 20.0 (TID 45, *): TaskKilled (killed intentionally) ERROR 07-12 16:32:04,604 - Listener SQLListener threw an exception java.lang.NullPointerException at org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) at 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) at