[jira] [Commented] (CARBONDATA-910) Implement Partition feature

2017-04-12 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965524#comment-15965524
 ] 

Cao, Lionel commented on CARBONDATA-910:


Sure, I will enhance the document and then send it to the mailing list.

> Implement Partition feature
> ---
>
> Key: CARBONDATA-910
> URL: https://issues.apache.org/jira/browse/CARBONDATA-910
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, data-load, data-query
>Reporter: Cao, Lionel
>Assignee: Cao, Lionel
>
> Why need partition table
> A partition table provides an option to divide a table into smaller pieces. 
> With a partition table:
>   1. Data can be better managed, organized and stored. 
>   2. We can avoid full table scans in some scenarios and improve query 
> performance (partition column in the filter, multiple partitioned tables 
> joined on the same partition column, etc.).
> Partitioning design
> Range Partitioning   
>    Range partitioning maps data to partitions according to ranges of the 
> partition column's values; the operator '<' defines the non-inclusive upper 
> bound of each partition.
> List Partitioning
>    List partitioning allows you to map data to partitions using explicit 
> value lists.
> Hash Partitioning
>    Hash partitioning maps data to partitions using a hash algorithm, 
> distributing rows across the given number of partitions.
> Composite Partitioning(2 levels at most for now)
>Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, 
> Hash-Range, Hash-List, Hash-Hash
> DDL-Create 
> Create table sales(
>  itemid long, 
>  logdate datetime, 
>  customerid int
>  ...
>  ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> range partition: 
>  partition by range logdate(<  '2016-01-01', < '2017-01-01', < 
> '2017-02-01', < '2017-03-01', < '2099-01-01')
> list partition:
>  partition by list area('Asia', 'Europe', 'North America', 'Africa', 
> 'Oceania')
> hash partition:
>  partition by hash(itemid, 9) 
> composite partition:
>  partition by range logdate(< '2016-01-01', < '2017-01-01', < 
> '2017-02-01', < '2017-03-01', < '2099-01-01')
>  subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 
> 'Oceania')
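> As an illustration, a complete statement combining the proposed clauses 
> might look like this in a spark-shell session (a sketch only: the grammar 
> above is a proposal, and the 'area' column is assumed to be among the 
> elided columns):
> cc.sql("""
>   Create table sales(
>     itemid long,
>     logdate datetime,
>     customerid int,
>     area string)
>   partition by range logdate(< '2016-01-01', < '2017-01-01', < '2099-01-01')
>   subpartition by list area('Asia', 'Europe', 'Oceania')
>   Stored By 'carbondata'
> """)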
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> Alter table sales add partition (< '2018-01-01'); # only supported for 
> range and list partitioning
> Alter table sales add partition ('South America');
> # Note: There is no delete operation for partitions; please use rebuild. 
> If you need to delete data, use a delete statement; the partition 
> definition itself will not be deleted.
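> A usage sketch of the ALTER statements above under the same proposed 
> grammar (the rebuild argument list is illustrative):
> cc.sql("Alter table sales rebuild partition by range(< '2017-01-01', < '2099-01-01')")
> cc.sql("Alter table sales add partition (< '2018-01-01')")   // range-partitioned table
> cc.sql("Alter table sales add partition ('South America')")  // list-partitioned table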
> Partition Table Data Store
> [Option One]
> Use the current design and keep partition folders outside of segments.
> Fact
>|___Part0
>|  |___Segment_0
>| |___ ***-[bucketId]-.carbondata
>| |___ ***-[bucketId]-.carbondata
>|  |___Segment_1
>|  ...
>|___Part1
>|  |___Segment_0
>|  |___Segment_1
>|...
> [Option Two]
> Remove the partition folder, add the partition id into the file name, and 
> build the B-tree on the driver side.
> Fact
>|___Segment_0
>|  |___ ***-[bucketId]-[partitionId].carbondata
>|  |___ ***-[bucketId]-[partitionId].carbondata
>|___Segment_1
>|___Segment_2
>...
> Pros & Cons: 
> Option one would be faster at locating target files.
> Option two needs to store more metadata in place of the folder structure.
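> To make Option Two concrete, a toy Scala sketch of driver-side pruning, 
> assuming the naming scheme shown above, i.e. 
> <prefix>-<bucketId>-<partitionId>.carbondata (names here are illustrative, 
> not CarbonData's API):
> val CarbonFile = """.*-(\d+)-(\d+)\.carbondata""".r
> // The second numeric suffix is taken to be the partition id.
> def partitionIdOf(fileName: String): Option[Int] = fileName match {
>   case CarbonFile(_, pid) => Some(pid.toInt)
>   case _                  => None
> }
> // The driver keeps only the files whose partition survives the filter:
> def prune(files: Seq[String], wanted: Set[Int]): Seq[String] =
>   files.filter(f => partitionIdOf(f).exists(wanted.contains))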
> Partition Table MetaData Store
> Partition info should be stored in the file footer/index file and loaded 
> into memory before user queries.
> Relationship with Bucket
> Buckets should sit at a lower level than partitions, i.e. bucketing applies 
> within a partition.
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> Users should remember to add a partition filter when writing SQL against 
> a partitioned table.
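> Building on the benefit noted at the top (joins on the partition column), a 
> hedged spark-shell sketch; the 'returns' table is hypothetical and assumed 
> to be partitioned on logdate as well, so matching partitions can be joined 
> pairwise:
> cc.sql("""
>   Select s.itemid, count(*) from sales s
>   join returns r on s.logdate = r.logdate
>   where s.logdate < date '2017-01-01'
>   group by s.itemid""")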





[jira] [Commented] (CARBONDATA-910) Implement Partition feature

2017-04-12 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965484#comment-15965484
 ] 

Cao, Lionel commented on CARBONDATA-910:


2017-04-12 notes:
1. List partitioning should support value groups, e.g. partition by list 
area((China, India), (England, France), (America, Canada));
2. Support add and delete (rebuild may come later); deleting a partition 
will delete its data as well;
3. For the data store, prefer option 2, and use the partitionId as the taskId;
4. Single-level partitioning for the first version; no composite partitioning.
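For note 1, a sketch of the value-group syntax written against the earlier 
proposed CREATE grammar (string quoting added by me; all of this is proposed, 
not implemented, syntax):
cc.sql("""
  Create table sales_by_area(
    itemid long,
    area string)
  partition by list area(('China', 'India'), ('England', 'France'), ('America', 'Canada'))
  Stored By 'carbondata'
""")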



[jira] [Created] (CARBONDATA-840) Limit query performance optimization [Group By]

2017-03-31 Thread Cao, Lionel (JIRA)
Cao, Lionel created CARBONDATA-840:
--

 Summary: Limit query performance optimization [Group By]
 Key: CARBONDATA-840
 URL: https://issues.apache.org/jira/browse/CARBONDATA-840
 Project: CarbonData
  Issue Type: Improvement
  Components: data-query
Reporter: Cao, Lionel
Assignee: Cao, Lionel


Currently a limit query still scans all the data first and applies the limit 
in the last step. In carbon we can convert the limit into filters using the 
dictionary's distinct value list...
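A toy Scala sketch of the idea (illustrative only, not CarbonData's API): for 
"select c, count(*) from t group by c limit n", take n distinct values from 
c's dictionary and push them down as a filter so the scan can skip the rest.
// dictValues: the column's distinct dictionary values, available on the driver
def limitAsFilter(column: String, dictValues: Seq[String], limit: Int): String =
  dictValues.take(limit).map(v => s"'$v'").mkString(s"$column in (", ", ", ")")
// limitAsFilter("c", Seq("a", "b", "z"), 2) returns "c in ('a', 'b')"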






[jira] [Assigned] (CARBONDATA-762) modify all schemaName->databaseName, cubeName->tableName

2017-03-15 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-762:
--

Assignee: Cao, Lionel  (was: QiangCai)

> modify all schemaName->databaseName, cubeName->tableName
> 
>
> Key: CARBONDATA-762
> URL: https://issues.apache.org/jira/browse/CARBONDATA-762
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: Cao, Lionel
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> modify all schemaName->databaseName, cubeName->tableName





[jira] [Commented] (CARBONDATA-760) Should avoid ERROR log for successful select query

2017-03-15 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925648#comment-15925648
 ] 

Cao, Lionel commented on CARBONDATA-760:


Hi [~QiangCai], could you provide some environment info so that I can 
reproduce the error? Currently I can't reproduce it from CarbonExample.

thanks,
Lionel

> Should avoid ERROR log for successful select query
> -
>
> Key: CARBONDATA-760
> URL: https://issues.apache.org/jira/browse/CARBONDATA-760
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: QiangCai
>Assignee: Cao, Lionel
>Priority: Minor
>
> A table without delete or update operations may not have any delta files. 
> A select query should not record an ERROR log in that case.
> Code:
> SegmentUpdateStatusManager.getDeltaFiles
> Log detail:
>  ERROR 06-03 19:21:37,531 - pool-475-thread-1 Invalid tuple id 
> arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/0
> ERROR 06-03 19:21:37,948 - pool-475-thread-1 Invalid tuple id 
> arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/1
> ERROR 06-03 19:21:38,517 - pool-475-thread-1 Invalid tuple id 
> arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/2
> ERROR 06-03 19:21:38,909 - pool-475-thread-1 Invalid tuple id 
> arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/3
> ERROR 06-03 19:21:39,292 - pool-475-thread-1 Invalid tuple id 
> arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/4
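> A sketch of the intended behavior (a toy stand-in, not the real 
> SegmentUpdateStatusManager code; 'lookup' is a hypothetical helper):
> def getDeltaFiles(tupleId: String, lookup: String => Seq[String]): Seq[String] = {
>   val files = lookup(tupleId)
>   if (files.isEmpty)
>     println(s"DEBUG: no delta files for tuple id $tupleId") // was logged at ERROR
>   files
> }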





[jira] [Assigned] (CARBONDATA-760) Should avoid ERROR log for successful select query

2017-03-15 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-760:
--

Assignee: Cao, Lionel  (was: QiangCai)



[jira] [Assigned] (CARBONDATA-741) Remove the unnecessary classes from carbondata

2017-03-09 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-741:
--

Assignee: Cao, Lionel  (was: Liang Chen)

> Remove the unnecessary classes from carbondata
> --
>
> Key: CARBONDATA-741
> URL: https://issues.apache.org/jira/browse/CARBONDATA-741
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ravindra Pesala
>Assignee: Cao, Lionel
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Please remove the following classes as they are not used now.
> VectorChunkRowIterator
> CarbonColumnVectorImpl





[jira] [Assigned] (CARBONDATA-739) Avoid creating multiple instances of DirectDictionary in DictionaryBasedResultCollector

2017-03-08 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-739:
--

Assignee: Cao, Lionel  (was: Liang Chen)

> Avoid creating multiple instances of DirectDictionary in 
> DictionaryBasedResultCollector
> ---
>
> Key: CARBONDATA-739
> URL: https://issues.apache.org/jira/browse/CARBONDATA-739
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Reporter: Ravindra Pesala
>Assignee: Cao, Lionel
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Avoid creating multiple instances of DirectDictionary in 
> DictionaryBasedResultCollector.
> Currently, a direct dictionary generator is created for every row inside 
> the DictionaryBasedResultCollector.collectData method.
> Please create a single instance per column and reuse it.
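> A generic Scala sketch of the pattern (hypothetical names, not the actual 
> collector API): build the per-column generators once, outside the row loop.
> class Collector[G](columns: Seq[String],
>                    mkGenerator: String => G,
>                    decode: (G, Any) => Any) {
>   // One generator per column, created once and reused for every row.
>   private val generators: Seq[G] = columns.map(mkGenerator)
>   def collectRow(row: Seq[Any]): Seq[Any] =
>     generators.zip(row).map { case (g, v) => decode(g, v) }
> }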





[jira] [Assigned] (CARBONDATA-740) Add logger for rows processed while closing in AbstractDataLoadProcessorStep

2017-03-07 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-740:
--

Assignee: Cao, Lionel  (was: Liang Chen)

> Add logger for rows processed while closing in AbstractDataLoadProcessorStep
> 
>
> Key: CARBONDATA-740
> URL: https://issues.apache.org/jira/browse/CARBONDATA-740
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ravindra Pesala
>Assignee: Cao, Lionel
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add a logger for rows processed while closing in 
> AbstractDataLoadProcessorStep.
> It is good to print the total number of records processed when closing the 
> step, so please log the rows processed in AbstractDataLoadProcessorStep.
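> A minimal sketch of the change (illustrative names; real code would use the 
> project's logging service instead of println):
> abstract class LoadStep(name: String) {
>   protected var rowCount: Long = 0L
>   def processRow(row: Any): Unit = rowCount += 1
>   def close(): Unit =
>     println(s"Total rows processed in step $name: $rowCount")
> }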





[jira] [Assigned] (CARBONDATA-743) Remove the redundant class CarbonFilters.scala

2017-03-05 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-743:
--

Assignee: Cao, Lionel  (was: Liang Chen)

> Remove the redundant class CarbonFilters.scala
> -
>
> Key: CARBONDATA-743
> URL: https://issues.apache.org/jira/browse/CARBONDATA-743
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ravindra Pesala
>Assignee: Cao, Lionel
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove the redundant class CarbonFilters.scala from the spark2 package.
> Right now there are two classes named CarbonFilters in carbondata.
> 1. Delete the CarbonFilters scala file from the spark-common package.
> 2. Move the CarbonFilters scala file from the spark2 package to the 
> spark-common package.





[jira] [Closed] (CARBONDATA-514) Select string type columns will return error.

2017-01-16 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel closed CARBONDATA-514.
--
Resolution: Fixed


[jira] [Commented] (CARBONDATA-514) Select string type columns will return error.

2017-01-16 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824886#comment-15824886
 ] 

Cao, Lionel commented on CARBONDATA-514:


Hi Ravi,
Tested the current master branch; everything succeeded.

Thanks,
Lionel

[jira] [Closed] (CARBONDATA-559) Job failed at last step

2016-12-25 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel closed CARBONDATA-559.
--



[jira] [Resolved] (CARBONDATA-559) Job failed at last step

2016-12-25 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel resolved CARBONDATA-559.

Resolution: Done



[jira] [Commented] (CARBONDATA-559) Job failed at last step

2016-12-25 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776831#comment-15776831
 ] 

Cao, Lionel commented on CARBONDATA-559:


Thank you Babu! It works!



[jira] [Commented] (CARBONDATA-559) Job failed at last step

2016-12-23 Thread Cao, Lionel (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772502#comment-15772502
 ] 

Cao, Lionel commented on CARBONDATA-559:


The privilege of the data path is appuser:appuser drwxr-xr-x.
It looks like carbondata uses both users [yarn, appuser] to write the data 
file/dictionary file/index file.
It is strange that this error occurred, because sometimes it doesn't return 
an error.



[jira] [Updated] (CARBONDATA-559) Job failed at last step

2016-12-23 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel updated CARBONDATA-559:
---
Attachment: test001.log.zip



[jira] [Created] (CARBONDATA-559) Job failed at last step

2016-12-23 Thread Cao, Lionel (JIRA)
Cao, Lionel created CARBONDATA-559:
--

 Summary: Job failed at last step
 Key: CARBONDATA-559
 URL: https://issues.apache.org/jira/browse/CARBONDATA-559
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 0.2.0-incubating
 Environment: carbon version: branch-0.2
hadoop 2.4.0
spark 1.6.0
OS centOS
Reporter: Cao, Lionel


Hi team,
My job always failed at the last step:
it says the 'yarn' user doesn't have write access to the target data 
path (storeLocation).
But I tested twice with 1 row of data and both succeeded. Could you help 
look into the log? Please refer to the attachment. 
Search 'access=WRITE' to see the exception.
Search 'Exception' for other exceptions.

thanks,
Lionel






[jira] [Created] (CARBONDATA-514) Select string type columns will return error.

2016-12-07 Thread Cao, Lionel (JIRA)
Cao, Lionel created CARBONDATA-514:
--

 Summary: Select string type columns will return error.
 Key: CARBONDATA-514
 URL: https://issues.apache.org/jira/browse/CARBONDATA-514
 Project: CarbonData
  Issue Type: Bug
  Components: sql
Affects Versions: 1.0.0-incubating
Reporter: Cao, Lionel


The data loaded successfully and count(*) is OK, but when I tried to query 
the detail data, it returned the error below:

scala> cc.sql("desc carbontest_002").show 
+-+-+---+ 
| col_name|data_type|comment| 
+-+-+---+ 
|  vin|   string|   | 
|data_date|   string|   | 
+-+-+---+ 

scala> cc.sql("load data inpath 
'hdfs://nameservice2/user/appuser/lucao/mydata4.csv' into table 
default.carbontest_002 OPTIONS('DELIMITER'=',')") 
WARN  07-12 16:30:30,241 - main skip empty input file: 
hdfs://nameservice2/user/appuser/lucao/mydata4.csv/_SUCCESS 
AUDIT 07-12 16:30:34,338 - [*.com][appuser][Thread-1]Data load request has 
been received for table default.carbontest_002 
AUDIT 07-12 16:30:38,410 - [*.com][appuser][Thread-1]Data load is successful 
for default.carbontest_002 
res12: org.apache.spark.sql.DataFrame = [] 

scala> cc.sql("select count(*) from carbontest_002") 
res14: org.apache.spark.sql.DataFrame = [_c0: bigint] 

scala> res14.show 
+---+ 
|_c0| 
+---+ 
|100| 
+---+ 

scala> cc.sql("select vin, count(*) as cnt from carbontest_002 group by 
vin").show 
WARN  07-12 16:32:04,250 - Lost task 1.0 in stage 20.0 (TID 40, *.com): 
java.lang.ClassCastException: java.lang.String cannot be cast to 
java.lang.Integer 
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) 
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getInt(rows.scala:41) 
at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getInt(rows.scala:248) 
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) 
at org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:155) 
at org.apache.spark.sql.CarbonScan$$anonfun$1$$anon$1.next(CarbonScan.scala:149) 
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:512) 
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686) 
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95) 
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86) 
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) 
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) 
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
at org.apache.spark.scheduler.Task.run(Task.scala:89) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

ERROR 07-12 16:32:04,516 - Task 1 in stage 20.0 failed 4 times; aborting job 
WARN  07-12 16:32:04,600 - Lost task 0.1 in stage 20.0 (TID 45, *): TaskKilled 
(killed intentionally) 
ERROR 07-12 16:32:04,604 - Listener SQLListener threw an exception 
java.lang.NullPointerException 
at org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) 
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) 
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) 
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) 
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) 
at