[GitHub] [carbondata] akashrn5 commented on pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column

2021-03-16 Thread GitBox


akashrn5 commented on pull request #4106:
URL: https://github.com/apache/carbondata/pull/4106#issuecomment-800846438


   LGTM. @ajantha-bhat please review this once, as you have worked on the 
rearrange logic in insert optimization. Please see if there is any impact.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column

2021-03-16 Thread GitBox


akashrn5 commented on pull request #4106:
URL: https://github.com/apache/carbondata/pull/4106#issuecomment-800846101


   @Indhumathi27 please update the description with an example, as discussed







[GitHub] [carbondata] akashrn5 commented on a change in pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column

2021-03-16 Thread GitBox


akashrn5 commented on a change in pull request #4106:
URL: https://github.com/apache/carbondata/pull/4106#discussion_r595752950



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala
##
@@ -181,7 +183,13 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
   if (isNotReArranged) {
     // Re-arrange the catalog table schema and output for partition relation
     logicalPartitionRelation =
-      getReArrangedSchemaLogicalRelation(reArrangedIndex, logicalPartitionRelation)
+      if (carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isMV) {
+        // For MV partition table, partition columns will be at the end. Re-arrange

Review comment:
   as discussed, please add a comment here with an example, so that reviewers 
and developers understand why only MV needs to be handled separately
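The re-arrangement under discussion can be sketched as follows. This is illustrative only: the class and method names below are hypothetical, not CarbonData APIs. It mimics the idea that for an MV partition table the partition columns sit at the end of the schema, so a re-arrangement must move them there while preserving the relative order of the remaining columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: move partition columns to the end of the schema,
// keeping the relative order of all other columns, as an MV partition
// table layout expects.
public class MvSchemaRearrangeSketch {

    /** Returns the column names with partition columns moved to the end. */
    public static List<String> reArrange(List<String> schema, Set<String> partitionCols) {
        List<String> nonPartition = new ArrayList<>();
        List<String> partition = new ArrayList<>();
        for (String col : schema) {
            if (partitionCols.contains(col)) {
                partition.add(col);
            } else {
                nonPartition.add(col);
            }
        }
        nonPartition.addAll(partition); // partition columns go last
        return nonPartition;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("projectjoindate", "empname", "projectcode");
        Set<String> partitionCols = new LinkedHashSet<>(List.of("projectjoindate"));
        // Partition column moved to the end of the schema
        System.out.println(reArrange(schema, partitionCols));
    }
}
```

Column names here are made up for illustration; the actual PR works on the catalog table schema and the logical relation output.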









[jira] [Updated] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3816:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support Float and Decimal in the Merge Flow
> ---
>
> Key: CARBONDATA-3816
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3816
> Project: CarbonData
>  Issue Type: New Feature
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.2.0
>
>
> We don't support the FLOAT and DECIMAL data types in the CDC flow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3615:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> ---
>
> Key: CARBONDATA-3615
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3615
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> +-------------+---------+-------------------------+-----------------+
> |    Field    |  Size   |         Comment         | Cache Location  |
> +-------------+---------+-------------------------+-----------------+
> | Index       | 0 B     | 0/2 index files cached  | DRIVER          |
> | Dictionary  | 0 B     |                         | DRIVER          |
> *| Index       | 1.5 KB  | 2/2 index files cached  | INDEX SERVER    |*
> *| Dictionary  | 0 B     |                         | INDEX SERVER    |*
> +-------------+---------+-------------------------+-----------------+





[jira] [Commented] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command

2021-03-16 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303133#comment-17303133
 ] 

Ajantha Bhat commented on CARBONDATA-3615:
--

[~vikramahuja_]: please check and close the issue if it is already handled.

> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> ---
>
> Key: CARBONDATA-3615
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3615
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> +-------------+---------+-------------------------+-----------------+
> |    Field    |  Size   |         Comment         | Cache Location  |
> +-------------+---------+-------------------------+-----------------+
> | Index       | 0 B     | 0/2 index files cached  | DRIVER          |
> | Dictionary  | 0 B     |                         | DRIVER          |
> *| Index       | 1.5 KB  | 2/2 index files cached  | INDEX SERVER    |*
> *| Dictionary  | 0 B     |                         | INDEX SERVER    |*
> +-------------+---------+-------------------------+-----------------+





[jira] [Updated] (CARBONDATA-3875) Support show segments include stage

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3875:
-
Fix Version/s: (was: 2.1.1)
   2.1.0

> Support show segments include stage
> ---
>
> Key: CARBONDATA-3875
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3875
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There is a lack of monitoring of stage information in the current system, so a 
> 'Show segments include stage' command shall be supported. It will provide 
> monitoring information such as createTime, partition info, etc.





[jira] [Updated] (CARBONDATA-3856) Support the LIMIT operator for show segments command

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3856:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support the LIMIT operator for show segments command
> 
>
> Key: CARBONDATA-3856
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3856
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Now, in the 2.0.0 release, CarbonData doesn't support the LIMIT operator in 
> the SHOW SEGMENTS command, and the command becomes expensive when there are 
> too many segments.





[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4095:
-
Issue Type: Bug  (was: Improvement)

> Select Query with SI filter fails, when columnDrift is enabled
> --
>
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> sql("drop table if exists maintable")
> sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata")
> sql("insert into maintable values('k','d',2,3)")
> sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
> sql("create index indextable on table maintable(b) AS 'carbondata'")
> sql("insert into maintable values('k','x',2,4)")
> sql("select * from maintable where b='x'").show(false)
>  
>  
>  
>  
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 
> (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
>  at 
> org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
>  at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61)
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281)
>  ... 26 more
> 2020-12-22 18:58:37 ERROR TaskSetManager:70 - Task 0 in stage 40.0 failed 1 
> times; aborting job




[jira] [Updated] (CARBONDATA-4003) Improve IUD Concurrency

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4003:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Improve IUD Concurrency
> ---
>
> Key: CARBONDATA-4003
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4003
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Kejian Li
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> When some segments of the table are in INSERT IN PROGRESS state, update 
> operations on the table fail.





[jira] [Resolved] (CARBONDATA-3617) loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3617.
--
Fix Version/s: (was: 2.1.1)
   2.0.0
   Resolution: Fixed

> loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow
> --
>
> Key: CARBONDATA-3617
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3617
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> During data load using global sort, the sort-by processing is based on the 
> whole carbon row, and the GC overhead is huge when there are many columns. 
> Theoretically, the sort-by processing can work just as well based on the sort 
> columns alone, which brings less time overhead and GC overhead.
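The idea above can be sketched as comparing rows only on the configured sort columns, leaving the rest of each row as an opaque payload that never participates in comparisons. The row layout and column indices below are hypothetical, for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of CARBONDATA-3617's idea: sort rows by the sort-column fields
// only, instead of comparing the whole carbon row.
public class SortColumnsOnlySketch {

    /** Sorts rows in place, comparing only the given sort-column indices. */
    public static void sortBySortColumns(Object[][] rows, int[] sortColumnIndices) {
        Comparator<Object[]> cmp = (left, right) -> {
            for (int idx : sortColumnIndices) {
                @SuppressWarnings("unchecked")
                int c = ((Comparable<Object>) left[idx]).compareTo(right[idx]);
                if (c != 0) {
                    return c;
                }
            }
            return 0; // equal on all sort columns; other columns are ignored
        };
        Arrays.sort(rows, cmp);
    }

    public static void main(String[] args) {
        Object[][] rows = {
            {"b", 2, "payload1"},
            {"a", 9, "payload2"},
            {"a", 1, "payload3"},
        };
        sortBySortColumns(rows, new int[] {0, 1}); // sort columns: 0, then 1
        for (Object[] r : rows) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

The payload columns never get touched during comparison, which is what reduces the per-comparison work and GC pressure the description refers to.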





[jira] [Updated] (CARBONDATA-3603) Feature Change in CarbonData 2.0

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3603:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Feature Change in CarbonData 2.0
> 
>
> Key: CARBONDATA-3603
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3603
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>






[jira] [Updated] (CARBONDATA-3559) Support adding carbon file into CarbonData table

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3559:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support adding carbon file into CarbonData table
> 
>
> Key: CARBONDATA-3559
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3559
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Since adding parquet/orc files into a CarbonData table is supported now, 
> adding carbon files should be supported as well.





[jira] [Updated] (CARBONDATA-3370) fix missing version of maven-duplicate-finder-plugin

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3370:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> fix missing version of maven-duplicate-finder-plugin
> 
>
> Key: CARBONDATA-3370
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3370
> Project: CarbonData
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.5.3
>Reporter: lamber-ken
>Priority: Critical
> Fix For: 2.2.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> fix missing version of maven-duplicate-finder-plugin in pom file





[jira] [Commented] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2021-03-16 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303118#comment-17303118
 ] 

Ajantha Bhat commented on CARBONDATA-3670:
--

Already handled in 2.0 via https://github.com/apache/carbondata/pull/3638

> Support compress offheap columnpage directly, avoiding a copy of data from 
> offheap to heap when compressed.
> --
>
> Key: CARBONDATA-3670
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When writing data, the column pages are stored off-heap, and the pages are 
> compressed to save storage cost. Currently, in the compression processing, 
> the data is copied from off-heap to heap before being compressed, which leads 
> to heavier GC overhead compared with compressing off-heap directly. To sum 
> up, we should support compressing off-heap column pages directly, avoiding a 
> copy of data from off-heap to heap during compression.
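A minimal sketch of the direct-compression idea, using the `ByteBuffer` overload of `java.util.zip.Deflater#setInput` (available since JDK 11) so the compressor reads straight from a direct (off-heap) buffer. CarbonData uses its own compressor abstraction, so this only illustrates the principle, not the actual change.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: compress a column "page" held in a direct (off-heap) buffer
// without first copying it into a heap byte[].
public class OffHeapCompressSketch {

    /** Compresses the remaining bytes of a direct buffer; no heap copy of the input. */
    public static byte[] compressDirect(ByteBuffer directPage) {
        int srcLen = directPage.remaining();
        Deflater deflater = new Deflater();
        deflater.setInput(directPage); // reads straight from off-heap memory
        deflater.finish();
        byte[] out = new byte[srcLen + 64]; // ample for this small example
        int n = deflater.deflate(out);
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    /** Round-trip helper used to verify the compressed bytes. */
    public static byte[] decompress(byte[] compressed, int originalLen) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] out = new byte[originalLen];
            int n = inflater.inflate(out);
            inflater.end();
            return Arrays.copyOf(out, n);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt page", e);
        }
    }

    public static void main(String[] args) {
        byte[] page = "carbondata page carbondata page".getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(page.length);
        direct.put(page).flip();
        byte[] compressed = compressDirect(direct);
        byte[] restored = decompress(compressed, page.length);
        System.out.println(Arrays.equals(page, restored)); // round-trip check
    }
}
```

The point is that the uncompressed page never touches the heap; only the (typically much smaller) compressed output is a heap array.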





[jira] [Resolved] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3670.
--
Fix Version/s: (was: 2.1.1)
   Resolution: Duplicate

> Support compress offheap columnpage directly, avoiding a copy of data from 
> offheap to heap when compressed.
> --
>
> Key: CARBONDATA-3670
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When writing data, the column pages are stored off-heap, and the pages are 
> compressed to save storage cost. Currently, in the compression processing, 
> the data is copied from off-heap to heap before being compressed, which leads 
> to heavier GC overhead compared with compressing off-heap directly. To sum 
> up, we should support compressing off-heap column pages directly, avoiding a 
> copy of data from off-heap to heap during compression.





[jira] [Resolved] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter

2021-03-16 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4137.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Refactor CarbonDataSourceScan without Spark Filter
> --
>
> Key: CARBONDATA-4137
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3746) Support column chunk cache creation and basic read/write

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3746:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support column chunk cache creation and basic read/write
> 
>
> Key: CARBONDATA-3746
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3746
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>






[jira] [Updated] (CARBONDATA-3608) Drop 'STORED BY' syntax in create table

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3608:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Drop 'STORED BY' syntax in create table
> ---
>
> Key: CARBONDATA-3608
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3608
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>






[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties

2021-03-16 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4152:
--
Description: 
There is a chance that getIndexes(segment) returns an empty list and the later 
call to list.get(0) throws an exception.

    caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at 
org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g

  was:
There is a chance that getIndexes(segment) returns an empty list and the later 
call to list.get(0) throws an exception.

caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at 
org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g


> Enhance logger after query failed with exception: 
> java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
> BlockletIndexFactory.getSegmentProperties
> 
>
> Key: CARBONDATA-4152
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4152
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>
> There is a chance that getIndexes(segment) returns an empty list and the 
> later call to list.get(0) throws an exception.
>     caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeC

[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties

2021-03-16 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4152:
--
Description: 
There is a chance that getIndexes(segment) returns an empty list and the later 
call to list.get(0) throws an exception.

caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at 
org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g

  was:There is a chance that getIndexes(segment) returns an empty list and the 
later call to list.get(0) throws an exception.


> Enhance logger after query failed with exception: 
> java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
> BlockletIndexFactory.getSegmentProperties
> 
>
> Key: CARBONDATA-4152
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4152
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>
> There is a chance that getIndexes(segment) returns an empty list and the 
> later call to list.get(0) throws an exception.
> caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> at java.util.ArrayList.get(ArrayList.java:433)
> at 
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
> at 
> org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
> at 
> org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
> at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
> at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
> at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
> at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
> at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
> at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.g




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


CarbonDataQA2 commented on pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800773654


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3806/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


CarbonDataQA2 commented on pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800772914


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5572/
   







[jira] [Commented] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties

2021-03-16 Thread Yahui Liu (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303065#comment-17303065
 ] 

Yahui Liu commented on CARBONDATA-4152:
---

When the CARBONDATA-3471 issue happens, there is no log that can help locate 
the root cause of the issue; the logger needs to be enhanced.

> Enhance logger after query failed with exception: 
> java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
> BlockletIndexFactory.getSegmentProperties
> 
>
> Key: CARBONDATA-4152
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4152
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>
> There is a chance that getIndexes(segment) returns an empty list and that a 
> later call to list.get(0) throws an exception.



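The guard the ticket asks for can be sketched as below. This is a hypothetical illustration, not CarbonData's actual code: the method name, signature, and log message are assumptions. It shows the pattern of checking for an empty index list and logging the affected segment instead of letting an unconditional `list.get(0)` throw `IndexOutOfBoundsException`.

```java
import java.util.Collections;
import java.util.List;

public class IndexListGuard {
    // Hypothetical sketch: return the first index entry if one exists;
    // otherwise log which segment had no entries and return null, so callers
    // never hit IndexOutOfBoundsException from an unconditional get(0).
    static <T> T firstIndexOrNull(List<T> indexes, String segmentId) {
        if (indexes == null || indexes.isEmpty()) {
            // Enhanced logging is the fix the ticket requests: record the
            // segment id instead of failing with a bare exception.
            System.err.println("No index entries found for segment " + segmentId
                + "; skipping pruning for this segment");
            return null;
        }
        return indexes.get(0);
    }

    public static void main(String[] args) {
        // Empty list no longer throws; it logs and returns null instead.
        System.out.println(firstIndexOrNull(Collections.emptyList(), "0"));
        System.out.println(firstIndexOrNull(List.of("blocklet-index"), "1"));
    }
}
```

With such a guard in place, the caller can decide whether an empty index list for a segment is an error or simply a segment to skip, and the log identifies the segment either way.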


[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties

2021-03-16 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4152:
--
Summary: Enhance logger after query failed with exception: 
java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
BlockletIndexFactory.getSegmentProperties  (was: Query failed with exception: 
java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
BlockletIndexFactory.getSegmentProperties)

> Enhance logger after query failed with exception: 
> java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
> BlockletIndexFactory.getSegmentProperties
> 
>
> Key: CARBONDATA-4152
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4152
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>
> There is a chance that getIndexes(segment) returns an empty list and that a 
> later call to list.get(0) throws an exception.





[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4034:
-
Fix Version/s: 2.1.0

> Improve the time-consuming of Horizontal Compaction for update
> --
>
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> In the update flow, horizontal compaction becomes significantly slower when 
> updating with many segments (or many blocks). The cost for one such case is 
> shown in the log.
> {code:java}
> 2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Update Compaction operation started for 
> [ods_oms.oms_wh_outbound_order] 
>  2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Update Compaction operation completed for 
> [ods_oms.oms_wh_outbound_order]. 
>  2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Delete Compaction operation started for 
> [ods_oms.oms_wh_outbound_order] 
>  2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Delete Compaction operation completed for 
> [ods_oms.oms_wh_outbound_order].{code}
> In this PR, we optimize the process between the second and third rows of the 
> log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal 
> compaction flow.
>  



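The interval the PR targets can be read directly off the quoted log: the gap between "Horizontal Update Compaction operation completed" (row 2) and "Horizontal Delete Compaction operation started" (row 3) is about 25 minutes. A minimal sketch of computing that gap from the log's timestamp format (the class and method names here are assumptions for illustration):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CompactionLogGap {
    // The log lines use "yyyy-MM-dd HH:mm:ss,SSS"; the comma before the
    // milliseconds is a literal character in the pattern.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Whole minutes elapsed between two timestamps in the log's format.
    static long gapMinutes(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT),
                                LocalDateTime.parse(to, FMT)).toMinutes();
    }

    public static void main(String[] args) {
        // Rows 2 and 3 of the quoted log: update compaction completed vs.
        // delete compaction started -- the interval this PR optimizes.
        System.out.println(gapMinutes("2020-10-10 09:50:25,718",
                                      "2020-10-10 10:15:44,302") + " minutes");
        // prints "25 minutes"
    }
}
```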


[jira] [Updated] (CARBONDATA-4057) Support Complex DataType when Save DataFrame

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4057:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support Complex DataType when Save DataFrame
> 
>
> Key: CARBONDATA-4057
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4057
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, when df.mode(overwrite).save is triggered, complex data types are 
> not supported; this should be supported.





[jira] [Updated] (CARBONDATA-4030) Concurrent SI global sort cannot be success

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4030:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Concurrent SI global sort cannot be success
> ---
>
> Key: CARBONDATA-4030
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4030
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When concurrent SI global sort loads are in progress, one load removes the 
> table property added by the other load. As a result, the global sort insert 
> for one load fails with an error that the position id cannot be found in the 
> projection.





[jira] [Updated] (CARBONDATA-4064) TPCDS queries are failing with None.get exception when table has SI configured

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4064:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> TPCDS queries are failing with None.get exception when table has SI configured
> --
>
> Key: CARBONDATA-4064
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4064
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4029:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: 3 node FI cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: Primitive.rar
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Perform a delete on a table that has alter-added SDK segments; then count(*) 
> is 0. count(*) stays 0 even if any number of SDK segments are added after it.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
> -- before executing the alter add segment below, place the attached SDK files 
> in HDFS in the /sdkfiles/primitive2 folder;
> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> delete from external_primitive where id =2;select * from external_primitive;
> Console output:
> /> drop table if exists external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, 
> salary double, active boolean, dob date, doj timestamp, city string, dept 
> string) stored as carbondata;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +-+---+---+--+-+-++++
> | id | name | rank | salary | active | dob | doj | city | dept |
> +-+---+---+--+-+-++++
> | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune 
> | IT |
> | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | 
> Bangalore | DATA |
> | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | 
> Pune | DATA |
> | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi 
> | MAINS |
> | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi 
> | IT |
> | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | 
> Bangalore | DATA |
> | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | 
> IT |
> | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | 
> Bangalore | DATA |
> | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune 
> | DATA |
> | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | 
> Bangalore | MAINS |
> | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | 
> Bangalore | IT |
> | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | 
> DATA |
> | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | 
> Bangalore | MAINS |
> +-+---+---+--+-+-++++
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2;select * from 
> external_primitive;
> INFO : Execution ID: 322
> ++
> | Deleted Row Count |
> ++
> | 1 |
> ++
> 1 row selected (3.723 seconds)
> +-+---+---+-+-+--+--+---+---+
> | id | name | rank | salary | active | dob | doj | city | dept |
> +-+---+---+-+-+--+--+---+---+
> +-+---+---+-+-+--+--+---+---+
> No rows selected (1.531 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select
>  * from external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.766 seconds)
> +-+---+---+-+-+--+--+---+---+
> | id | name | rank | salary | active | dob | doj | cit

[jira] [Updated] (CARBONDATA-4020) Drop bloom index for single index of table having multiple index drops all indexes

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4020:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Drop bloom index for single index of table having multiple index drops all 
> indexes
> --
>
> Key: CARBONDATA-4020
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4020
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Create multiple bloom indexes on the table, then try to drop a single bloom index.
> drop table if exists datamap_test_1;
>  CREATE TABLE datamap_test_1 (id int,name string,salary float,dob date)STORED 
> as carbondata TBLPROPERTIES('SORT_COLUMNS'='id');
>  
>  CREATE index dm_datamap_test_1_2 ON TABLE datamap_test_1(id) as 
> 'bloomfilter' PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 
> 'BLOOM_COMPRESS'='true');
>  
>  CREATE index dm_datamap_test3 ON TABLE datamap_test_1 (name) as 
> 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 
> 'BLOOM_COMPRESS'='true');
> show indexes on table datamap_test_1;
> drop index dm_datamap_test_1_2 on datamap_test_1;
> show indexes on table datamap_test_1;
>  
> Issue: dropping a single bloom index on a table that has multiple indexes 
> drops all of its indexes
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +--+--+--++--+
> | Name | Provider | Indexed Columns | Properties | Status | Sync In
> +--+--+--++--+
> | dm_datamap_test_1_2 | bloomfilter | id | 
> 'INDEX_COLUMNS'='id','bloom_compress'='true','bloom_fpp'='0.1','blo
> | dm_datamap_test3 | bloomfilter | name | 
> 'INDEX_COLUMNS'='name','bloom_compress'='true','bloom_fpp'='0.1','b
> +--+--+--++--+
> 2 rows selected (0.315 seconds)
> 0: jdbc:hive2://linux-32:22550/> drop index dm_datamap_test_1_2 on 
> datamap_test_1;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.232 seconds)
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +---+---+--+-+-++
> | Name | Provider | Indexed Columns | Properties | Status | Sync Info |
> +---+---+--+-+-++
> +---+---+--+-+-++
> No rows selected (0.21 seconds)
> 0: jdbc:hive2://linux-32:22550/>





[jira] [Updated] (CARBONDATA-4078) add external segment and query with index server fails

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4078:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> add external segment and query with index server fails
> --
>
> Key: CARBONDATA-4078
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4078
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: is_noncarbonsegments stacktrace
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The index server tries to cache parquet/orc segments and fails because it 
> cannot read the file format when the fallback mode is disabled.
> Ex: the 'test parquet table' test case
>  
>  





[jira] [Updated] (CARBONDATA-4093) Add logs for MV and method to verify if mv is in Sync during query

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4093:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Add logs for MV and method to verify if mv is in Sync during query
> --
>
> Key: CARBONDATA-4093
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4093
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4076) Query having Subquery alias used in query projection doesnot hit mv after creation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4076:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Query having Subquery alias used in query projection doesnot hit mv after 
> creation
> --
>
> Key: CARBONDATA-4076
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4076
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> {color:#067d17}CREATE TABLE fact_table1 (empname String, designation String, 
> doj Timestamp,
> {color}{color:#067d17}workgroupcategory int, workgroupcategoryname String, 
> deptno int, deptname String,
> {color}{color:#067d17}projectcode int, projectjoindate Timestamp, 
> projectenddate Timestamp,attendance int,
> {color}{color:#067d17}utilization int,salary int)
> {color}{color:#067d17}STORED AS carbondata;{color}
> {color:#067d17}create materialized view mv_sub as select empname, sum(result) 
> sum_ut from (select empname, utilization result from fact_table1) fact_table1 
> group by empname;
> {color}
>  
> {color:#067d17}select empname, sum(result) sum_ut from (select empname, 
> utilization result from fact_table1) fact_table1 group by empname;{color}
>  
> {color:#067d17}explain select empname, sum(result) sum_ut from (select 
> empname, utilization result from fact_table1) fact_table1 group by 
> empname;{color}
>  
> {color:#067d17}Expected: Query should hit MV{color}
> {color:#067d17}Actual: Query is not hitting MV{color}
>  
>  





[jira] [Updated] (CARBONDATA-4092) Insert command fails with concurrent delete segment operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4092:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Insert command fails with concurrent delete segment operation
> -
>
> Key: CARBONDATA-4092
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4092
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3987) Issues in SDK Pagination reader (2 issues)

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3987:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Issues in SDK Pagination reader (2 issues)
> --
>
> Key: CARBONDATA-3987
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3987
> Project: CarbonData
>  Issue Type: Bug
>  Components: other
>Affects Versions: 2.1.0
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Issue 1 : 
> Write data to a table and then insert one more row; an error is thrown when 
> trying to read the newly added row, whereas getTotalRows is incremented by 1.
> Test code-
> /**
>  * Carbon Files are written using CarbonWriter in outputpath
>  *
>  * Carbon Files are read using paginationCarbonReader object
>  * Checking pagination with insert on large data with 8 split
>  */
>  @Test
>  public void testSDKPaginationInsertData() throws IOException, 
> InvalidLoadOptionException, InterruptedException {
>  System.out.println("___" + 
> name.getMethodName() + " TestCase Execution is 
> started");
> //
> // String outputPath1 = getOutputPath(outputDir, name.getMethodName() + 
> "large");
> //
> // long uid = 123456;
> // TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
> // writeMultipleCarbonFiles("id int,name string,rank short,salary 
> double,active boolean,dob date,doj timestamp,city string,dept string", 
> getDatas(), outputPath1, uid, null, null);
> //
> // System.out.println("Data is written");
> List data1 = new ArrayList();
>  String[] row1 = \{"1", "AAA", "3", "3444345.66", "true", "1979-12-09", 
> "2011-2-10 1:00:20", "Pune", "IT"};
>  String[] row2 = \{"2", "BBB", "2", "543124.66", "false", "1987-2-19", 
> "2017-1-1 12:00:20", "Bangalore", "DATA"};
>  String[] row3 = \{"3", "CCC", "1", "787878.888", "false", "1982-05-12", 
> "2015-12-1 2:20:20", "Pune", "DATA"};
>  String[] row4 = \{"4", "DDD", "1", "9.24", "true", "1981-04-09", 
> "2000-1-15 7:00:20", "Delhi", "MAINS"};
>  String[] row5 = \{"5", "EEE", "3", "545656.99", "true", "1987-12-09", 
> "2017-11-25 04:00:20", "Delhi", "IT"};
> data1.add(row1);
>  data1.add(row2);
>  data1.add(row3);
>  data1.add(row4);
>  data1.add(row5);
> String outputPath1 = getOutputPath(outputDir, name.getMethodName() + "large");
> long uid = 123456;
>  TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
>  writeMultipleCarbonFiles("id int,name string,rank short,salary double,active 
> boolean,dob date,doj timestamp,city string,dept string", data1, outputPath1, 
> uid, null, null);
> System.out.println("Data is written");
> String hdfsPath1 = moveFiles(outputPath1, outputPath1);
>  String datapath1 = hdfsPath1.concat("/" + name.getMethodName() + "large");
>  System.out.println("HDFS Data Path is: " + datapath1);
> runSQL("create table " + name.getMethodName() + "large" + " using carbon 
> location '" + datapath1 + "'");
>  System.out.println("Table " + name.getMethodName() + " is created 
> Successfully");
>  runSQL("select count(*) from " + name.getMethodName() + "large");
>  long uid1 = 123;
>  String outputPath = getOutputPath(outputDir, name.getMethodName());
>  List data = new ArrayList();
>  String[] row = \{"222", "Daisy", "3", "334.456", "true", "1956-11-08", 
> "2013-12-10 12:00:20", "Pune", "IT"};
>  data.add(row);
>  writeData("id int,name string,rank short,salary double,active boolean,dob 
> date,doj timestamp,city string,dept string", data, outputPath, uid, null, 
> null);
>  String hdfsPath = moveFiles(outputPath, outputPath);
>  String datapath = hdfsPath.concat("/" + name.getMethodName());
> runSQL("create table " + name.getMethodName() + " using carbon location '" + 
> datapath + "'");
>  runSQL("select count(*) from " + name.getMethodName());
>  System.out.println("Insert--");
>  runSQL("insert into table " + name.getMethodName() + " select * from " + 
> name.getMethodName() + "large");
>  System.out.println("Inserted");
>  System.out.println("--After Insert--");
>  System.out.println("Query 1");
>  runSQL("select count(*) from " + name.getMethodName());
>  // configure cache size = 4 blocklet
>  CarbonProperties.getInstance()
>  
> .addProperty(CarbonCommonConstants.CARBON_MAX_PAGINATION_LRU_CACHE_SIZE_IN_MB,
>  "4");
> CarbonReaderBuilder carbonReaderBuilder = CarbonReader.builder(datapath, 
> "_temp").withPaginationSupport().projection(new 
> String[]\{"id","name","rank","salary","active","dob","doj","city","dept"});
>  PaginationCarbonReader paginationCarbonReader =
>  (PaginationCarbonReader) carbonReaderBuilder.build();
>  File[] dataFiles1 = new File(datapath).li

[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4051:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Geo spatial index algorithm improvement and UDFs enhancement
> 
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: CarbonData Spatial Index Design Doc v2.docx
>
>  Time Spent: 21h 10m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; the related algorithms are provided by the Discovery Team.
> 1. Replace the geohash encoding algorithm, and reduce the required properties 
> of CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>  timevalue BIGINT,
>  longitude LONG,
>  latitude LONG) COMMENT "This is a GeoTable"
>  STORED AS carbondata
>  TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>  'SPATIAL_INDEX.mygeohash.type'='geohash',
>  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>  'SPATIAL_INDEX.mygeohash.gridSize'='50',
>  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs :
>  * _*InPolygonList (List polygonList, OperationType opType)*_
>  * _*InPolylineList (List polylineList, Float bufferInMeter)*_
>  * _*InPolygonRangeList (List RangeList, **OperationType opType**)*_
> *operation only support :*
>  * *"OR", means calculating union of two polygons*
>  * *"AND", means calculating intersection of two polygons*
> geo util UDFs :
>  * _*GeoIdToGridXy(Long geoId) :* *Pair*_
>  * _*LatLngToGeoId(**Long* *latitude, Long* *longitude) : Long*_
>  * _*GeoIdToLatLng(Long geoId) : Pair*_
>  * _*ToUpperLayerGeoId(Long geoId) : Long*_
>  * _*ToRangeList (String polygon) : List*_
> 3. Currently GeoID is a column created internally for spatial tables; this PR 
> allows the GeoID column to be customized during LOAD/INSERT INTO. For 
> example, 
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, where '855280799612' is generated internally,
> ++-+-++
> |mygeohash  |timevalue   |longitude|latitude|
> ++-+-++
> |855280799612|157542840|116285807|40084087|
> ++-+-++
> but now is
> ++-+-++
> |mygeohash  |timevalue  |longitude|latitude|
> ++-+-++
> |0   |157542840|116285807|40084087|
> ++-+-++{code}
>  





[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results after add segment to table having SI with Indexserver

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4111:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Filter query having invalid results after add segment to table having SI with 
> Indexserver
> -
>
> Key: CARBONDATA-4111
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4111
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: addseg_si_is.png
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> queries to execute:
> create table maintable_sdk(a string, b int, c string) stored as carbondata;
>  insert into maintable_sdk select 'k',1,'k';
>  insert into maintable_sdk select 'l',2,'l';
>  CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata';
>  alter table maintable_sdk add segment 
> options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon');
> spark-sql> select *from maintable_sdk where c='m';
> 2021-01-27 12:10:54,326 | WARN | IPC Client (653337757) connection to 
> linux-30/10.19.90.30:22900 from car...@hadoop.com | Unexpected error reading 
> responses on connection Thread[IPC Client (653337757) connection to 
> linux-30/10.19.90.30:22900 from car...@hadoop.com,5,main] | 
> org.apache.hadoop.ipc.Client.run(Client.java:1113)
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.carbondata.core.indexstore.SegmentWrapperContainer.()
>  at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
>  at 
> org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:58)
>  at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:284)
>  at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
>  at 
> org.apache.hadoop.ipc.RpcWritable$WritableWrapper.readFrom(RpcWritable.java:85)
>  at org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187)
>  at org.apache.hadoop.ipc.RpcWritable$Buffer.newInstance(RpcWritable.java:183)
>  at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1223)
>  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1107)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.carbondata.core.indexstore.SegmentWrapperContainer.()
>  at java.lang.Class.getConstructor0(Class.java:3082)
>  at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>  at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
>  ... 8 more
> 2021-01-27 12:10:54,330 | WARN | main | Distributed Segment Pruning failed, 
> initiating embedded pruning | 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:349)
> java.lang.reflect.UndeclaredThrowableException
>  at com.sun.proxy.$Proxy59.getPrunedSegments(Unknown Source)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:341)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:426)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions$lzycompute(BroadCastSIFilterPushJoin.scala:80)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions(BroadCastSIFilterPushJoin.scala:78)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy$lzycompute(BroadCastSIFilterPushJoin.scala:94)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy(BroadCastSIFilterPushJoin.scala:93)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.doExecute(BroadCastSIFilterPushJoin.scala:132)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:177)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:201)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:198)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:173)
>  at 
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:293)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:342)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
>  at 
> org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
>  at 
> org.apache.s

[jira] [Updated] (CARBONDATA-4094) Select count(*) on partition table fails in index server fallback mode

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4094:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select count(*) on partition table fails in index server fallback mode
> --
>
> Key: CARBONDATA-4094
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4094
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4075) Should refactor to use withEvents instead of fireEvent

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4075:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Should refactor to use withEvents instead of fireEvent
> --
>
> Key: CARBONDATA-4075
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4124) Refresh MV which does not exist is not throwing proper message

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4124:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Refresh MV which does not exist is not throwing proper message
> --
>
> Key: CARBONDATA-4124
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4124
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4110:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support clean files dry run and show statistics after clean files operation
> ---
>
> Key: CARBONDATA-4110
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4110
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 26h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4053) Alter table rename column failed

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4053:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table rename column failed
> 
>
> Key: CARBONDATA-4053
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4053
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: 截图.PNG
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Alter table rename column failed because the rename incorrectly replaced 
> content in tblproperties with the new column name, even when that content was 
> not related to the column name.
>   !截图.PNG!





[jira] [Updated] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4052:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select query on SI table after insert overwrite is giving wrong result.
> ---
>
> Key: CARBONDATA-4052
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4052
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> # Create carbon table.
>  # Create SI table on the same carbon table.
>  # Do load or insert operation.
>  # Run query insert overwrite on maintable.
>  # Now a select query on the SI table shows old as well as new data, when it 
> should return only the new data.
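The steps above correspond to a SQL sequence along these lines (table and index names are illustrative, not taken from the report):

```sql
CREATE TABLE maintable (id INT, name STRING) STORED AS carbondata;
CREATE INDEX si_name ON TABLE maintable (name) AS 'carbondata';

INSERT INTO maintable VALUES (1, 'old');           -- initial load
INSERT OVERWRITE TABLE maintable SELECT 2, 'new';  -- overwrite on main table

-- The SI table should reflect only the overwritten data;
-- the reported bug left the pre-overwrite rows visible as well.
SELECT * FROM si_name;
```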





[jira] [Updated] (CARBONDATA-4066) data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4066:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
> --
>
> Key: CARBONDATA-4066
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4066
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
>  
> test case for reproduce the issue:
> CarbonProperties.getInstance()
>  .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "true")
> sql("create table complextable2 (id int, name string, country array<string>) 
> stored as " +
>  "carbondata tblproperties('sort_scope'='global_sort','sort_columns'='name')")
> sql(
>  s"load data inpath '$resourcesPath/secindex/array.csv' into table 
> complextable2 options('delimiter'=','," +
>  
> "'quotechar'='\"','fileheader'='id,name,country','complex_delimiter_level_1'='$',"
>  +
>  "'global_sort_partitions'='10')")
> val result = sql(" select * from complextable2 where 
> array_contains(country,'china')")
> sql("create index index_2 on table complextable2(country) as 'carbondata' 
> properties" +
>  "('sort_scope'='global_sort')")
> checkAnswer(sql("select count(*) from complextable2 where 
> array_contains(country,'china')"),
>  sql("select count(*) from complextable2 where 
> ni(array_contains(country,'china'))"))





[jira] [Updated] (CARBONDATA-4056) Adding global sort support for SI segments data files merge operation.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4056:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Adding global sort support for SI segments data files merge operation.
> --
>
> Key: CARBONDATA-4056
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4056
> Project: CarbonData
>  Issue Type: New Feature
>  Components: other
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Enabling carbon property (carbon.si.segment.merge) helps to reduce number of 
> carbondata files in the SI segments. When SI is created with sort scope as 
> global sort and this carbon property is enabled, then the data in SI segments 
> must be globally sorted after data files are merged.





[jira] [Updated] (CARBONDATA-4054) Size control of minor compaction

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4054:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Size control of minor compaction
> 
>
> Key: CARBONDATA-4054
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: ZHANGSHUNYU
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, minor compaction only considers the number of segments, and major 
> compaction only considers the total size of segments. Consider a scenario in 
> which the user wants to trigger minor compaction by segment count, but does 
> not want to merge segments whose data size exceeds a threshold (for example 
> 2 GB), since merging such large segments is unnecessary and time-consuming. 
> We therefore need a parameter to control the maximum size of a segment 
> included in minor compaction, so that segments whose data size exceeds the 
> threshold are skipped; a sensible default value must be provided.
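As a sketch of the proposed behaviour (the property name here is illustrative; the final name is decided in the implementation), the size cap could be used like this:

```sql
-- Hypothetical property: cap the size (in MB) of segments eligible for
-- minor compaction; larger segments are skipped rather than merged.
SET carbon.minor.compaction.size = 2048;

-- Minor compaction then merges by segment count as before,
-- but only among segments below the threshold.
ALTER TABLE sales COMPACT 'minor';
```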





[jira] [Updated] (CARBONDATA-4067) Change clean files behaviour to support cleaning of in progress segments

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4067:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Change clean files behaviour to support cleaning of in progress segments
> 
>
> Key: CARBONDATA-4067
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4067
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Change clean files behaviour to support cleaning of in progress segments





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4062:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 26h 10m
>  Remaining Estimate: 0h
>
> To prevent accidental deletion of data, carbon will introduce data trash 
> management. It will provide buffer time for accidental deletion of data to 
> roll back the delete operation.
> Data trash management is a part of carbon data lifecycle management. Clean 
> files as a data trash manager should contain the following two parts.
> part 1: manage metadata-indexed data trash.
>   This data stays at the original location of the table and is indexed by 
> metadata. Carbon manages it through the metadata index and should avoid using 
> the listFile() interface.
> part 2: manage the ".Trash" folder.
>   The ".Trash" folder currently has no metadata index, and operations on it 
> are based on timestamps and the listFile() interface. In the future, carbon 
> will index the ".Trash" folder to improve data trash management.
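Both parts above are exercised through the existing clean files command surface, for example (table name illustrative):

```sql
-- Clean files acts as the data trash manager: it moves stale,
-- metadata-indexed data into the trash and expires the ".Trash" folder
-- contents once the buffer time has elapsed.
CLEAN FILES FOR TABLE mydb.mytable;
```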





[jira] [Updated] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3908:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> When a carbon segment is added through the alter add segments query, then it 
> is not accounting the added carbon segment values.
> ---
>
> Key: CARBONDATA-3908
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: FI cluster and opensource cluster.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.1.1
>
>
> When a carbon segment is added through the alter table add segment query, the 
> rows of the added segment are not accounted for. A count(*) restricted to the 
> added segment always returns 0.
> Test queries:
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> --hdfs dfs -mkdir /uniqdata-carbon-segment;
> --hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* 
> /uniqdata-carbon-segment/
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');
> select count(*) from uniqdata;--4000 expected as one load of 2000 records 
> happened and same segment is added again;
> set carbon.input.segments.default.uniqdata=1;
> select count(*) from uniqdata;--2000 expected - it should just show the 
> records count of added segments;
> CONSOLE:
> /> set carbon.input.segments.default.uniqdata=1;
> +-----------------------------------------+-------+
> |                   key                   | value |
> +-----------------------------------------+-------+
> | carbon.input.segments.default.uniqdata  | 1     |
> +-----------------------------------------+-------+
> 1 row selected (0.192 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1734
> +-----------+
> | count(1)  |
> +-----------+
> | 2000      |
> +-----------+
> 1 row selected (4.036 seconds)
> /> set carbon.input.segments.default.uniqdata=2;
> +-----------------------------------------+-------+
> |                   key                   | value |
> +-----------------------------------------+-------+
> | carbon.input.segments.default.uniqdata  | 2     |
> +-----------------------------------------+-------+
> 1 row selected (0.088 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1745
> +-----------+
> | count(1)  |
> +-----------+
> | 2000      |
> +-----------+
> 1 row selected (6.056 seconds)
> /> set carbon.input.segments.default.uniqdata=3;
> +-----------------------------------------+-------+
> |                   key                   | value |
> +-----------------------------------------+-------+
> | carbon.input.segments.default.uniqdata  | 3     |
> +-----------------------------------------+-------+
> 1 row selected (0.161 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1753
> +-----------+
> | count(1)  |
> +-----------+
> | 0         |
> +-----------+
> 1 row selected (4.875 seconds)
> /> show segments for table uniqdata;
> +----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | ID | Status  | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
> +----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | 4  | Success | 2020-07-17 16:01:53.673 | 5.579S          | {}        | 269.10KB  | 7.21KB     | columnar_v3 |
> | 3  | Success | 2020-07-17 16:00:24.866 | 0.578S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
> | 2  | Success | 2020-07-17 15:07:54.273 | 0.642S          | {}        | 36.72KB   | NA         | orc         |
> | 1  | Success | 2020-07-17 15:03:59.767 | 0.564S          | {}        | 89.26KB   | NA         | parquet     |
> | 0  | Success | 2020-07-16 12:44:32.095 | 4.484S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
> +----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> Expected result: Records added by adding carbon segment should be considered.
> Actual result: Records added by adding carbon segment is not considered.





[jira] [Updated] (CARBONDATA-4068) Alter table set long string should not allowed on SI column.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4068:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table set long string should not allowed on SI column.
> 
>
> Key: CARBONDATA-4068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4068
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> # Create a table and create an SI on it.
>  # Now try to set the long string property on the column on which the SI is 
> created.
> The operation should not be allowed, because SI is not supported on long 
> string columns.
> create table maintable (a string,b string,c int) STORED AS carbondata;
> create index indextable on table maintable(b) AS 'carbondata';
> insert into maintable values('k','x',2);
> ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b');





[jira] [Updated] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4069:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table set streaming=true should not be allowed on SI table or table 
> having SI.
> 
>
> Key: CARBONDATA-4069
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4069
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> # Create a carbon table and an SI.
>  # Now set streaming=true on either the SI table or the main table.
> Neither operation should be allowed, because SI is not supported on streaming 
> tables.
>  
> create table maintable2 (a string,b string,c int) STORED AS carbondata;
> insert into maintable2 values('k','x',2);
> create index m_indextable on table maintable2(b) AS 'carbondata';
> ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true');  => operation 
> should not be allowed.
> ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true') => operation 
> should not be allowed.





[jira] [Updated] (CARBONDATA-4040) Data mismatch incase of compaction failure and retry success

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4040:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Data mismatch incase of compaction failure and retry success
> 
>
> Key: CARBONDATA-4040
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4040
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> For compaction we don't register an in-progress segment, so compaction can 
> fail when the table status lock cannot be acquired. In that case the partial 
> compaction segment needs to be cleaned up. If that cleanup fails (because the 
> lock could not be acquired, or due to IO issues) and the user retries the 
> compaction, carbon reuses the same segment id; so while writing the segment 
> file for the new compaction, only the files belonging to the current 
> compaction must be listed, not all files, which may include stale ones.





[jira] [Updated] (CARBONDATA-4081) Clean files considering files apart from .segment files while cleaning stale segments and moving them to trash

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4081:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Clean files considering files apart from .segment files while cleaning stale 
> segments and moving them to trash
> --
>
> Key: CARBONDATA-4081
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4081
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4087) Issue with huge data(exceeding 32K records) after enabling local dictionary

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4087:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Issue with huge data(exceeding 32K records) after enabling local dictionary
> ---
>
> Key: CARBONDATA-4087
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4087
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> For large data, SELECT on array(varchar) throws the exception 
> "Error in Reading Data from Carbondata" due to an ArrayOutOfBounds error.
>  
> https://github.com/apache/carbondata/pull/4055





[jira] [Updated] (CARBONDATA-4072) Clean files command is not deleting .segment files present at metadata/segments/xxxxx.segment for the segments added through alter table add segment query.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4072:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Clean files command is not deleting .segment files present at 
> metadata/segments/x.segment for the segments added through alter table 
> add segment query.
> ---
>
> Key: CARBONDATA-4072
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4072
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Clean files command is not deleting .segment files present at 
> metadata/segments/x.segment for the segments added through alter table 
> add segment query.





[jira] [Updated] (CARBONDATA-4071) If date or timestamp columns are present as child of complex columns, then its giving wrong results on reading through SDK.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4071:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> If date or timestamp columns are present as child of complex columns, then 
> its giving wrong results on reading through SDK.
> ---
>
> Key: CARBONDATA-4071
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4071
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> If a date or timestamp column is present as a child of a complex column, 
> reading its value through the SDK gives wrong results. For eg: Array





[jira] [Updated] (CARBONDATA-4084) Error when loading string field with high cardinary

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4084:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Error when loading string field with high cardinary 
> 
>
> Key: CARBONDATA-4084
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4084
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Nguyen Dinh Huynh
>Priority: Major
>  Labels: patch
> Fix For: 2.1.1
>
> Attachments: image-2020-12-14-22-40-45-539.png, 
> image_2020_12_13T09_29_38_891Z.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When loading a string field with more than 1M distinct values, some rows 
> show corrupted values.
>   !image_2020_12_13T09_29_38_891Z.png!
> With carbon.local.dictionary.enable=false the load works as expected, so 
> there appears to be a bug in the local dictionary decoder fallback.
>  
>  





[jira] [Updated] (CARBONDATA-4099) Fix Concurrent issues with clean files post event listener

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4099:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Fix Concurrent issues with clean files post event listener
> --
>
> Key: CARBONDATA-4099
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4099
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There were 2 issues in the clean files post event listener:
>  # In concurrent cases, while writing the entry back to the table status 
> file, a wrong path was given, due to which the table status file was not 
> updated in the case of the SI table.
>  # While writing the LoadMetadataDetails to the table status file during 
> concurrent scenarios, we were writing only the unwanted segments and not all 
> the segments, which could leave stale segments in the SI table.
> Due to these 2 issues, when a select query is executed on the SI table, the 
> table status would have an entry for a segment whose carbondata file had 
> been deleted, thus throwing an IOException.





[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4095:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select Query with SI filter fails, when columnDrift is enabled
> --
>
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
sql("drop table if exists maintable")
 sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata")
 sql("insert into maintable values('k','d',2,3)")
 sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
 sql("create index indextable on table maintable(b) AS 'carbondata'")
 sql("insert into maintable values('k','x',2,4)")
 sql("select * from maintable where b='x'").show(false)
>  
>  
>  
>  
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 
> (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
>  at 
> org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
>  at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61)
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281)
>  ... 26 more
> 2020-12-22 18:58:37 ERROR TaskSetManager:70 - Task 0 in stage 40.0 failed 1 
>

[jira] [Updated] (CARBONDATA-4073) Added FT for missing scenarios in Presto

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4073:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Added FT for missing scenarios in Presto
> 
>
> Key: CARBONDATA-4073
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4073
> Project: CarbonData
>  Issue Type: Test
>  Components: presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> FT for following has been added.
> update without local-dict
>  delete operation
>  minor, major, custom compaction
>  add and delete segments
>  test update with inverted index 
>  read with partition columns
>  Filter on partition columns
>  Bloom index
>  test range columns
>  read streaming data





[jira] [Updated] (CARBONDATA-4100) SI loads are in inconsistent state with maintable after concurrent(Load&Compaction) operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4100:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> SI loads are in inconsistent state with maintable after 
> concurrent(Load&Compaction) operation
> -
>
> Key: CARBONDATA-4100
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4100
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4070:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Handle the scenario mentioned in description for SI.
> 
>
> Key: CARBONDATA-4070
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4070
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> # SI creation should not be allowed on SI table.
>  # SI table should not be scanned with like filter on MT.
>  # Drop column should not be allowed on SI table.
> Add FTs for all the above scenarios and the sort-column-related scenarios.





[jira] [Updated] (CARBONDATA-4059) Block compaction on SI table.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4059:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Block compaction on SI table.
> -
>
> Key: CARBONDATA-4059
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4059
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, compaction is allowed on the SI table. Because of this, if only the 
> SI table is compacted, filter queries on the main table scan more data in the 
> SI table, which causes performance degradation.





[jira] [Updated] (CARBONDATA-4104) Vector filling for Primitive decimal type needs to be handled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4104:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Vector filling for Primitive decimal type needs to be handled
> -
>
> Key: CARBONDATA-4104
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4104
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Filling of vectors for a complex primitive decimal type whose precision is 
> greater than 18 is not handled properly.
> For example: 
> array





[jira] [Updated] (CARBONDATA-4112) Data mismatch issue in SI

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4112:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Data mismatch issue in SI
> -
>
> Key: CARBONDATA-4112
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4112
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> When the data files of an SI segment are merged, the SI table ends up with 
> more rows than the main table.





[jira] [Updated] (CARBONDATA-4109) Improve carbondata coverage for presto-integration code

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4109:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Improve carbondata coverage for presto-integration code
> ---
>
> Key: CARBONDATA-4109
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4109
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> A few scenarios were missing coverage in the presto-integration code. This PR 
> aims to improve coverage by considering all such scenarios.
> Dead code: ObjectStreamReader.java was created with the aim of querying complex 
> types, but ComplexTypeStreamReader was created instead, making 
> ObjectStreamReader obsolete.





[jira] [Updated] (CARBONDATA-4125) SI compatability issue fix

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4125:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> SI compatability issue fix
> --
>
> Key: CARBONDATA-4125
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4125
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Refer 
> [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Bug-SI-Compatibility-Issue-td105485.html]
>  for this issue





[jira] [Updated] (CARBONDATA-4107) MV Performance and Lock issues

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4107:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> MV Performance and Lock issues
> --
>
> Key: CARBONDATA-4107
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4107
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> # After the MV multi-tenancy PR, the mv system folder moved to the database 
> level. Hence, during each operation (insert/load/IUD/show mv/query), we list 
> all databases in the system, collect mv schemas, and check whether any mv is 
> mapped to the table. This degrades query performance, because mv schemas are 
> collected from all databases whether or not the table has an mv.
> # When different JVM processes call the touchMDTFile method, file creation 
> and deletion can happen at the same time. This may fail the operation.





[jira] [Updated] (CARBONDATA-4102) Add UT and FT to improve coverage of SI module.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4102:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Add UT and FT to improve coverage of SI module.
> ---
>
> Key: CARBONDATA-4102
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4102
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> Add UT and FT to improve coverage of the SI module, and remove dead or unused 
> code if it exists.





[jira] [Updated] (CARBONDATA-4122) Support Writing Flink Stage data into Hdfs file system

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4122:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support Writing Flink Stage data into Hdfs file system
> --
>
> Key: CARBONDATA-4122
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4122
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3917) The row count of data loading is not accurate; more rows have been loaded

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3917:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> The row count of data loading is not accurate; more rows have been loaded
> ---
>
> Key: CARBONDATA-3917
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3917
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Taoli
>Priority: Blocker
> Fix For: 2.1.1
>
>
> 2020-07-18 18:46:23,856 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Data Writer: 1277745 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Sort Processor: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,856 | DEBUG | 
> [LocalFolderDeletionPool:detail_cdr_s1mme_18461_1595087183856] | 
> PrivilegedAction as:omm (auth:SIMPLE) 
> from:org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:298)
>  | 
> org.apache.hadoop.security.UserGroupInformation.logPrivilegedAction(UserGroupInformation.java:1756)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Data Converter: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Input Processor: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)





[jira] [Updated] (CARBONDATA-4115) Return and show segment ID after successful load and insert, including partitioned table and normal table.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4115:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Return and show the segment ID after a successful load and insert, including 
> partitioned and normal tables.
> --
>
> Key: CARBONDATA-4115
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4115
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Reporter: lihongzhao
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Return and show the segment ID after a successful load and insert, including 
> partitioned and normal tables.





[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4089:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Create table with location, if the location didn't have scheme, the default 
> will be local file system, which is not the file system defined by defaultFS
> 
>
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Blocker
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the location does not specify a scheme, the file system defined by 
> defaultFS should be used.





[jira] [Created] (CARBONDATA-4152) Query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties

2021-03-16 Thread Yahui Liu (Jira)
Yahui Liu created CARBONDATA-4152:
-

 Summary: Query failed with exception: 
java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at 
BlockletIndexFactory.getSegmentProperties
 Key: CARBONDATA-4152
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4152
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.0
Reporter: Yahui Liu


There is a chance that getIndexes(segment) returns an empty list, and a later 
call to list.get(0) throws an exception.
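For illustration only, the failure mode and the defensive guard can be sketched 
outside of CarbonData (a minimal Python sketch; `first_segment_properties` and 
the list contents are hypothetical names, not CarbonData APIs):

```python
def first_segment_properties(indexes):
    # getIndexes(segment) may legitimately return an empty list, so guard
    # before taking the first element (the equivalent of Java's list.get(0)).
    if not indexes:
        return None
    return indexes[0]

# The empty-list case no longer raises an IndexError (the Python analogue
# of java.lang.IndexOutOfBoundsException: Index: 0 Size: 0).
print(first_segment_properties([]))          # None
print(first_segment_properties(["props0"]))  # props0
```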





[GitHub] [carbondata] jack86596 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800730205


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (CARBONDATA-4146) Query fails and the error message "unable to get file status" is displayed. query is normal after the "drop metacache on table" command is executed.

2021-03-16 Thread liuhe0702 (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuhe0702 closed CARBONDATA-4146.
-
Resolution: Duplicate

>  Query fails and the error message "unable to get file status" is displayed. 
> query is normal after the "drop metacache on table" command is executed.
> -
>
> Key: CARBONDATA-4146
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4146
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.6.1, 2.0.0, 2.1.0
>Reporter: liuhe0702
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> During compaction, the status of the new segment is set to success before 
> index files are merged. After index files are merged, the carbonindex files 
> are deleted. As a result, the query task cannot find the cached carbonindex 
> files.





[GitHub] [carbondata] liuhe0702 closed pull request #4103: [CARBONDATA-4145] Query fails and the message "File does not exist: xxx.carbondata" is displayed

2021-03-16 Thread GitBox


liuhe0702 closed pull request #4103:
URL: https://github.com/apache/carbondata/pull/4103


   







[jira] [Closed] (CARBONDATA-4145) Query fails and the message "File does not exist: xxxx.carbondata" is displayed

2021-03-16 Thread liuhe0702 (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuhe0702 closed CARBONDATA-4145.
-
Resolution: Duplicate

> Query fails and the message "File does not exist: .carbondata" is 
> displayed
> ---
>
> Key: CARBONDATA-4145
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4145
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.6.1, 2.0.0, 2.1.0
>Reporter: liuhe0702
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> An exception occurs when the rebuild/refresh index command is executed. After 
> that, the query command fails to be executed, and the message "File does not 
> exist: 
> /user/hive/warehouse/carbon.store/sys/idx_tbl_data_event_carbon_user_num/Fact/Part0/Segment_27670/part-1-28_batchno0-0-x.carbondata"
>  is displayed; the idx_tbl_data_event_carbon_user_num table is a secondary 
> index table.





[GitHub] [carbondata] liuhe0702 closed pull request #4104: [CARBONDATA-4146]Query fails and the error message "unable to get file status" is displayed. query is normal after the "drop metacache on tab

2021-03-16 Thread GitBox


liuhe0702 closed pull request #4104:
URL: https://github.com/apache/carbondata/pull/4104


   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


CarbonDataQA2 commented on pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800567860


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3805/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


CarbonDataQA2 commented on pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800566842


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5571/
   







[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595406480



##
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
 sql("drop table if exists maintable")
   }
 
+  test("reindex command with stale files") {
+sql("drop table if exists maintable")
+sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment:
   1. "we shouldn't allow delete segments on index table itself." Please 
refer to the second-to-last comment, right before yours. If you have ever 
solved a production issue, you could not say this. There are thousands of 
failed-query issues caused simply by a broken SI segment. We need to first 
delete the broken SI segment and then repair it again (over the last two to 
three years, countless issues because an SI segment was broken or out of sync 
with the main table). So please get to know the customer; do not build 
software without knowing how the customer uses it. And while coding, also 
stand on the maintainer's side and implement the feature with more 
maintainability. Thanks.
   2. "And during repair index, if have segment with partial data, we should 
delete the segment completely(segment folder, segment file, probably 
tablestatus entry for the segment as well) before proceeding with segment 
repair." Your suggestion is of course right, but it is considerably more 
complicated than the existing implementation, so please first do the complete 
analysis and design; then we can discuss and plan the next step.









[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595408209



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() {
 }
 if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) {
   for (String indexFile : entry.getValue().getFiles()) {
-indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile,
-entry.getValue().mergeFileName);
+indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null);

Review comment:
   Done.









[jira] [Created] (CARBONDATA-4151) When data sampling is done on large data set using Spark's df.sample function - the size of sampled table is not matching with record size of non sampled (Raw Table)

2021-03-16 Thread Amaranadh Vayyala (Jira)
Amaranadh Vayyala created CARBONDATA-4151:
-

 Summary: When data sampling is done on large data set using 
Spark's df.sample function - the size of sampled table is not matching with 
record size of non sampled (Raw Table)
 Key: CARBONDATA-4151
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4151
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 2.0.1
 Environment: Apache carbondata 2.0.1, spark 2.4.5, hadoop 2.7.2
Reporter: Amaranadh Vayyala
 Fix For: 2.0.1, 2.1.0


Hi Team,

When we perform 5% or 10% data sampling on a large dataset using Spark's 
df.sample, the size of the sampled table does not match the expected fraction 
of the non-sampled (raw) table.

Our raw table is around 11 GB, so with 5% and 10% sampling the sampled tables 
should come out at about 550 MB and 1.1 GB. However, in our case they come out 
as 1.5 GB and 3 GB, roughly 3 times larger than expected.

Could you please check and help us understand where the issue is?
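The expected sizes follow from simple arithmetic (a sketch; the 11 GB, 1.5 GB, 
and 3 GB figures are taken from the report above):

```python
raw_gb = 11.0                          # raw table size from the report
observed_gb = {0.05: 1.5, 0.10: 3.0}   # reported sampled-table sizes

for fraction, observed in observed_gb.items():
    expected = raw_gb * fraction       # expected size for this sample fraction
    ratio = observed / expected        # how far off the observed size is
    print(f"{fraction:.0%}: expected {expected:.2f} GB, "
          f"observed {observed} GB, {ratio:.1f}x larger")
# 5%: expected 0.55 GB, observed 1.5 GB, 2.7x larger
# 10%: expected 1.10 GB, observed 3.0 GB, 2.7x larger
```

Both observed sizes are about 2.7x the expected fraction, consistent with the 
"3 times higher" figure in the report.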





[jira] [Created] (CARBONDATA-4150) Information about indexed datamap

2021-03-16 Thread suyash yadav (Jira)
suyash yadav created CARBONDATA-4150:


 Summary: Information about indexed datamap
 Key: CARBONDATA-4150
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4150
 Project: CarbonData
  Issue Type: Wish
  Components: core
Affects Versions: 2.0.1
 Environment: apache 2.0.1 spark 2.4.5 hadoop 2.7.2
Reporter: suyash yadav
 Fix For: 2.0.1


Hi Team,

We would like detailed information about the indexed datamap and its possible 
use cases.

Please help us answer the queries below:

1) What is an indexed datamap, and what are its use cases?
2) How is it to be used?
3) Are there any reference documents?




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


VenuReddy2103 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595256317



##
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
 sql("drop table if exists maintable")
   }
 
+  test("reindex command with stale files") {
+sql("drop table if exists maintable")
+sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment:
   Agree with @Indhumathi27. IMHO, we shouldn't allow deleting segments on the 
index table itself. And during repair index, if a segment has partial data, we 
should delete the segment completely (segment folder, segment file, and 
probably the tablestatus entry for the segment as well) before proceeding with 
the segment repair.









[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595205788



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() {
* Gets all index files from this segment
* @return
*/
-  public Map getIndexOrMergeFiles() throws IOException {
+  public Map getIndexAndMergeFiles() throws IOException {

Review comment:
   Current implementation:
   if (null != mergeFileName) {
 add merge file
   }
   if (null != index file && not empty) {
 add index file
   }
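    The pseudocode above can be made concrete (a minimal Python sketch of the 
described logic only; `collect_files` is an illustrative name, not the actual 
Java method in SegmentFileStore):

```python
def collect_files(merge_file_name, index_files):
    # Both branches can fire independently, so the result may contain the
    # merge file AND the index files -- which is why "And" rather than "Or"
    # describes the behavior being discussed.
    files = []
    if merge_file_name is not None:
        files.append(merge_file_name)
    if index_files:  # not None and not empty
        files.extend(index_files)
    return files

print(collect_files("0.carbonindexmerge", ["a.carbonindex"]))
# ['0.carbonindexmerge', 'a.carbonindex']
```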









[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595193924



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() {
* Gets all index files from this segment
* @return
*/
-  public Map getIndexOrMergeFiles() throws IOException {
+  public Map getIndexAndMergeFiles() throws IOException {

Review comment:
   "Or" means either, not both, but the implementation shows it will return 
both the index file and the merge file, so "Or" is not correct. 










[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595181102



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() {
   * Gets all index files from this segment
   * @return
   */
-  public Map<String, String> getIndexOrMergeFiles() throws IOException {
+  public Map<String, String> getIndexAndMergeFiles() throws IOException {

Review comment:
   For backward compatibility: in Carbon 1.3, both the index and the merge file exist.









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


Indhumathi27 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595180273



##
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists maintable")
   }
 
+  test("reindex command with stale files") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment:
   ok. keep current behavior













[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


VenuReddy2103 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595178820



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() {
   * Gets all index files from this segment
   * @return
   */
-  public Map<String, String> getIndexOrMergeFiles() throws IOException {
+  public Map<String, String> getIndexAndMergeFiles() throws IOException {

Review comment:
   I still think the getIndexOrMergeFiles name makes sense. The returned map can have either a merge index or an index. Curious to know when it has both the merge index and the index in the map?









[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595175202



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() {
       }
       if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) {
         for (String indexFile : entry.getValue().getFiles()) {
-          indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile,
-              entry.getValue().mergeFileName);
+          indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null);

Review comment:
   This makes sense, will do it.









[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


jack86596 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595167863



##
File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
##
@@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists maintable")
   }
 
+  test("reindex command with stale files") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment:
   1. For example, if both the main table and the SI table segment status are success, but one of the carbondata or index files in the SI segment is missing or broken, how would you fix this if you cannot manually delete the SI segment?









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file

2021-03-16 Thread GitBox


VenuReddy2103 commented on a change in pull request #4105:
URL: https://github.com/apache/carbondata/pull/4105#discussion_r595165241



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() {
       }
       if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) {
         for (String indexFile : entry.getValue().getFiles()) {
-          indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile,
-              entry.getValue().mergeFileName);
+          indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null);

Review comment:
   Its caller `SegmentFileStore.getIndexCarbonFiles()` has code that handles a non-null value by also adding the merge index file to the index file list (at line 906). That becomes dead code now. You would want to remove it too.
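A hedged Java sketch of the dead-branch pattern being pointed out (invented names, not the real CarbonData code): once the writer always stores `null` as the map value, the non-null branch in the reader can never execute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified stand-in for the getIndexCarbonFiles() caller
// discussed above, with invented names.
public class DeadBranchSketch {
  // Flattens the index-file map into a list of file names. Each entry's
  // value is a merge file name, or null when there is none.
  static List<String> flattenIndexFiles(Map<String, String> indexFiles) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : indexFiles.entrySet()) {
      result.add(e.getKey());
      if (e.getValue() != null) {
        // Dead branch once the writer always puts null as the value.
        result.add(e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> m = new HashMap<>();
    m.put("0_batchno0-0-1.carbonindex", null);
    m.put("0.carbonindexmerge", null);
    // With null values, only the keys survive; the branch never fires.
    System.out.println(flattenIndexFiles(m).size());
  }
}
```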












