[GitHub] [carbondata] akashrn5 commented on pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column
akashrn5 commented on pull request #4106: URL: https://github.com/apache/carbondata/pull/4106#issuecomment-800846438 LGTM, @ajantha-bhat please review this once as you have worked on rearrange logic in insert optimization. Please see if there is any impact.
[GitHub] [carbondata] akashrn5 commented on pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column
akashrn5 commented on pull request #4106: URL: https://github.com/apache/carbondata/pull/4106#issuecomment-800846101 @Indhumathi27 please change description with example as discussed
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4106: [CARBONDATA-4147] Fix re-arrange schema in logical relation on MV partition table having sort column
akashrn5 commented on a change in pull request #4106: URL: https://github.com/apache/carbondata/pull/4106#discussion_r595752950

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala

@@ -181,7 +183,13 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
     if (isNotReArranged) {
       // Re-arrange the catalog table schema and output for partition relation
       logicalPartitionRelation =
-        getReArrangedSchemaLogicalRelation(reArrangedIndex, logicalPartitionRelation)
+        if (carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isMV) {
+          // For MV partition table, partition columns will be at the end. Re-arrange

Review comment: as discussed, please add a comment here with an example, so reviewers and developers will be clear why we need to handle this separately only for MV.
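For context, a minimal sketch of the situation the comment asks to document; table names and schema are illustrative, not taken from the PR. The point is that the MV's underlying table keeps its partition column as the last column, so the re-arranged index computed for ordinary partition tables cannot be applied to it unchanged.

{code:scala}
// Illustrative only (assumed names): an MV over a partitioned table whose
// sort column triggers schema re-arrangement on insert. In the MV's internal
// table the partition column (deptname) stays at the end of the schema.
sql("CREATE TABLE fact (empname STRING, salary INT) " +
  "PARTITIONED BY (deptname STRING) STORED AS carbondata " +
  "TBLPROPERTIES ('sort_columns' = 'empname')")
sql("CREATE MATERIALIZED VIEW fact_mv AS " +
  "SELECT empname, deptname, SUM(salary) FROM fact GROUP BY empname, deptname")
sql("INSERT INTO fact SELECT 'a', 1, 'dev'") // also loads fact_mv with the re-arranged schema
{code}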
[jira] [Updated] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow
[ https://issues.apache.org/jira/browse/CARBONDATA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3816: - Fix Version/s: (was: 2.1.1) 2.2.0 > Support Float and Decimal in the Merge Flow > --- > > Key: CARBONDATA-3816 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3816 > Project: CarbonData > Issue Type: New Feature > Components: data-load > Affects Versions: 2.0.0 > Reporter: Xingjun Hao > Priority: Major > Fix For: 2.2.0 > > We don't support the FLOAT and DECIMAL datatypes in the CDC flow.
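A short sketch of the kind of statement this feature targets. Table names and schema are made up, and the SQL MERGE syntax shown is an assumption rather than confirmed CarbonData grammar:

{code:scala}
// Hypothetical CDC merge touching DECIMAL and FLOAT columns, the datatypes
// this issue asks the merge flow to accept.
sql("CREATE TABLE target (id INT, price DECIMAL(10,2), score FLOAT) STORED AS carbondata")
sql("CREATE TABLE changes (id INT, price DECIMAL(10,2), score FLOAT) STORED AS carbondata")
sql("""MERGE INTO target t USING changes c ON t.id = c.id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *""")
{code}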
[jira] [Updated] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command
[ https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3615: - Fix Version/s: (was: 2.1.1) 2.2.0 > Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command > --- > > Key: CARBONDATA-3615 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3615 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.0.0 > Reporter: Vikram Ahuja > Priority: Minor > Fix For: 2.2.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command
> +------------+--------+------------------------+----------------+
> | Field      | Size   | Comment                | Cache Location |
> +------------+--------+------------------------+----------------+
> | Index      | 0 B    | 0/2 index files cached | DRIVER         |
> | Dictionary | 0 B    |                        | DRIVER         |
> | Index      | 1.5 KB | 2/2 index files cached | INDEX SERVER   |
> | Dictionary | 0 B    |                        | INDEX SERVER   |
> +------------+--------+------------------------+----------------+
[jira] [Commented] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command
[ https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303133#comment-17303133 ] Ajantha Bhat commented on CARBONDATA-3615: -- [~vikramahuja_]: please check, and close the issue if it is already handled. > Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command > --- > > Key: CARBONDATA-3615 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3615 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.0.0 > Reporter: Vikram Ahuja > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command
> +------------+--------+------------------------+----------------+
> | Field      | Size   | Comment                | Cache Location |
> +------------+--------+------------------------+----------------+
> | Index      | 0 B    | 0/2 index files cached | DRIVER         |
> | Dictionary | 0 B    |                        | DRIVER         |
> | Index      | 1.5 KB | 2/2 index files cached | INDEX SERVER   |
> | Dictionary | 0 B    |                        | INDEX SERVER   |
> +------------+--------+------------------------+----------------+
[jira] [Updated] (CARBONDATA-3875) Support show segments include stage
[ https://issues.apache.org/jira/browse/CARBONDATA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3875: - Fix Version/s: (was: 2.1.1) 2.1.0 > Support show segments include stage > --- > > Key: CARBONDATA-3875 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3875 > Project: CarbonData > Issue Type: New Feature > Components: spark-integration > Affects Versions: 2.0.0, 2.0.1 > Reporter: Xingjun Hao > Priority: Major > Fix For: 2.1.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The current system lacks monitoring of stage information; a 'SHOW SEGMENTS INCLUDE STAGE' command shall be supported, which will provide monitoring information such as createTime, partition info, etc.
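For illustration, the shape of the proposed command; the exact grammar is an assumption here and the table name is made up:

{code:scala}
// Stage entries (files written but not yet committed as segments) would be
// listed alongside regular segments, with fields such as createTime and
// partition info.
sql("SHOW SEGMENTS FOR TABLE db.sales INCLUDE STAGE").show(false)
{code}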
[jira] [Updated] (CARBONDATA-3856) Support the LIMIT operator for show segments command
[ https://issues.apache.org/jira/browse/CARBONDATA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3856: - Fix Version/s: (was: 2.1.1) 2.2.0 > Support the LIMIT operator for show segments command > > > Key: CARBONDATA-3856 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3856 > Project: CarbonData > Issue Type: New Feature > Components: spark-integration > Affects Versions: 2.0.0 > Reporter: Xingjun Hao > Priority: Minor > Fix For: 2.2.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > As of the 2.0.0 release, CarbonData doesn't support the LIMIT operator in the SHOW SEGMENTS command. The time cost is high when there are too many segments.
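A one-line sketch of the operator being requested (table name illustrative; the syntax follows the documented SHOW SEGMENTS grammar):

{code:scala}
// List only the 10 most recent segments instead of the whole load history.
sql("SHOW SEGMENTS FOR TABLE db.sales LIMIT 10").show(false)
{code}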
[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled
[ https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4095: - Issue Type: Bug (was: Improvement) > Select Query with SI filter fails, when columnDrift is enabled > -- > > Key: CARBONDATA-4095 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4095 > Project: CarbonData > Issue Type: Bug > Reporter: Indhumathi Muthu Murugesh > Assignee: Indhumathi Muthu Murugesh > Priority: Major > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h >
> sql("drop table if exists maintable")
> sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata ")
> sql("insert into maintable values('k','d',2,3)")
> sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
> sql("create index indextable on table maintable(b) AS 'carbondata'")
> sql("insert into maintable values('k','x',2,4)")
> sql("select * from maintable where b='x'").show(false)
>
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
> at org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
> at org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
> at org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
> at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61)
> at org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281)
> ... 26 more
> 2020-12-22 18:58:37 ERROR TaskSetManager:70 - Task 0 in stage 40.0 failed 1 times; aborting job
[jira] [Updated] (CARBONDATA-4003) Improve IUD Concurrency
[ https://issues.apache.org/jira/browse/CARBONDATA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4003: - Fix Version/s: (was: 2.1.1) 2.2.0 > Improve IUD Concurrency > --- > > Key: CARBONDATA-4003 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4003 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration > Affects Versions: 2.0.1 > Reporter: Kejian Li > Priority: Major > Fix For: 2.2.0 > > Time Spent: 20h > Remaining Estimate: 0h > > When some segments' state of the table is INSERT IN PROGRESS, update operation on the table fails.
[jira] [Resolved] (CARBONDATA-3617) loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow
[ https://issues.apache.org/jira/browse/CARBONDATA-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3617. -- Fix Version/s: (was: 2.1.1) 2.0.0 Resolution: Fixed > loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow > -- > > Key: CARBONDATA-3617 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3617 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Affects Versions: 1.6.1, 2.0.0 > Reporter: Xingjun Hao > Priority: Minor > Fix For: 2.0.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > During data loading using global sort, the sort-by processing is based on the whole carbon row, so the GC overhead is huge when there are many columns. Theoretically, the sort-by processing can work on just the sort columns, which brings less time and GC overhead.
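A conceptual sketch of the change (illustrative Spark code, not the actual CarbonData implementation): key the global sort on the sort columns only, so the shuffle comparator never touches the remaining, possibly wide, row.

{code:scala}
import org.apache.spark.sql.SparkSession

// Sort by a small key extracted from the sort columns instead of comparing
// whole rows; the wide payload just rides along.
val spark = SparkSession.builder().master("local[2]").appName("sortSketch").getOrCreate()
// each element: (sortColumns, remaining wide columns)
val rows = spark.sparkContext.parallelize(Seq(
  (("b", 2), Array[Any](10.5, "x", "...many more columns...")),
  (("a", 1), Array[Any](20.0, "y", "...many more columns..."))))
val globallySorted = rows.sortBy(_._1) // comparator sees only the key tuple
globallySorted.collect().foreach(println)
spark.stop()
{code}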
[jira] [Updated] (CARBONDATA-3603) Feature Change in CarbonData 2.0
[ https://issues.apache.org/jira/browse/CARBONDATA-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3603: - Fix Version/s: (was: 2.1.1) 2.2.0 > Feature Change in CarbonData 2.0 > > > Key: CARBONDATA-3603 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3603 > Project: CarbonData > Issue Type: Improvement > Reporter: Jacky Li > Priority: Major > Fix For: 2.2.0 >
[jira] [Updated] (CARBONDATA-3559) Support adding carbon file into CarbonData table
[ https://issues.apache.org/jira/browse/CARBONDATA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3559: - Fix Version/s: (was: 2.1.1) 2.2.0 > Support adding carbon file into CarbonData table > > > Key: CARBONDATA-3559 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3559 > Project: CarbonData > Issue Type: Improvement > Reporter: Jacky Li > Assignee: Jacky Li > Priority: Major > Fix For: 2.2.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Since adding parquet/orc files into a CarbonData table is supported now, adding carbon files should be supported as well
[jira] [Updated] (CARBONDATA-3370) fix missing version of maven-duplicate-finder-plugin
[ https://issues.apache.org/jira/browse/CARBONDATA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3370: - Fix Version/s: (was: 2.1.1) 2.2.0 > fix missing version of maven-duplicate-finder-plugin > > > Key: CARBONDATA-3370 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3370 > Project: CarbonData > Issue Type: Improvement > Components: build > Affects Versions: 1.5.3 > Reporter: lamber-ken > Priority: Critical > Fix For: 2.2.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > fix missing version of maven-duplicate-finder-plugin in pom file
[jira] [Commented] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.
[ https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303118#comment-17303118 ] Ajantha Bhat commented on CARBONDATA-3670: -- Already handled in 2.0 by https://github.com/apache/carbondata/pull/3638 > Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed. > -- > > Key: CARBONDATA-3670 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3670 > Project: CarbonData > Issue Type: Wish > Components: core > Affects Versions: 2.0.0 > Reporter: Xingjun Hao > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > When writing data, the column pages are stored off-heap and the pages are compressed to save storage cost. Currently, in the compression step, the data is copied from off-heap to the heap before being compressed, which leads to heavier GC overhead compared with compressing off-heap directly. To sum up, we should support compressing off-heap column pages directly, avoiding a copy of data from off-heap to heap when compressed.
[jira] [Resolved] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.
[ https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3670. -- Fix Version/s: (was: 2.1.1) Resolution: Duplicate > Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed. > -- > > Key: CARBONDATA-3670 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3670 > Project: CarbonData > Issue Type: Wish > Components: core > Affects Versions: 2.0.0 > Reporter: Xingjun Hao > Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > When writing data, the column pages are stored off-heap and the pages are compressed to save storage cost. Currently, in the compression step, the data is copied from off-heap to the heap before being compressed, which leads to heavier GC overhead compared with compressing off-heap directly. To sum up, we should support compressing off-heap column pages directly, avoiding a copy of data from off-heap to heap when compressed.
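A minimal sketch of the idea, not CarbonData's actual code path, assuming the snappy-java compressor: direct (off-heap) ByteBuffers can be compressed straight into another direct buffer without staging through an on-heap byte[].

{code:scala}
import java.nio.ByteBuffer
import org.xerial.snappy.Snappy

// Fill a direct buffer as a stand-in for an off-heap column page.
val page = ByteBuffer.allocateDirect(4096)
page.put(Array.fill[Byte](4096)(1))
page.flip()
// Compress off-heap to off-heap; no intermediate on-heap copy is made.
val compressed = ByteBuffer.allocateDirect(Snappy.maxCompressedLength(page.remaining()))
val compressedSize = Snappy.compress(page, compressed)
println(s"compressed to $compressedSize bytes without touching the heap")
{code}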
[jira] [Resolved] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter
[ https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4137. -- Fix Version/s: 2.1.1 Resolution: Fixed > Refactor CarbonDataSourceScan without Spark Filter > -- > > Key: CARBONDATA-4137 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4137 > Project: CarbonData > Issue Type: Sub-task > Reporter: David Cai > Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h >
[jira] [Updated] (CARBONDATA-3746) Support column chunk cache creation and basic read/write
[ https://issues.apache.org/jira/browse/CARBONDATA-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3746: - Fix Version/s: (was: 2.1.1) 2.2.0 > Support column chunk cache creation and basic read/write > > > Key: CARBONDATA-3746 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3746 > Project: CarbonData > Issue Type: Sub-task > Reporter: Jacky Li > Assignee: Jacky Li > Priority: Major > Fix For: 2.2.0 >
[jira] [Updated] (CARBONDATA-3608) Drop 'STORED BY' syntax in create table
[ https://issues.apache.org/jira/browse/CARBONDATA-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3608: - Fix Version/s: (was: 2.1.1) 2.2.0 > Drop 'STORED BY' syntax in create table > --- > > Key: CARBONDATA-3608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3608 > Project: CarbonData > Issue Type: Sub-task > Reporter: Jacky Li > Priority: Major > Fix For: 2.2.0 >
[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties
[ https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4152: -- Description: There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g
was: There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g
> Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties > > > Key: CARBONDATA-4152 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4152 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.1.0 > Reporter: Yahui Liu > Priority: Minor > > There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
> caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeC
[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties
[ https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4152: -- Description: There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
at org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.g
was: There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
> Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties > > > Key: CARBONDATA-4152 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4152 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.1.0 > Reporter: Yahui Liu > Priority: Minor > > There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
> caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> at java.util.ArrayList.get(ArrayList.java:433)
> at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:376)
> at org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:195)
> at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:171)
> at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
> at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:414)
> at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:494)
> at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:218)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:129)
> at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.g
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
CarbonDataQA2 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800773654 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3806/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
CarbonDataQA2 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800772914 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5572/
[jira] [Commented] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties
[ https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303065#comment-17303065 ] Yahui Liu commented on CARBONDATA-4152: --- When the CARBONDATA-3471 issue happens, there is no log which can help to locate the root cause of the issue; the logger needs to be enhanced. > Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties > > > Key: CARBONDATA-4152 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4152 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.1.0 > Reporter: Yahui Liu > Priority: Minor > > There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
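A hedged sketch of the kind of guard-and-log change the issue asks for; all names below are illustrative, not the actual patch:

{code:scala}
import java.util.{List => JList}
import org.slf4j.LoggerFactory

// Guard the empty-list case before calling get(0), and log enough context
// to locate the root cause (names are hypothetical).
object SegmentPropertiesGuard {
  private val LOGGER = LoggerFactory.getLogger(getClass)

  // `indexes` stands in for the result of getIndexes(segment); `segmentNo`
  // identifies the segment in the log line.
  def firstIndexOrFail[T](indexes: JList[T], segmentNo: String): T = {
    if (indexes.isEmpty) {
      LOGGER.error(s"No index loaded for segment $segmentNo; cannot derive segment " +
        "properties. Check whether the segment's index files exist.")
      throw new RuntimeException(s"Index list is empty for segment $segmentNo")
    }
    indexes.get(0)
  }
}
{code}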
[jira] [Updated] (CARBONDATA-4152) Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties
[ https://issues.apache.org/jira/browse/CARBONDATA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4152: -- Summary: Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties (was: Query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties) > Enhance logger after query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties > > > Key: CARBONDATA-4152 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4152 > Project: CarbonData > Issue Type: Bug > Components: core > Affects Versions: 2.1.0 > Reporter: Yahui Liu > Priority: Minor > > There is chance that getIndexes(segment) return empty list and later call list.get(0) throw exception.
[jira] [Updated] (CARBONDATA-4034) Improve the time consumption of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4034: - Fix Version/s: 2.1.0 > Improve the time consumption of Horizontal Compaction for update > -- > > Key: CARBONDATA-4034 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4034 > Project: CarbonData > Issue Type: Bug > Reporter: Jiayu Shen > Priority: Minor > Fix For: 2.1.0 > > Time Spent: 17h 10m > Remaining Estimate: 0h > > In the update flow, horizontal compaction will be significantly slower when updating with a lot of segments (or a lot of blocks). There is a case whose cost is shown in the log.
> {code:java}
> 2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
> 2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].{code}
> In this PR, we optimize the process between the second and third rows of the log, by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.
[jira] [Updated] (CARBONDATA-4057) Support Complex DataType when Save DataFrame
[ https://issues.apache.org/jira/browse/CARBONDATA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4057: - Fix Version/s: (was: 2.2.0) 2.1.1 > Support Complex DataType when Save DataFrame > > > Key: CARBONDATA-4057 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4057 > Project: CarbonData > Issue Type: New Feature > Reporter: Jiayu Shen > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently, once df.mode(overwrite).save is triggered, complex datatypes aren't supported; this shall be optimized.
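A short sketch of the DataFrame-save path the issue targets; the format name and tableName option follow the Spark integration's common usage, and the table name is illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._
// tags is an array<string> column, one of the complex datatypes in question.
val df = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "tags")
df.write.format("carbondata")
  .option("tableName", "complex_tab")
  .mode("overwrite")
  .save()
{code}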
[jira] [Updated] (CARBONDATA-4030) Concurrent SI global sort cannot succeed
[ https://issues.apache.org/jira/browse/CARBONDATA-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4030: - Fix Version/s: (was: 2.2.0) 2.1.1 > Concurrent SI global sort cannot succeed > --- > > Key: CARBONDATA-4030 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4030 > Project: CarbonData > Issue Type: Bug > Reporter: Ajantha Bhat > Assignee: Ajantha Bhat > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When concurrent SI global sort loads are in progress, one load was removing the table property added by the other load. So the global sort insert for one load was failing with an error that it was unable to find the position id in the projection.
[jira] [Updated] (CARBONDATA-4064) TPCDS queries are failing with None.get exception when table has SI configured
[ https://issues.apache.org/jira/browse/CARBONDATA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4064: - Fix Version/s: (was: 2.2.0) 2.1.1 > TPCDS queries are failing with None.get exception when table has SI configured > -- > > Key: CARBONDATA-4064 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4064 > Project: CarbonData > Issue Type: Bug > Reporter: Indhumathi Muthu Murugesh > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h >
[jira] [Updated] (CARBONDATA-4029) After delete on a table which has alter-added SDK segments, the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4029: - Fix Version/s: (was: 2.2.0) 2.1.1 > After delete on a table which has alter-added SDK segments, the count(*) is 0. > - > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug > Affects Versions: 2.0.0 > Environment: 3 node FI cluster > Reporter: Prasanna Ravichandran > Priority: Minor > Fix For: 2.1.1 > > Attachments: Primitive.rar > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Do a delete on a table which has alter-added SDK segments; then the count(*) is 0. The count(*) stays 0 even if any number of SDK segments are added after it.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> -- before executing the below alter add segment, place the attached SDK files in hdfs at the /sdkfiles/primitive2 folder;
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> select * from external_primitive;
> delete from external_primitive where id =2;
> select * from external_primitive;
> Console output:
> /> drop table if exists external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select * from external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +-----+-------+-------+------------------+---------+-------------+------------------------+------------+--------+
> | id  | name  | rank  | salary           | active  | dob         | doj                    | city       | dept   |
> +-----+-------+-------+------------------+---------+-------------+------------------------+------------+--------+
> | 1   | AAA   | 3     | 3444345.66       | true    | 1979-12-09  | 2011-02-10 01:00:20.0  | Pune       | IT     |
> | 2   | BBB   | 2     | 543124.66        | false   | 1987-02-19  | 2017-01-01 12:00:20.0  | Bangalore  | DATA   |
> | 3   | CCC   | 1     | 787878.888       | false   | 1982-05-12  | 2015-12-01 02:20:20.0  | Pune       | DATA   |
> | 4   | DDD   | 1     | 9.24             | true    | 1981-04-09  | 2000-01-15 07:00:20.0  | Delhi      | MAINS  |
> | 5   | EEE   | 3     | 545656.99        | true    | 1987-12-09  | 2017-11-25 04:00:20.0  | Delhi      | IT     |
> | 6   | FFF   | 2     | 768678.0         | false   | 1987-12-20  | 2017-01-10 05:00:20.0  | Bangalore  | DATA   |
> | 7   | GGG   | 3     | 765665.0         | true    | 1983-06-12  | 2017-01-01 02:00:20.0  | Pune       | IT     |
> | 8   | HHH   | 2     | 567567.66        | false   | 1979-01-12  | 1995-01-01 12:00:20.0  | Bangalore  | DATA   |
> | 9   | III   | 2     | 787878.767       | true    | 1985-02-19  | 2005-08-15 01:00:20.0  | Pune       | DATA   |
> | 10  | JJJ   | 3     | 887877.14        | true    | 2000-05-19  | 2016-10-10 12:00:20.0  | Bangalore  | MAINS  |
> | 18  |       | 3     | 7.86786786787E9  | true    | 1980-10-05  | 1995-10-07 22:00:20.0  | Bangalore  | IT     |
> | 19  |       | 2     | 5464545.33       | true    | 1986-06-06  | 2008-08-15 01:00:20.0  | Delhi      | DATA   |
> | 20  | NULL  | 3     | 7867867.34       | true    | 2000-05-01  | 2014-01-18 12:00:20.0  | Bangalore  | MAINS  |
> +-----+-------+-------+------------------+---------+-------------+------------------------+------------+--------+
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2;select * from external_primitive;
> INFO : Execution ID: 322
> +--------------------+
> | Deleted Row Count  |
> +--------------------+
> | 1                  |
> +--------------------+
> 1 row selected (3.723 seconds)
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> | id  | name  | rank  | salary  | active  | dob  | doj  | city  | dept  |
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> No rows selected (1.531 seconds)
> /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select * from external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.766 seconds)
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> | id  | name  | rank  | salary  | active  | dob  | doj  | cit
[jira] [Updated] (CARBONDATA-4020) Drop bloom index for a single index of a table having multiple indexes drops all indexes
[ https://issues.apache.org/jira/browse/CARBONDATA-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4020: - Fix Version/s: (was: 2.2.0) 2.1.1 > Drop bloom index for a single index of a table having multiple indexes drops all indexes > -- > > Key: CARBONDATA-4020 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4020 > Project: CarbonData > Issue Type: Bug > Components: data-query > Affects Versions: 2.1.0 > Environment: Spark 2.4.5 > Reporter: Chetan Bhat > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Create multiple bloom indexes on the table. Try to drop a single bloom index.
> drop table if exists datamap_test_1;
> CREATE TABLE datamap_test_1 (id int,name string,salary float,dob date) STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id');
> CREATE index dm_datamap_test_1_2 ON TABLE datamap_test_1(id) as 'bloomfilter' PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true');
> CREATE index dm_datamap_test3 ON TABLE datamap_test_1 (name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true');
> show indexes on table datamap_test_1;
> drop index dm_datamap_test_1_2 on datamap_test_1;
> show indexes on table datamap_test_1;
> Issue: dropping a single bloom index of a table having multiple indexes drops all indexes.
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +----------------------+--------------+------------------+----------------------------------------------------------------------+---------+
> | Name                 | Provider     | Indexed Columns  | Properties                                                           | Status  | Sync In
> +----------------------+--------------+------------------+----------------------------------------------------------------------+---------+
> | dm_datamap_test_1_2  | bloomfilter  | id               | 'INDEX_COLUMNS'='id','bloom_compress'='true','bloom_fpp'='0.1','blo
> | dm_datamap_test3     | bloomfilter  | name             | 'INDEX_COLUMNS'='name','bloom_compress'='true','bloom_fpp'='0.1','b
> +----------------------+--------------+------------------+----------------------------------------------------------------------+---------+
> 2 rows selected (0.315 seconds)
> 0: jdbc:hive2://linux-32:22550/> drop index dm_datamap_test_1_2 on datamap_test_1;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.232 seconds)
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +-------+-----------+------------------+-------------+---------+------------+
> | Name  | Provider  | Indexed Columns  | Properties  | Status  | Sync Info  |
> +-------+-----------+------------------+-------------+---------+------------+
> +-------+-----------+------------------+-------------+---------+------------+
> No rows selected (0.21 seconds)
> 0: jdbc:hive2://linux-32:22550/>
[jira] [Updated] (CARBONDATA-4078) add external segment and query with index server fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4078: - Fix Version/s: (was: 2.2.0) 2.1.1 > add external segment and query with index server fails > -- > > Key: CARBONDATA-4078 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4078 > Project: CarbonData > Issue Type: Bug > Reporter: SHREELEKHYA GAMPA > Priority: Minor > Fix For: 2.1.1 > > Attachments: is_noncarbonsegments stacktrace > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The index server tries to cache parquet/orc segments and fails, as it cannot read the file format when the fallback mode is disabled. Ex: the 'test parquet table' test case
[jira] [Updated] (CARBONDATA-4093) Add logs for MV and method to verify if mv is in Sync during query
[ https://issues.apache.org/jira/browse/CARBONDATA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4093: - Fix Version/s: (was: 2.2.0) 2.1.1 > Add logs for MV and method to verify if mv is in Sync during query > -- > > Key: CARBONDATA-4093 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4093 > Project: CarbonData > Issue Type: Improvement > Reporter: Indhumathi Muthu Murugesh > Assignee: Indhumathi Muthu Murugesh > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h >
[jira] [Updated] (CARBONDATA-4076) Query having Subquery alias used in query projection does not hit mv after creation
[ https://issues.apache.org/jira/browse/CARBONDATA-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4076: - Fix Version/s: (was: 2.2.0) 2.1.1 > Query having Subquery alias used in query projection does not hit mv after creation > -- > > Key: CARBONDATA-4076 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4076 > Project: CarbonData > Issue Type: Bug > Reporter: Indhumathi Muthu Murugesh > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 5h 20m > Remaining Estimate: 0h >
> CREATE TABLE fact_table1 (empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED AS carbondata;
> create materialized view mv_sub as select empname, sum(result) sum_ut from (select empname, utilization result from fact_table1) fact_table1 group by empname;
> select empname, sum(result) sum_ut from (select empname, utilization result from fact_table1) fact_table1 group by empname;
> explain select empname, sum(result) sum_ut from (select empname, utilization result from fact_table1) fact_table1 group by empname;
> Expected: Query should hit MV
> Actual: Query is not hitting MV
[jira] [Updated] (CARBONDATA-4092) Insert command fails with concurrent delete segment operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4092: - Fix Version/s: (was: 2.2.0) 2.1.1 > Insert command fails with concurrent delete segment operation > - > > Key: CARBONDATA-4092 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4092 > Project: CarbonData > Issue Type: Bug > Reporter: Vikram Ahuja > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h >
[jira] [Updated] (CARBONDATA-3987) Issues in SDK Pagination reader (2 issues)
[ https://issues.apache.org/jira/browse/CARBONDATA-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3987: - Fix Version/s: (was: 2.2.0) 2.1.1 > Issues in SDK Pagination reader (2 issues) > -- > > Key: CARBONDATA-3987 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3987 > Project: CarbonData > Issue Type: Bug > Components: other > Affects Versions: 2.1.0 > Reporter: Chetan Bhat > Priority: Minor > Fix For: 2.1.1 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Issue 1: Write data to a table and insert one more row; an error is thrown when trying to read the newly added row, whereas getTotalRows gets incremented by 1.
> Test code:
> /**
>  * Carbon Files are written using CarbonWriter in outputpath
>  *
>  * Carbon Files are read using paginationCarbonReader object
>  * Checking pagination with insert on large data with 8 split
>  */
> @Test
> public void testSDKPaginationInsertData() throws IOException, InvalidLoadOptionException, InterruptedException {
>   System.out.println("___" + name.getMethodName() + " TestCase Execution is started");
>   //
>   // String outputPath1 = getOutputPath(outputDir, name.getMethodName() + "large");
>   //
>   // long uid = 123456;
>   // TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
>   // writeMultipleCarbonFiles("id int,name string,rank short,salary double,active boolean,dob date,doj timestamp,city string,dept string", getDatas(), outputPath1, uid, null, null);
>   //
>   // System.out.println("Data is written");
>   List data1 = new ArrayList();
>   String[] row1 = {"1", "AAA", "3", "3444345.66", "true", "1979-12-09", "2011-2-10 1:00:20", "Pune", "IT"};
>   String[] row2 = {"2", "BBB", "2", "543124.66", "false", "1987-2-19", "2017-1-1 12:00:20", "Bangalore", "DATA"};
>   String[] row3 = {"3", "CCC", "1", "787878.888", "false", "1982-05-12", "2015-12-1 2:20:20", "Pune", "DATA"};
>   String[] row4 = {"4", "DDD", "1", "9.24", "true", "1981-04-09", "2000-1-15 7:00:20", "Delhi", "MAINS"};
>   String[] row5 = {"5", "EEE", "3", "545656.99", "true", "1987-12-09", "2017-11-25 04:00:20", "Delhi", "IT"};
>   data1.add(row1);
>   data1.add(row2);
>   data1.add(row3);
>   data1.add(row4);
>   data1.add(row5);
>   String outputPath1 = getOutputPath(outputDir, name.getMethodName() + "large");
>   long uid = 123456;
>   TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
>   writeMultipleCarbonFiles("id int,name string,rank short,salary double,active boolean,dob date,doj timestamp,city string,dept string", data1, outputPath1, uid, null, null);
>   System.out.println("Data is written");
>   String hdfsPath1 = moveFiles(outputPath1, outputPath1);
>   String datapath1 = hdfsPath1.concat("/" + name.getMethodName() + "large");
>   System.out.println("HDFS Data Path is: " + datapath1);
>   runSQL("create table " + name.getMethodName() + "large" + " using carbon location '" + datapath1 + "'");
>   System.out.println("Table " + name.getMethodName() + " is created Successfully");
>   runSQL("select count(*) from " + name.getMethodName() + "large");
>   long uid1 = 123;
>   String outputPath = getOutputPath(outputDir, name.getMethodName());
>   List data = new ArrayList();
>   String[] row = {"222", "Daisy", "3", "334.456", "true", "1956-11-08", "2013-12-10 12:00:20", "Pune", "IT"};
>   data.add(row);
>   writeData("id int,name string,rank short,salary double,active boolean,dob date,doj timestamp,city string,dept string", data, outputPath, uid, null, null);
>   String hdfsPath = moveFiles(outputPath, outputPath);
>   String datapath = hdfsPath.concat("/" + name.getMethodName());
>   runSQL("create table " + name.getMethodName() + " using carbon location '" + datapath + "'");
>   runSQL("select count(*) from " + name.getMethodName());
>   System.out.println("Insert--");
>   runSQL("insert into table " + name.getMethodName() + " select * from " + name.getMethodName() + "large");
>   System.out.println("Inserted");
>   System.out.println("--After Insert--");
>   System.out.println("Query 1");
>   runSQL("select count(*) from " + name.getMethodName());
>   // configure cache size = 4 blocklet
>   CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_MAX_PAGINATION_LRU_CACHE_SIZE_IN_MB, "4");
>   CarbonReaderBuilder carbonReaderBuilder = CarbonReader.builder(datapath, "_temp").withPaginationSupport().projection(new String[]{"id","name","rank","salary","active","dob","doj","city","dept"});
>   PaginationCarbonReader paginationCarbonReader = (PaginationCarbonReader) carbonReaderBuilder.build();
>   File[] dataFiles1 = new File(datapath).li
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4051: - Fix Version/s: (was: 2.2.0) 2.1.1 > Geo spatial index algorithm improvement and UDFs enhancement > > > Key: CARBONDATA-4051 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4051 > Project: CarbonData > Issue Type: New Feature > Reporter: Jiayu Shen > Priority: Minor > Fix For: 2.1.1 > > Attachments: CarbonData Spatial Index Design Doc v2.docx > > Time Spent: 21h 10m > Remaining Estimate: 0h > > The requirement is from SEQ; related algorithms are provided by the Discovery Team.
> 1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>   timevalue BIGINT,
>   longitude LONG,
>   latitude LONG) COMMENT "This is a GeoTable"
> STORED AS carbondata
> TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>   'SPATIAL_INDEX.mygeohash.type'='geohash',
>   'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>   'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>   'SPATIAL_INDEX.mygeohash.gridSize'='50',
>   'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs:
> * InPolygonList (List polygonList, OperationType opType)
> * InPolylineList (List polylineList, Float bufferInMeter)
> * InPolygonRangeList (List RangeList, OperationType opType)
> operation only supports:
> * "OR", means calculating the union of two polygons
> * "AND", means calculating the intersection of two polygons
> geo util UDFs:
> * GeoIdToGridXy(Long geoId) : Pair
> * LatLngToGeoId(Long latitude, Long longitude) : Long
> * GeoIdToLatLng(Long geoId) : Pair
> * ToUpperLayerGeoId(Long geoId) : Long
> * ToRangeList (String polygon) : List
> 3. Currently GeoID is a column created internally for spatial tables; this PR will support the GeoID column being customized during LOAD/INSERT INTO. For example,
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, '855280799612' is generated internally,
> +------------+-----------+-----------+----------+
> |mygeohash   |timevalue  |longitude  |latitude  |
> +------------+-----------+-----------+----------+
> |855280799612|157542840  |116285807  |40084087  |
> +------------+-----------+-----------+----------+
> but now it is
> +------------+-----------+-----------+----------+
> |mygeohash   |timevalue  |longitude  |latitude  |
> +------------+-----------+-----------+----------+
> |0           |157542840  |116285807  |40084087  |
> +------------+-----------+-----------+----------+{code}
[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results after add segment to table having SI with Indexserver
[ https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4111: - Fix Version/s: (was: 2.2.0) 2.1.1 > Filter query having invalid results after add segment to table having SI with Indexserver > - > > Key: CARBONDATA-4111 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4111 > Project: CarbonData > Issue Type: Bug > Reporter: SHREELEKHYA GAMPA > Priority: Minor > Fix For: 2.1.1 > > Attachments: addseg_si_is.png > > Time Spent: 7h 50m > Remaining Estimate: 0h > > queries to execute:
> create table maintable_sdk(a string, b int, c string) stored as carbondata;
> insert into maintable_sdk select 'k',1,'k';
> insert into maintable_sdk select 'l',2,'l';
> CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata';
> alter table maintable_sdk add segment options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon');
> spark-sql> select * from maintable_sdk where c='m';
> 2021-01-27 12:10:54,326 | WARN | IPC Client (653337757) connection to linux-30/10.19.90.30:22900 from car...@hadoop.com | Unexpected error reading responses on connection Thread[IPC Client (653337757) connection to linux-30/10.19.90.30:22900 from car...@hadoop.com,5,main] | org.apache.hadoop.ipc.Client.run(Client.java:1113)
> java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.carbondata.core.indexstore.SegmentWrapperContainer.<init>()
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
> at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:58)
> at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:284)
> at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
> at org.apache.hadoop.ipc.RpcWritable$WritableWrapper.readFrom(RpcWritable.java:85)
> at org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187)
> at org.apache.hadoop.ipc.RpcWritable$Buffer.newInstance(RpcWritable.java:183)
> at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1223)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1107)
> Caused by: java.lang.NoSuchMethodException: org.apache.carbondata.core.indexstore.SegmentWrapperContainer.<init>()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.getDeclaredConstructor(Class.java:2178)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
> ... 8 more
> 2021-01-27 12:10:54,330 | WARN | main | Distributed Segment Pruning failed, initiating embedded pruning | org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:349)
> java.lang.reflect.UndeclaredThrowableException
> at com.sun.proxy.$Proxy59.getPrunedSegments(Unknown Source)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:341)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:426)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions$lzycompute(BroadCastSIFilterPushJoin.scala:80)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions(BroadCastSIFilterPushJoin.scala:78)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy$lzycompute(BroadCastSIFilterPushJoin.scala:94)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy(BroadCastSIFilterPushJoin.scala:93)
> at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.doExecute(BroadCastSIFilterPushJoin.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:177)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:201)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:198)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:173)
> at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:293)
> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:342)
> at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
> at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
> at org.apache.s
[jira] [Updated] (CARBONDATA-4094) Select count(*) on partition table fails in index server fallback mode
[ https://issues.apache.org/jira/browse/CARBONDATA-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4094: - Fix Version/s: (was: 2.2.0) 2.1.1 > Select count(*) on partition table fails in index server fallback mode > -- > > Key: CARBONDATA-4094 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4094 > Project: CarbonData > Issue Type: Bug >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4075) Should refactor to use withEvents instead of fireEvent
[ https://issues.apache.org/jira/browse/CARBONDATA-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4075: - Fix Version/s: (was: 2.2.0) 2.1.1 > Should refactor to use withEvents instead of fireEvent > -- > > Key: CARBONDATA-4075 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4075 > Project: CarbonData > Issue Type: Improvement >Reporter: David Cai >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4124) Refresh MV which does not exist is not throwing proper message
[ https://issues.apache.org/jira/browse/CARBONDATA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4124: - Fix Version/s: (was: 2.2.0) 2.1.1 > Refresh MV which does not exist is not throwing proper message > -- > > Key: CARBONDATA-4124 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4124 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Assignee: Indhumathi Muthu Murugesh >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4110: - Fix Version/s: (was: 2.2.0) 2.1.1 > Support clean files dry run and show statistics after clean files operation > --- > > Key: CARBONDATA-4110 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4110 > Project: CarbonData > Issue Type: New Feature >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 26h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
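A minimal sketch of the intended usage, assuming the dry run is exposed as a CLEAN FILES option named 'dryrun'; the option key and table name are assumptions based on this feature's description, not verified syntax.
{code:sql}
-- Assumed option name: preview what clean files would remove, deleting nothing
CLEAN FILES FOR TABLE mydb.mytable OPTIONS('dryrun'='true');
-- Actual cleanup; per this feature, statistics are shown after the operation
CLEAN FILES FOR TABLE mydb.mytable;
{code}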
[jira] [Updated] (CARBONDATA-4053) Alter table rename column failed
[ https://issues.apache.org/jira/browse/CARBONDATA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4053: - Fix Version/s: (was: 2.2.0) 2.1.1 > Alter table rename column failed > > > Key: CARBONDATA-4053 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4053 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 2.1.0 >Reporter: Yahui Liu >Priority: Major > Fix For: 2.1.1 > > Attachments: 截图.PNG > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Alter table rename column failed because incorrectly replace the content in > tblproperties by new column name, which the content is not related to column > name. > !截图.PNG! -- This message was sent by Atlassian Jira (v8.3.4#803005)
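For illustration, the kind of statement involved; the table and column names are made up, and the CHANGE form shown is CarbonData's standard column-rename syntax.
{code:sql}
CREATE TABLE t_rename (a STRING, b STRING) STORED AS carbondata
  TBLPROPERTIES('sort_columns'='b');
-- Renaming column 'a': the bug was that the new name was also substituted
-- into tblproperties content unrelated to the column name.
ALTER TABLE t_rename CHANGE a a_new STRING;
{code}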
[jira] [Updated] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.
[ https://issues.apache.org/jira/browse/CARBONDATA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4052: - Fix Version/s: (was: 2.2.0) 2.1.1 > Select query on SI table after insert overwrite is giving wrong result. > --- > > Key: CARBONDATA-4052 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4052 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > # Create carbon table. > # Create SI table on the same carbon table. > # Do load or insert operation. > # Run query insert overwrite on maintable. > # Now select query on SI table is showing old as well as new data which > should be only new data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
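The steps above, written out as SQL for clarity; table and index names are illustrative.
{code:sql}
CREATE TABLE maintable (a STRING, b STRING, c INT) STORED AS carbondata;
CREATE INDEX indextable ON TABLE maintable(b) AS 'carbondata';
INSERT INTO maintable VALUES ('old', 'x', 1);
INSERT OVERWRITE TABLE maintable SELECT 'new', 'y', 2;
-- Expected: the SI reflects only the overwritten data;
-- observed: a filter served by the SI returned old rows as well.
SELECT * FROM maintable WHERE b = 'y';
{code}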
[jira] [Updated] (CARBONDATA-4066) data mismatch observed with SI and without SI when SI global sort and SI segment merge is true
[ https://issues.apache.org/jira/browse/CARBONDATA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4066: - Fix Version/s: (was: 2.2.0) 2.1.1 > data mismatch observed with SI and without SI when SI global sort and SI > segment merge is true > -- > > Key: CARBONDATA-4066 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4066 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Major > Fix For: 2.1.1 > > Time Spent: 50m > Remaining Estimate: 0h > > data mismatch observed with SI and without SI when SI global sort and SI > segment merge is true > > test case for reproduce the issue: > CarbonProperties.getInstance() > .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "true") > sql("create table complextable2 (id int, name string, country array) > stored as " + > "carbondata tblproperties('sort_scope'='global_sort','sort_columns'='name')") > sql( > s"load data inpath '$resourcesPath/secindex/array.csv' into table > complextable2 options('delimiter'=','," + > > "'quotechar'='\"','fileheader'='id,name,country','complex_delimiter_level_1'='$'," > + > "'global_sort_partitions'='10')") > val result = sql(" select * from complextable2 where > array_contains(country,'china')") > sql("create index index_2 on table complextable2(country) as 'carbondata' > properties" + > "('sort_scope'='global_sort')") > checkAnswer(sql("select count(*) from complextable2 where > array_contains(country,'china')"), > sql("select count(*) from complextable2 where > ni(array_contains(country,'china'))")) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4056) Adding global sort support for SI segments data files merge operation.
[ https://issues.apache.org/jira/browse/CARBONDATA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4056: - Fix Version/s: (was: 2.2.0) 2.1.1 > Adding global sort support for SI segments data files merge operation. > -- > > Key: CARBONDATA-4056 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4056 > Project: CarbonData > Issue Type: New Feature > Components: other >Affects Versions: 2.0.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Enabling carbon property (carbon.si.segment.merge) helps to reduce number of > carbondata files in the SI segments. When SI is created with sort scope as > global sort and this carbon property is enabled, then the data in SI segments > must be globally sorted after data files are merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4054) Size control of minor compaction
[ https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4054: - Fix Version/s: (was: 2.2.0) 2.1.1 > Size control of minor compaction > > > Key: CARBONDATA-4054 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4054 > Project: CarbonData > Issue Type: Improvement >Reporter: ZHANGSHUNYU >Priority: Major > Fix For: 2.1.1 > > Time Spent: 10h > Remaining Estimate: 0h > > Currently, minor compaction only considers the number of segments, and major compaction only considers the total size of segments. But consider a scenario where the user wants to use minor compaction based on the number of segments, yet does not want to merge segments whose data size exceeds a threshold, for example 2 GB, since merging such big segments is unnecessary and time-costly. > So we need to add a parameter to control the size threshold of segments included in minor compaction, so that the user can exclude a segment from minor compaction once its data size exceeds the threshold; of course, a default value must be there. -- This message was sent by Atlassian Jira (v8.3.4#803005)
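A sketch of the proposed knob; the property name and unit (commonly cited as carbon.minor.compaction.size, in MB) are my reading of this change and should be treated as an assumption until verified against the released configuration reference.
{code:sql}
-- Assumed property: segments larger than ~1 GB are skipped by minor compaction
SET carbon.minor.compaction.size=1024;
ALTER TABLE mytable COMPACT 'minor';
{code}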
[jira] [Updated] (CARBONDATA-4067) Change clean files behaviour to support cleaning of in progress segments
[ https://issues.apache.org/jira/browse/CARBONDATA-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4067: - Fix Version/s: (was: 2.2.0) 2.1.1 > Change clean files behaviour to support cleaning of in progress segments > > > Key: CARBONDATA-4067 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4067 > Project: CarbonData > Issue Type: Improvement >Reporter: Vikram Ahuja >Priority: Major > Fix For: 2.1.1 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Change clean files behaviour to support cleaning of in progress segments -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager
[ https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4062: - Fix Version/s: (was: 2.2.0) 2.1.1 > Should make clean files become data trash manager > - > > Key: CARBONDATA-4062 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4062 > Project: CarbonData > Issue Type: Improvement >Reporter: David Cai >Priority: Major > Fix For: 2.1.1 > > Time Spent: 26h 10m > Remaining Estimate: 0h > > To prevent accidental deletion of data, carbon will introduce data trash management. It will provide a buffer window in which an accidental delete operation can be rolled back. > Data trash management is a part of carbon data lifecycle management. Clean files as a data trash manager should contain the following two parts. > part 1: manage metadata-indexed data trash. > This data is at the original place of the table and is indexed by metadata. Carbon manages this data by the metadata index and should avoid using the listFile() interface. > part 2: manage the ".Trash" folder. > Currently the ".Trash" folder has no metadata index, and operations on it are based on timestamps and the listFile() interface. In the future, carbon will index the ".Trash" folder to improve data trash management. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, the added carbon segment's values are not accounted for.
[ https://issues.apache.org/jira/browse/CARBONDATA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3908: - Fix Version/s: (was: 2.2.0) 2.1.1 > When a carbon segment is added through the alter add segments query, then it > is not accounting the added carbon segment values. > --- > > Key: CARBONDATA-3908 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3908 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: FI cluster and opensource cluster. >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.1.1 > > > When a carbon segment is added through the alter add segments query, then it > is not accounting the added carbon segment values. If we do count(*) on the > added segment, then it is always showing as 0. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > --hdfs dfs -mkdir /uniqdata-carbon-segment; > --hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* > /uniqdata-carbon-segment/ > Alter table uniqdata add segment options > ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon'); > select count(*) from uniqdata;--4000 expected as one load of 2000 records > happened and same segment is added again; > set carbon.input.segments.default.uniqdata=1; > select count(*) from uniqdata;--2000 expected - it should just show the > records count of added segments; > CONSOLE: > /> set carbon.input.segments.default.uniqdata=1; > +-++ > | key | value | > +-++ > | carbon.input.segments.default.uniqdata | 1 | > +-++ > 1 row selected (0.192 seconds) > /> select count(*) from uniqdata; > INFO : Execution ID: 1734 > +---+ > | count(1) | > +---+ > | 2000 | > +---+ > 1 row selected (4.036 seconds) > /> set carbon.input.segments.default.uniqdata=2; > +-++ > | key | value | > +-++ > | carbon.input.segments.default.uniqdata | 2 | > +-++ > 1 row selected (0.088 seconds) > /> select count(*) from uniqdata; > INFO : Execution ID: 1745 > +---+ > | count(1) | > +---+ > | 2000 | > +---+ > 1 row selected (6.056 seconds) > /> set carbon.input.segments.default.uniqdata=3; > +-++ > | key | value | > +-++ > | carbon.input.segments.default.uniqdata | 3 | > +-++ > 1 row selected (0.161 seconds) > /> select count(*) from uniqdata; > INFO : Execution ID: 1753 > +---+ > | count(1) | > +---+ > | 0 | > +---+ > 1 row selected (4.875 seconds) > /> show segments for table uniqdata; > +-+--+--+--+++-+--+ > | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | > Index Size | File Format | > +-+--+--+--+++-+--+ > | 4 | Success | 2020-07-17 16:01:53.673 | 5.579S | {} | 269.10KB | 7.21KB | > columnar_v3 | > | 3 | Success | 2020-07-17 16:00:24.866 | 0.578S | {} | 88.55KB | 1.81KB | > columnar_v3 | > | 2 | Success | 2020-07-17 15:07:54.273 | 0.642S | {} | 36.72KB | NA | orc | > | 1 | Success | 2020-07-17 15:03:59.767 | 0.564S | {} | 89.26KB | NA | > parquet | > | 0 | Success | 2020-07-16 
12:44:32.095 | 4.484S | {} | 88.55KB | 1.81KB | > columnar_v3 | > +-+--+--+--+++-+--+ > Expected result: Records added by adding carbon segment should be considered. > Actual result: Records added by adding carbon segment are not considered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4068) Alter table set long string should not allowed on SI column.
[ https://issues.apache.org/jira/browse/CARBONDATA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4068: - Fix Version/s: (was: 2.2.0) 2.1.1 > Alter table set long string should not allowed on SI column. > > > Key: CARBONDATA-4068 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4068 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > # Create table and create SI. > # Now try to set the column data type to long string on which SI is created. > Operation should not be allowed because we don't support SI on long string. > create table maintable (a string,b string,c int) STORED AS carbondata; > create index indextable on table maintable(b) AS 'carbondata'; > insert into maintable values('k','x',2); > ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.
[ https://issues.apache.org/jira/browse/CARBONDATA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4069: - Fix Version/s: (was: 2.2.0) 2.1.1 > Alter table set streaming=true should not be allowed on SI table or table > having SI. > > > Key: CARBONDATA-4069 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4069 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > # Create carbon table and SI. > # Now set streaming = true on either the SI table or the main table. > Neither operation should be allowed, because SI is not supported on streaming tables. > > create table maintable2 (a string,b string,c int) STORED AS carbondata; > insert into maintable2 values('k','x',2); > create index m_indextable on table maintable2(b) AS 'carbondata'; > ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true'); => operation should not be allowed. > ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true') => operation should not be allowed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4040) Data mismatch in case of compaction failure and retry success
[ https://issues.apache.org/jira/browse/CARBONDATA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4040: - Fix Version/s: (was: 2.2.0) 2.1.1 > Data mismatch in case of compaction failure and retry success > > > Key: CARBONDATA-4040 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4040 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > For compaction we don't register the in-progress segment, so compaction can fail when it is unable to get the table status lock. In that case the partial compaction segment needs to be cleaned. If the partial segment fails to be cleaned up because the lock cannot be acquired or due to IO issues, and the user then retries the compaction, carbon uses the same segment id. So, while writing the segment file for the new compaction, list only the files mapping to the current compaction, not all the files, which would include stale files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4081) Clean files considering files apart from .segment files while cleaning stale segments and moving them to trash
[ https://issues.apache.org/jira/browse/CARBONDATA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4081: - Fix Version/s: (was: 2.2.0) 2.1.1 > Clean files considering files apart from .segment files while cleaning stale > segments and moving them to trash > -- > > Key: CARBONDATA-4081 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4081 > Project: CarbonData > Issue Type: Bug >Reporter: Vikram Ahuja >Priority: Major > Fix For: 2.1.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4087) Issue with huge data(exceeding 32K records) after enabling local dictionary
[ https://issues.apache.org/jira/browse/CARBONDATA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4087: - Fix Version/s: (was: 2.2.0) 2.1.1 > Issue with huge data(exceeding 32K records) after enabling local dictionary > --- > > Key: CARBONDATA-4087 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4087 > Project: CarbonData > Issue Type: Bug > Components: core, presto-integration >Reporter: Akshay >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For large data SELECT on array(varchar) throws exception- > "Error in Reading Data from Carbondata" due to ArrayOutOfBounds > > https://github.com/apache/carbondata/pull/4055 -- This message was sent by Atlassian Jira (v8.3.4#803005)
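For reference, local dictionary is governed per table; a minimal sketch of creating a table with it disabled as a workaround, using the documented local_dictionary_enable table property. The column layout is illustrative.
{code:sql}
CREATE TABLE big_strings (id INT, tags ARRAY<STRING>)
STORED AS carbondata
TBLPROPERTIES('local_dictionary_enable'='false');
{code}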
[jira] [Updated] (CARBONDATA-4072) Clean files command is not deleting .segment files present at metadata/segments/xxxxx.segment for the segments added through alter table add segment query.
[ https://issues.apache.org/jira/browse/CARBONDATA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4072: - Fix Version/s: (was: 2.2.0) 2.1.1 > Clean files command is not deleting .segment files present at > metadata/segments/x.segment for the segments added through alter table > add segment query. > --- > > Key: CARBONDATA-4072 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4072 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.1.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 7.5h > Remaining Estimate: 0h > > Clean files command is not deleting .segment files present at > metadata/segments/x.segment for the segments added through alter table > add segment query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4071) If date or timestamp columns are present as children of complex columns, reading through SDK gives wrong results.
[ https://issues.apache.org/jira/browse/CARBONDATA-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4071: - Fix Version/s: (was: 2.2.0) 2.1.1 > If date or timestamp columns are present as children of complex columns, reading > through SDK gives wrong results. > --- > > Key: CARBONDATA-4071 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4071 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.1.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 5h > Remaining Estimate: 0h > > If a date or timestamp column is present as a child of a complex column, reading its value through the SDK gives wrong results. For eg: Array -- This message was sent by Atlassian Jira (v8.3.4#803005)
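A minimal sketch of the SDK read path on which the wrong values were observed, assuming a store whose schema includes a complex column of dates; the file path and column names are illustrative, not taken from the issue.
{code:java}
import org.apache.carbondata.sdk.file.CarbonReader;

public class ReadComplexDates {
  public static void main(String[] args) throws Exception {
    // Illustrative: read a store whose schema includes e.g. an array of dates
    CarbonReader reader = CarbonReader.builder("/path/to/carbon/files", "_temp")
        .projection(new String[]{"id", "date_array"})
        .build();
    while (reader.hasNext()) {
      Object[] row = (Object[]) reader.readNextRow();
      // Before this fix, date/timestamp children of complex columns
      // surfaced here with wrong values.
      System.out.println(java.util.Arrays.deepToString(row));
    }
    reader.close();
  }
}
{code}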
[jira] [Updated] (CARBONDATA-4084) Error when loading string field with high cardinality
[ https://issues.apache.org/jira/browse/CARBONDATA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4084: - Fix Version/s: (was: 2.2.0) 2.1.1 > Error when loading string field with high cardinality > > > Key: CARBONDATA-4084 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4084 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: Nguyen Dinh Huynh >Priority: Major > Labels: patch > Fix For: 2.1.1 > > Attachments: image-2020-12-14-22-40-45-539.png, > image_2020_12_13T09_29_38_891Z.png > > Time Spent: 1h > Remaining Estimate: 0h > > When I try to load a string field with more than 1M distinct values, some rows > show strange values. > !image_2020_12_13T09_29_38_891Z.png! > I tried with the setting carbon.local.dictionary.enable=false, and then it > works as expected. So there seems to be a bug in the decoder fallback. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4099) Fix Concurrent issues with clean files post event listener
[ https://issues.apache.org/jira/browse/CARBONDATA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4099: - Fix Version/s: (was: 2.2.0) 2.1.1 > Fix Concurrent issues with clean files post event listener > -- > > Key: CARBONDATA-4099 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4099 > Project: CarbonData > Issue Type: Bug >Reporter: Vikram Ahuja >Priority: Major > Fix For: 2.1.1 > > Time Spent: 50m > Remaining Estimate: 0h > > There were 2 issues in the clean files post event listener: > # In concurrent cases, while writing the entry back to the table status file, a wrong path was given, due to which the table status file was not updated in the case of the SI table. > # While writing the loadMetadataDetails to the table status file during concurrent scenarios, we were only writing the unwanted segments and not all the segments, which could make segments stale in the SI table. > Due to these 2 issues, when a select query is executed on the SI table, the tablestatus would have an entry for a segment but its carbondata file would be deleted, thus throwing an IO Exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled
[ https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4095: - Fix Version/s: (was: 2.2.0) 2.1.1 > Select Query with SI filter fails, when columnDrift is enabled > -- > > Key: CARBONDATA-4095 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4095 > Project: CarbonData > Issue Type: Improvement >Reporter: Indhumathi Muthu Murugesh >Assignee: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > sql("drop table if exists maintable") > sql("create table maintable (a string,b string,c int,d int) > STORED AS carbondata ") > sql("insert into maintable values('k','d',2,3)") > sql("alter table maintable set > tblproperties('sort_columns'='c,d','sort_scope'='local_sort')") > sql("create index indextable on table maintable(b) AS > 'carbondata'") > sql("insert into maintable values('k','x',2,4)") > sql("select * from maintable where b='x'").show(false) > > > > > 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 > (TID 422) > java.lang.RuntimeException: Error while resolving filter expression > at > org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283) > at > org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382) > at > org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at > org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190) > at > org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128) > at > org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121) > at > org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77) > at > org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61) > at > org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281) > ... 26 more > 2020-12-22 18:58:37 ERROR TaskSetManager:70 - Task 0 in stage 40.0 failed 1 >
[jira] [Updated] (CARBONDATA-4073) Added FT for missing scenarios in Presto
[ https://issues.apache.org/jira/browse/CARBONDATA-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4073: - Fix Version/s: (was: 2.2.0) 2.1.1 > Added FT for missing scenarios in Presto > > > Key: CARBONDATA-4073 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4073 > Project: CarbonData > Issue Type: Test > Components: presto-integration >Reporter: Akshay >Priority: Major > Fix For: 2.1.1 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > FT for following has been added. > update without local-dict > delete operation > minor, major, custom compaction > add and delete segments > test update with inverted index > read with partition columns > Filter on partition columns > Bloom index > test range columns > read streaming data -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4100) SI loads are in inconsistent state with maintable after concurrent(Load&Compaction) operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4100: - Fix Version/s: (was: 2.2.0) 2.1.1 > SI loads are in inconsistent state with maintable after > concurrent(Load&Compaction) operation > - > > Key: CARBONDATA-4100 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4100 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.
[ https://issues.apache.org/jira/browse/CARBONDATA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4070: - Fix Version/s: (was: 2.2.0) 2.1.1 > Handle the scenario mentioned in description for SI. > > > Key: CARBONDATA-4070 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4070 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > # SI creation should not be allowed on SI table. > # SI table should not be scanned with like filter on MT. > # Drop column should not be allowed on SI table. > Add the FT for all above scenario and sort column related scenario. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4059) Block compaction on SI table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4059: - Fix Version/s: (was: 2.2.0) 2.1.1 > Block compaction on SI table. > - > > Key: CARBONDATA-4059 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4059 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently compaction is allowed on the SI table. Because of this, if only the SI table is compacted, then running a filter query on the main table causes more data scan of the SI table, which degrades performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
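For clarity, compaction is meant to be triggered on the main table, which keeps the SI segments in step; a sketch with illustrative table names.
{code:sql}
-- Allowed: compacting the main table also takes care of its SI
ALTER TABLE maintable COMPACT 'minor';
-- Blocked by this change: compacting the SI table directly
ALTER TABLE indextable COMPACT 'minor';
{code}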
[jira] [Updated] (CARBONDATA-4104) Vector filling for Primitive decimal type needs to be handled
[ https://issues.apache.org/jira/browse/CARBONDATA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4104: - Fix Version/s: (was: 2.2.0) 2.1.1 > Vector filling for Primitive decimal type needs to be handled > - > > Key: CARBONDATA-4104 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4104 > Project: CarbonData > Issue Type: Bug > Components: core >Reporter: Akshay >Priority: Major > Fix For: 2.1.1 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > Filling of vectors in case of complex primitive decimal type whose precision > is greater than 18 is not handled properly. > for ex > array -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4112) Data mismatch issue in SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4112: - Fix Version/s: (was: 2.2.0) 2.1.1 > Data mismatch issue in SI > - > > Key: CARBONDATA-4112 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4112 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.1.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > When data files of an SI segment are merged, the SI table ends up with more rows than the main table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4109) Improve carbondata coverage for presto-integration code
[ https://issues.apache.org/jira/browse/CARBONDATA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4109: - Fix Version/s: (was: 2.2.0) 2.1.1 > Improve carbondata coverage for presto-integration code > --- > > Key: CARBONDATA-4109 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4109 > Project: CarbonData > Issue Type: Improvement > Components: core, presto-integration >Reporter: Akshay >Priority: Major > Fix For: 2.1.1 > > Time Spent: 5h > Remaining Estimate: 0h > > Few scenarios had missing coverage in presto-integration code. This PR aims > to improve it by considering all such scenarios. > Dead code- ObjectStreamReader.java was created with an aim to query complex > types. Instead ComplexTypeStreamReader was created. Making ObjectStreamreader > obsolete. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4125) SI compatability issue fix
[ https://issues.apache.org/jira/browse/CARBONDATA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4125: - Fix Version/s: (was: 2.2.0) 2.1.1 > SI compatability issue fix > -- > > Key: CARBONDATA-4125 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4125 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Assignee: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Refer > [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Bug-SI-Compatibility-Issue-td105485.html] > for this issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4107) MV Performance and Lock issues
[ https://issues.apache.org/jira/browse/CARBONDATA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4107: - Fix Version/s: (was: 2.2.0) 2.1.1 > MV Performance and Lock issues > -- > > Key: CARBONDATA-4107 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4107 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 12h 10m > Remaining Estimate: 0h > > # After the MV multi-tenancy support PR, the mv system folder was moved to database level. Hence, during each operation (insert/load/IUD/show mv/query), we list all the databases in the system, collect mv schemas, and check whether any mv is mapped to the table. Collecting mv schemas from all databases degrades query performance, whether or not the table actually has an mv. > # When different JVM processes call the touchMDTFile method, file creation and deletion can happen at the same time. This may fail the operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4102) Add UT and FT to improve coverage of SI module.
[ https://issues.apache.org/jira/browse/CARBONDATA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4102: - Fix Version/s: (was: 2.2.0) 2.1.1 > Add UT and FT to improve coverage of SI module. > --- > > Key: CARBONDATA-4102 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4102 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 17h 10m > Remaining Estimate: 0h > > Add UT and FT to improve coverage of SI module and also remove dead or unused > code if exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4122) Support Writing Flink Stage data into Hdfs file system
[ https://issues.apache.org/jira/browse/CARBONDATA-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4122: - Fix Version/s: (was: 2.2.0) 2.1.1 > Support Writing Flink Stage data into Hdfs file system > -- > > Key: CARBONDATA-4122 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4122 > Project: CarbonData > Issue Type: Improvement >Reporter: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3917) The row count of data loading is not accurate; more rows have been loaded
[ https://issues.apache.org/jira/browse/CARBONDATA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3917: - Fix Version/s: (was: 2.2.0) 2.1.1 > The row count of data loading is not accurate; more rows have been loaded > --- > > Key: CARBONDATA-3917 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3917 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.0.0 >Reporter: Taoli >Priority: Blocker > Fix For: 2.1.1 > > 2020-07-18 18:46:23,856 | INFO | [Executor task launch worker for task 28380] > | Total rows processed in step Data Writer: 1277745 | > org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138) > 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] > | Total rows processed in step Sort Processor: 1189959 | > org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138) > 2020-07-18 18:46:23,856 | DEBUG | > [LocalFolderDeletionPool:detail_cdr_s1mme_18461_1595087183856] | > PrivilegedAction as:omm (auth:SIMPLE) > from:org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:298) > | > org.apache.hadoop.security.UserGroupInformation.logPrivilegedAction(UserGroupInformation.java:1756) > 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] > | Total rows processed in step Data Converter: 1189959 | > org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138) > 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] > | Total rows processed in step Input Processor: 1189959 | > org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4115) Return and show segment ID after successful load and insert, including partitioned table and normal table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4115: - Fix Version/s: (was: 2.2.0) 2.1.1 > Return and show segment ID after successful load and insert, including > partitioned table and normal table. > -- > > Key: CARBONDATA-4115 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4115 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Reporter: lihongzhao >Priority: Major > Fix For: 2.1.1 > > Time Spent: 11h 20m > Remaining Estimate: 0h > > Return and show segment ID after successful load and insert, including > partitioned table and normal table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS
[ https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4089: - Fix Version/s: (was: 2.2.0) 2.1.1 > Create table with location, if the location didn't have scheme, the default > will be local file system, which is not the file system defined by defaultFS > > > Key: CARBONDATA-4089 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4089 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 2.1.0 >Reporter: Yahui Liu >Priority: Blocker > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > If the location didn't specify scheme, should use the file system defined by > defaultFS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
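An illustration of the two cases, assuming fs.defaultFS is hdfs://hacluster; the paths are made up for the example.
{code:sql}
-- Explicit scheme: unambiguous
CREATE TABLE t_hdfs (a INT) STORED AS carbondata
LOCATION 'hdfs://hacluster/warehouse/t_hdfs';
-- No scheme: before the fix this resolved to the local file system
-- instead of the file system configured in fs.defaultFS
CREATE TABLE t_default (a INT) STORED AS carbondata
LOCATION '/warehouse/t_default';
{code}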
[jira] [Created] (CARBONDATA-4152) Query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties
Yahui Liu created CARBONDATA-4152: - Summary: Query failed with exception: java.lang.IndexOutOfBoundsException: Index: 0 Size: 0 at BlockletIndexFactory.getSegmentProperties Key: CARBONDATA-4152 URL: https://issues.apache.org/jira/browse/CARBONDATA-4152 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 2.1.0 Reporter: Yahui Liu There is a chance that getIndexes(segment) returns an empty list and the later call to list.get(0) throws an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
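A hedged sketch of the guard the report implies; the method and types below are simplified stand-ins, not the actual BlockletIndexFactory code.
{code:java}
import java.io.IOException;
import java.util.List;

// Stand-in for the getSegmentProperties pattern: verify the list returned by
// getIndexes(segment) is non-empty before calling get(0) on it.
final class FirstIndexLookup {
  static <T> T firstOrFail(List<T> indexes, String segmentNo) throws IOException {
    if (indexes == null || indexes.isEmpty()) {
      throw new IOException("No index loaded for segment " + segmentNo);
    }
    return indexes.get(0);  // safe: emptiness checked above
  }
}
{code}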
[GitHub] [carbondata] jack86596 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800730205 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (CARBONDATA-4146) Query fails and the error message "unable to get file status" is displayed. query is normal after the "drop metacache on table" command is executed.
[ https://issues.apache.org/jira/browse/CARBONDATA-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuhe0702 closed CARBONDATA-4146. - Resolution: Duplicate > Query fails and the error message "unable to get file status" is displayed. > query is normal after the "drop metacache on table" command is executed. > - > > Key: CARBONDATA-4146 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4146 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.6.1, 2.0.0, 2.1.0 >Reporter: liuhe0702 >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > During compact execution, the status of the new segment is set to success > before index files are merged. After index files are merged, the carbonindex > files are deleted. As a result, the query task cannot find the cached > carbonindex files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] liuhe0702 closed pull request #4103: [CARBONDATA-4145] Query fails and the message "File does not exist: xxx.carbondata" is displayed
liuhe0702 closed pull request #4103: URL: https://github.com/apache/carbondata/pull/4103 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (CARBONDATA-4145) Query fails and the message "File does not exist: xxxx.carbondata" is displayed
[ https://issues.apache.org/jira/browse/CARBONDATA-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuhe0702 closed CARBONDATA-4145. - Resolution: Duplicate > Query fails and the message "File does not exist: .carbondata" is > displayed > --- > > Key: CARBONDATA-4145 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4145 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.6.1, 2.0.0, 2.1.0 >Reporter: liuhe0702 >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > An exception occurs when the rebuild/refresh index command is executed. After > that, the query command fails to be executed, and the message "File does not > exist: > /user/hive/warehouse/carbon.store/sys/idx_tbl_data_event_carbon_user_num/Fact/Part0/Segment_27670/part-1-28_batchno0-0-x.carbondata" > is displayed and the idx_tbl_data_event_carbon_user_num table is secondary > index table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] liuhe0702 closed pull request #4104: [CARBONDATA-4146]Query fails and the error message "unable to get file status" is displayed. query is normal after the "drop metacache on tab
liuhe0702 closed pull request #4104: URL: https://github.com/apache/carbondata/pull/4104 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
CarbonDataQA2 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800567860 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3805/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
CarbonDataQA2 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-800566842 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5571/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595406480 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala ## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll { sql("drop table if exists maintable") } + test("reindex command with stale files") { +sql("drop table if exists maintable") +sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata") +sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)") +sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)") Review comment: 1. "we shouldn't allow delete segments on index table itself." Please refer to the second-to-last comment, right before your comment. If you had ever solved production issues, you could not say this. There have been thousands of query-failure issues caused simply by broken SI segments. We need to first delete the broken SI segment and then repair it again (over the last two to three years, countless issues were caused by SI segments being broken or out of sync with the main table). So please get to know the customer, and do not build software without knowing how customers use it. And during coding, please also stand on the maintainer's side and implement the feature with more maintainability. Thanks. 2. "And during repair index, if have segment with partial data, we should delete the segment completely(segment folder, segment file, probably tablestatus entry for the segment as well) before proceeding with segment repair." Your suggestion is of course right, but it is far more complicated than the existing implementation, so please first do the complete analysis and design, and then we can discuss and plan the next step. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595408209 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() { } if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) { for (String indexFile : entry.getValue().getFiles()) { -indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, -entry.getValue().mergeFileName); +indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-4151) When data sampling is done on large data set using Spark's df.sample function - the size of sampled table is not matching with record size of non sampled (Raw Table)
Amaranadh Vayyala created CARBONDATA-4151: - Summary: When data sampling is done on large data set using Spark's df.sample function - the size of sampled table is not matching with record size of non sampled (Raw Table) Key: CARBONDATA-4151 URL: https://issues.apache.org/jira/browse/CARBONDATA-4151 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 2.0.1 Environment: Apache carbondata 2.0.1, spark 2.4.5, hadoop 2.7.2 Reporter: Amaranadh Vayyala Fix For: 2.0.1, 2.1.0 Hi Team, When we perform 5% or 10% data sampling on a large dataset using Spark's df.sample, the size of the sampled table does not match the record size of the non-sampled (raw) table. Our raw table size is around 11 GB, so with 5% and 10% sampling the sampled table sizes should come out as 550 MB and 1.1 GB. However, in our case they come out as 1.5 GB and 3 GB, which is about 3 times higher than expected. Could you please check and help us understand where the issue is? -- This message was sent by Atlassian Jira (v8.3.4#803005)
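Two effects are worth separating when reading those numbers, sketched below in Spark's Java API since the report uses df.sample: Bernoulli sampling returns only an approximate fraction of rows, and on-disk size is not proportional to row count, because columnar encodings compress a dense full table better than a sparse sample. Comparing row counts rather than file sizes is the first check; the table name here is illustrative.
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SampleCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("sample-check").getOrCreate();
    Dataset<Row> raw = spark.table("raw_table");  // illustrative name
    // Bernoulli sample without replacement; the fraction is approximate
    // and the fixed seed makes the run reproducible.
    Dataset<Row> sampled = raw.sample(false, 0.05, 42L);
    // Compare row counts first; size ratios also depend on encoding.
    System.out.println("raw rows:     " + raw.count());
    System.out.println("sampled rows: " + sampled.count());
    spark.stop();
  }
}
{code}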
[jira] [Created] (CARBONDATA-4150) Information about indexed datamap
suyash yadav created CARBONDATA-4150: Summary: Information about indexed datamap Key: CARBONDATA-4150 URL: https://issues.apache.org/jira/browse/CARBONDATA-4150 Project: CarbonData Issue Type: Wish Components: core Affects Versions: 2.0.1 Environment: apache 2.0.1 spark 2.4.5 hadoop 2.7.2 Reporter: suyash yadav Fix For: 2.0.1 Hi Team, We would like to know detailed information about the indexed datamap and the possible use cases for this datamap. So please help us get answers to the queries below: 1) What is an indexed datamap, and what are its use cases? 2) How is it to be used? 3) Are there any reference documents? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
VenuReddy2103 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595256317 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala ## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll { sql("drop table if exists maintable") } + test("reindex command with stale files") { +sql("drop table if exists maintable") +sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata") +sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)") +sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)") Review comment: Agree with @Indhumathi27. IMHO, we shouldn't allow deleting segments on the index table itself. And during repair index, if a segment has partial data, we should delete the segment completely (segment folder, segment file, and probably the tablestatus entry for the segment as well) before proceeding with segment repair. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595205788 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() { * Gets all index files from this segment * @return */ - public Map getIndexOrMergeFiles() throws IOException { + public Map getIndexAndMergeFiles() throws IOException { Review comment: Current implementation: if (null != mergeFileName) { add merge file } if (null != index file && not empty) { add index file } This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
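Rendering that pseudocode as a compilable sketch makes the naming point concrete. This is Scala for illustration only; the real method is Java in SegmentFileStore, and the parameter names here are approximations from this thread, not the actual signature. Both branches can fire for the same segment, so the result can hold the merge file and the index files together, which is why "And" fits better than "Or".

// A hedged sketch of the pseudocode above; not the actual CarbonData code.
def indexAndMergeFiles(location: String,
    mergeFileName: Option[String],
    indexFileNames: Seq[String]): Seq[String] = {
  val files = scala.collection.mutable.ListBuffer[String]()
  // if (null != mergeFileName) { add merge file }
  mergeFileName.foreach(name => files += s"$location/$name")
  // if (null != index file && not empty) { add index file }
  files ++= indexFileNames.filter(f => f != null && f.nonEmpty)
    .map(f => s"$location/$f")
  files.toList
}

With both a merge file and index files present in a segment (the Carbon 1.3 layout mentioned below), the returned list holds both kinds at once.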
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595193924 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() { * Gets all index files from this segment * @return */ - public Map getIndexOrMergeFiles() throws IOException { + public Map getIndexAndMergeFiles() throws IOException { Review comment: "Or" means either, not both, but the implementation shows it can return both the index file and the merge file, so "Or" is not correct. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595181102 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() { * Gets all index files from this segment * @return */ - public Map getIndexOrMergeFiles() throws IOException { + public Map getIndexAndMergeFiles() throws IOException { Review comment: This is for backward compatibility: with stores written by Carbon 1.3, both the index file and the merge file exist in the segment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
Indhumathi27 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595180273 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala ## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll { sql("drop table if exists maintable") } + test("reindex command with stale files") { +sql("drop table if exists maintable") +sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata") +sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)") +sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)") Review comment: ok. keep current behavior This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
VenuReddy2103 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595178820 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -854,7 +853,7 @@ public SegmentFile getSegmentFile() { * Gets all index files from this segment * @return */ - public Map getIndexOrMergeFiles() throws IOException { + public Map getIndexAndMergeFiles() throws IOException { Review comment: I still think the name getIndexOrMergeFiles makes sense, since the returned map can have either a merge index or an index file. Curious to know when it would have both the merge index and the index in the map? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595175202 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() { } if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) { for (String indexFile : entry.getValue().getFiles()) { -indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, -entry.getValue().mergeFileName); +indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null); Review comment: This makes sense, will do it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595167863 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala ## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll { sql("drop table if exists maintable") } + test("reindex command with stale files") { +sql("drop table if exists maintable") +sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata") +sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("INSERT INTO maintable SELECT 1,'string1', 'string2'") +sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)") +sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)") Review comment: 1. For example, if both the main table and the SI table segment statuses are success, but one of the carbondata or index files in the SI segment is missing or broken, how would you fix this if you cannot manually delete that SI segment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
VenuReddy2103 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r595165241 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -841,8 +841,7 @@ public SegmentFile getSegmentFile() { } if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) { for (String indexFile : entry.getValue().getFiles()) { -indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, -entry.getValue().mergeFileName); +indexFiles.put(location + CarbonCommonConstants.FILE_SEPARATOR + indexFile, null); Review comment: Its caller, `SegmentFileStore.getIndexCarbonFiles()`, has code that handles a non-null value by adding the merge index file to the index file list as well (at line 906). That becomes dead code now; you would want to remove it too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
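To make the dead-code observation concrete, a hedged Scala sketch of the caller pattern being described; the actual code is Java inside SegmentFileStore.getIndexCarbonFiles(), and the names here are approximations rather than the real identifiers. Once every value in the map is null, the merge-file branch can never execute and can be deleted along with the change above.

import scala.collection.mutable.ListBuffer

// Sketch only; mirrors the described caller, not the real implementation.
def collectIndexFiles(indexFileToMergeFile: Map[String, String]): List[String] = {
  val files = ListBuffer[String]()
  indexFileToMergeFile.foreach { case (indexFile, mergeFileName) =>
    files += indexFile
    if (mergeFileName != null) { // unreachable once map values are always null
      files += mergeFileName
    }
  }
  files.toList
}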