[jira] [Updated] (CARBONDATA-4133) Concurrent Insert Overwrite with static partition on Index server fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-4133:
Fix Version/s: (was: 2.2.0) 2.1.1

> Concurrent Insert Overwrite with static partition on Index server fails
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-4133
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4133
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> [Steps]: with the Index Server running, execute the concurrent insert overwrite with a static partition.
>
> Set 0:
> CREATE TABLE if not exists uniqdata_string(CUST_ID int, CUST_NAME String, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) PARTITIONED BY (ACTIVE_EMUI_VERSION string) STORED AS carbondata TBLPROPERTIES ('TABLE_BLOCKSIZE'='256 MB');
>
> Set 1:
> LOAD DATA INPATH 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table uniqdata_string partition(active_emui_version='abc') OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> LOAD DATA INPATH 'hdfs://hacluster/datasets/2000_UniqData.csv' into table uniqdata_string partition(active_emui_version='abc') OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
>
> Set 2:
> CREATE TABLE if not exists uniqdata_hive (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> load data local inpath "/opt/csv/2000_UniqData.csv" into table uniqdata_hive;
>
> Set 3 (concurrent):
> insert overwrite table uniqdata_string partition(active_emui_version='abc') select CUST_ID, CUST_NAME, DOB, doj, bigint_column1, bigint_column2, decimal_column1, decimal_column2, double_column1, double_column2, integer_column1 from uniqdata_hive limit 10;
> insert overwrite table uniqdata_string partition(active_emui_version='abc') select CUST_ID, CUST_NAME, DOB, doj, bigint_column1, bigint_column2, decimal_column1, decimal_column2, double_column1, double_column2, integer_column1 from uniqdata_hive limit 10;
>
> [Expected Result]: the insert should succeed for timestamp data in the Hive-to-Carbon partition table.
>
> [Actual Issue]: concurrent Insert Overwrite with static partition on the Index Server fails.
> [!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/1/17/c71035/a40a6d6be1434b1db8e8c1c6f5a2e97b/image.png!|https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/1/17/c71035/a40a6d6be1434b1db8e8c1c6f5a2e97b/image.png]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
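Set 3 above fires two identical overwrites against the same static partition at the same time. The shape of that race can be sketched with Python threads and a placeholder `run_sql` function — a hypothetical stand-in for submitting the statement from two concurrent spark-sql sessions, not a real SparkSession API:

```python
import threading

# Collected statements; a stand-in for "two sessions each submitted the overwrite".
results = []
results_lock = threading.Lock()

def run_sql(statement):
    # Placeholder: record the statement instead of executing it on a cluster.
    with results_lock:
        results.append(statement)

overwrite = (
    "insert overwrite table uniqdata_string partition(active_emui_version='abc') "
    "select CUST_ID, CUST_NAME, DOB, doj, bigint_column1, bigint_column2, "
    "decimal_column1, decimal_column2, double_column1, double_column2, "
    "integer_column1 from uniqdata_hive limit 10"
)

# Fire the two identical overwrites concurrently, as in Set 3 of the repro.
threads = [threading.Thread(target=run_sql, args=(overwrite,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # prints 2: both overwrites target the same partition
```

The point of the repro is exactly this overlap: both statements rewrite the same `active_emui_version='abc'` partition, so the Index Server sees two competing overwrites of one partition.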
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-799665489 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3803/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-799663865 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5569/
[GitHub] [carbondata] jack86596 edited a comment on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 edited a comment on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799346641

> > > @jack86596 , won't running clean files before doing the reindex operation solve this issue?
> >
> > Why, before the reindex operation, do we need to run clean files? Is there any dependency relation between reindex and clean files?
>
> No, there is no need to do clean files before reindex, there is no dependency, but we only do reindex when the segments are missing, which is not the case here. What I meant was that running clean files on the SI table would solve your problem.

Why do I need to run one extra command to solve the problem? Why not just solve the problem directly, so that no extra command is needed? If, in order to solve the problem, you need to run clean files, then that is a dependency (the success of the reindex command depends on the clean files command).
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results
CarbonDataQA2 commented on pull request #4107: URL: https://github.com/apache/carbondata/pull/4107#issuecomment-799453029 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3802/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results
CarbonDataQA2 commented on pull request #4107: URL: https://github.com/apache/carbondata/pull/4107#issuecomment-799446392 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5568/
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #4107: [CARBONDATA-4149] Query with SI after add partition based on location on partition table gives incorrect results
ShreelekhyaG opened a new pull request #4107: URL: https://github.com/apache/carbondata/pull/4107

### Why is this PR needed?
Query with SI after add partition based on location on a partition table gives incorrect results. While pruning, if it's an external segment, it should use `ExternalSegmentResolver`, and there is no need to use `ImplicitIncludeFilterExecutor`, as an external segment is not added to the SI table.

### What changes were proposed in this PR?
Based on the `isRelative` path, set the `isExternalSegment` value for the partition segment. In the `getBlockId` method, when the segment id is not present, set it from the block name.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
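The `getBlockId` fallback described above — deriving the segment id from the block name when it is not present — can be sketched as follows. This is an illustrative Python sketch, not CarbonData's Java implementation, and the block-name layout assumed here (`part-<task>-<segmentId>_batchno...`) is a simplification for illustration only:

```python
def get_block_id(block_name, segment_id=None):
    """Return "<segmentId>/<blockName>"; if segment_id is missing, parse it
    out of the block (data file) name instead.

    Assumed (hypothetical) layout: "part-<task>-<segmentId>_batchno..."
    """
    if segment_id is None:
        # Fall back to the block name: take the part before the first "_"
        # and read the trailing "-"-separated token as the segment id.
        prefix = block_name.split("_", 1)[0]    # e.g. "part-0-2"
        segment_id = prefix.rsplit("-", 1)[-1]  # e.g. "2"
    return f"{segment_id}/{block_name}"

# A block written into an externally-located partition carries no segment id,
# so it is recovered from the file name.
print(get_block_id("part-0-2_batchno0-0-2-1615449.carbondata"))
```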
[jira] [Created] (CARBONDATA-4149) Query with SI after add partition based on location on partition table gives incorrect results
SHREELEKHYA GAMPA created CARBONDATA-4149:
Summary: Query with SI after add partition based on location on partition table gives incorrect results
Key: CARBONDATA-4149
URL: https://issues.apache.org/jira/browse/CARBONDATA-4149
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Queries to execute:
* drop table if exists partitionTable;
* create table partitionTable (id int,name String) partitioned by(email string) stored as carbondata;
* insert into partitionTable select 1,'blue','abc';
* CREATE INDEX maintable_si112 on table partitionTable (name) as 'carbondata';
* alter table partitionTable add partition (email='def') location '$sdkWritePath';
* select *from partitionTable where name = 'red'; ---> returns empty result
* select *from partitionTable where ni(name = 'red');
* alter table partitionTable compact 'major';
* select *from partitionTable where name = 'red';

spark-sql> create table partitionTable (id int,name String) partitioned by(email string) STORED AS carbondata;
Time taken: 1.962 seconds
spark-sql> CREATE INDEX maintable_si112 on table partitionTable (name) as 'carbondata';
Time taken: 2.759 seconds
spark-sql> insert into partitionTable select 1,'huawei','abc';
0
Time taken: 5.808 seconds, Fetched 1 row(s)
spark-sql> alter table partitionTable add partition (email='def') location 'hdfs://hacluster/datastore';
Time taken: 1.108 seconds
spark-sql> insert into partitionTable select 1,'huawei','def';
1
Time taken: 2.707 seconds, Fetched 1 row(s)
spark-sql> select *from partitionTable where name='huawei';
1	huawei	abc
Time taken: 0.75 seconds, Fetched 1 row(s)
spark-sql> select *from partitionTable where ni(name='huawei');
1	huawei	def
1	huawei	abc
Time taken: 0.507 seconds, Fetched 2 row(s)
[GitHub] [carbondata] jack86596 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799346641

> > > @jack86596 , won't running clean files before doing the reindex operation solve this issue?
> >
> > Why, before the reindex operation, do we need to run clean files? Is there any dependency relation between reindex and clean files?
>
> No, there is no need to do clean files before reindex, there is no dependency, but we only do reindex when the segments are missing, which is not the case here. What I meant was that running clean files on the SI table would solve your problem.

Why do I need to run one extra command to solve the problem? Why not just solve the problem directly, so that no extra command is needed? If, in order to solve the problem, you need to run clean files, then that is a dependency.
[jira] [Commented] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301587#comment-17301587 ] Akash R Nilugal commented on CARBONDATA-4110:
https://github.com/apache/carbondata/pull/4072

> Support clean files dry run and show statistics after clean files operation
> ---------------------------------------------------------------------------
>
> Key: CARBONDATA-4110
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4110
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Vikram Ahuja
> Priority: Minor
> Time Spent: 26h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved CARBONDATA-4110.
Fix Version/s: 2.2.0
Resolution: Fixed

> Support clean files dry run and show statistics after clean files operation
> ---------------------------------------------------------------------------
>
> Key: CARBONDATA-4110
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4110
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Vikram Ahuja
> Priority: Minor
> Fix For: 2.2.0
>
> Time Spent: 26h 20m
> Remaining Estimate: 0h
[jira] [Commented] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation
[ https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301586#comment-17301586 ] Akash R Nilugal commented on CARBONDATA-4110:

Why is this PR needed?
Currently, in the clean files operation the user does not know how much space will be freed. The idea is to add support for a dry run in clean files, which can tell the user how much space will be freed by the clean files operation without cleaning the actual data.

What changes were proposed in this PR?
This PR has the following changes:
* Support dry run in clean files: it will show the user how much space will be freed by the clean files operation and how much space is left (which can be released after the expiration time) after the clean files operation.
* Clean files output: total size released during the clean files operation.
* Disable clean files statistics: an option in case the user does not want clean files statistics.
* Clean files log: enhance the clean files log to print the name of every file that is being deleted in the info log.
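The dry-run idea described above — report what a clean operation would free without deleting anything — can be sketched as a directory walk that sums candidate file sizes. The selection rule below (files ending in `.stale`) is a placeholder for illustration; the real feature decides from segment status in the table status file, not from a file extension:

```python
import os
import tempfile

def clean_files_dry_run(path):
    """Return the bytes a clean operation WOULD free, deleting nothing."""
    to_free = 0
    for root, _, files in os.walk(path):
        for name in files:
            if name.endswith(".stale"):  # placeholder selection rule
                to_free += os.path.getsize(os.path.join(root, name))
    return to_free

# Usage: a throwaway directory with one stale file and one live file.
d = tempfile.mkdtemp()
with open(os.path.join(d, "seg0.stale"), "wb") as f:
    f.write(b"x" * 1024)
with open(os.path.join(d, "seg1.carbondata"), "wb") as f:
    f.write(b"x" * 2048)
print(clean_files_dry_run(d))  # 1024; both files are still on disk
```

The same traversal, with the delete actually performed and each deleted name written to the info log, gives the non-dry-run statistics described in the comment.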
[GitHub] [carbondata] asfgit closed pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
asfgit closed pull request #4072: URL: https://github.com/apache/carbondata/pull/4072
[GitHub] [carbondata] vikramahuja1001 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
vikramahuja1001 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799320505

> > @jack86596 , won't running clean files before doing the reindex operation solve this issue?
>
> Why, before the reindex operation, do we need to run clean files? Is there any dependency relation between reindex and clean files?

No, there is no need to do clean files before reindex, there is no dependency, but we only do reindex when the segments are missing, which is not the case here. What I meant was that running clean files on the SI table would solve your problem.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-799257753 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5567/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
CarbonDataQA2 commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-799257028 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3801/
[GitHub] [carbondata] jack86596 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r594143885

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists maintable")
   }

+  test("reindex command with stale files") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment: So please provide a better solution for this issue: main table segment (success), SI table segment (marked for delete). The solution should not be "run clean files for the SI table first", because that is no better than the current one.
[GitHub] [carbondata] jack86596 edited a comment on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 edited a comment on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799227509

> @jack86596 , won't running clean files before doing the reindex operation solve this issue?

Why, before the reindex operation, do we need to run clean files? Is there any dependency relation between reindex and clean files?
[GitHub] [carbondata] jack86596 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
jack86596 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799227509

> @jack86596 , won't running clean files before doing the reindex operation solve this issue?

Why, before the reindex operation, do we need to run clean files? Is there any dependency relation between reindex and clean files?
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
vikramahuja1001 commented on a change in pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#discussion_r594123727

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
## @@ -119,6 +119,19 @@ class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists maintable")
   }

+  test("reindex command with stale files") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1,2)")
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE WHERE SEGMENT.ID IN (0,1)")

Review comment: Technically speaking, reindex should not be allowed at this point, because the segment still exists (even if it is MFD); the purpose of the reindex command is to reload missing segments, which is not the case here. Also, will this same behaviour happen for other segment statuses, say success or partially success? @akashrn5 can give some input maybe.
[GitHub] [carbondata] vikramahuja1001 commented on pull request #4105: [CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
vikramahuja1001 commented on pull request #4105: URL: https://github.com/apache/carbondata/pull/4105#issuecomment-799208880 @jack86596 , won't running clean files before doing the reindex operation solve this issue?
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-799185057 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3800/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-799180625 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5566/
[GitHub] [carbondata] ShreelekhyaG commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing
ShreelekhyaG commented on pull request #3988: URL: https://github.com/apache/carbondata/pull/3988#issuecomment-799169566 retest this please