[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver
ShreelekhyaG commented on a change in pull request #4080: URL: https://github.com/apache/carbondata/pull/4080#discussion_r569160839

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithAddSegment.scala

@@ -86,8 +86,8 @@ class TestSIWithAddSegment extends QueryTest with BeforeAndAfterAll {
     sql(s"alter table maintable1 add segment options('path'='${ newSegmentPath }', " +
         s"'format'='carbon')")
     sql("CREATE INDEX maintable1_si on table maintable1 (c) as 'carbondata'")
-    assert(sql("show segments for table maintable1_si").collect().length ==
-      sql("show segments for table maintable1").collect().length)
+    assert(sql("show segments for table maintable1_si").collect().length == 2)
+    assert(sql("show segments for table maintable1").collect().length == 3)

Review comment: Disabled the SI table after alter add load, and added a check in the test cases to verify it.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver
ShreelekhyaG commented on a change in pull request #4080: URL: https://github.com/apache/carbondata/pull/4080#discussion_r569160528

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java

@@ -221,7 +223,13 @@ public void deserializeFields(DataInput in, String[] locations, String tablePath
       indexUniqueId = in.readUTF();
     }
     String filePath = getPath();
-    if (filePath.startsWith(File.separator)) {
+    boolean isLocalFile = FileFactory.getCarbonFile(filePath) instanceof LocalCarbonFile;
+    // If it is external segment path, table path need not be appended to filePath
+    // Example filepath: hdfs://hacluster/opt/newsegmentpath/
+    // filePath value would start with hdfs:// or s3:// . If it is local
+    // ubuntu storage, it starts with File separator, so check if given path exists or not.
+    if ((!isLocalFile && filePath.startsWith(File.separator)) || (isLocalFile && !FileFactory

Review comment: ok done
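For context, the check under discussion can be sketched in isolation. This is a minimal Scala sketch, not the merged Java code: the method name resolveBlockletPath is invented for illustration, while FileFactory.getCarbonFile, FileFactory.isFileExist and LocalCarbonFile are the CarbonData APIs visible in the diff.

    import java.io.File

    import org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile
    import org.apache.carbondata.core.datastore.impl.FileFactory

    def resolveBlockletPath(filePath: String, tablePath: String): String = {
      val isLocalFile = FileFactory.getCarbonFile(filePath).isInstanceOf[LocalCarbonFile]
      // External segment URIs (hdfs://..., s3://...) are already complete, so the
      // table path is only prepended when the path is relative to the table: either
      // a non-local path starting with the separator, or a local path that does not
      // exist as given.
      if ((!isLocalFile && filePath.startsWith(File.separator)) ||
          (isLocalFile && !FileFactory.isFileExist(filePath))) {
        tablePath + filePath
      } else {
        filePath
      }
    }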
[GitHub] [carbondata] vikramahuja1001 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
vikramahuja1001 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-772241927

@QiangCai @ajantha-bhat @akashrn5 please review
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #4083: [CARBONDATA-4112] Data mismatch issue in SI.
ajantha-bhat edited a comment on pull request #4083: URL: https://github.com/apache/carbondata/pull/4083#issuecomment-771551095

The title of the PR can be more specific, like "**Data mismatch issue in SI global sort merge scenario**". @Karan980: please consider this point next time. I have changed it while merging now.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4076: [CARBONDATA-4107] Added related MV tables Map to fact table and added lock while touchMDTFile
CarbonDataQA2 commented on pull request #4076: URL: https://github.com/apache/carbondata/pull/4076#issuecomment-771646152

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5412/
[GitHub] [carbondata] asfgit closed pull request #4083: [CARBONDATA-4112] Data mismatch issue in SI global sort merge scenario.
asfgit closed pull request #4083: URL: https://github.com/apache/carbondata/pull/4083
[GitHub] [carbondata] nihal0107 edited a comment on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
nihal0107 edited a comment on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771464559

> If we enable the property `ENABLE_AUTO_LOAD_MERGE` then which segment id are we planning to show, the segment generated after compaction or before compaction? Better to add a test case for that scenario also.

> If we enable 'AUTO_LOAD_MERGE', then we return and show the segment id before compaction, since the user would focus on his load operation. A test case has been added. Please review.

We are showing the segment id because it will be helpful if we need to query a specific segment. But if we show the segment id before compaction and then query that specific segment, the operation will fail. Better to take the community's opinion on this.
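The "query a specific segment" use case mentioned above refers to CarbonData's carbon.input.segments session property. A hedged sketch of how a returned segment id could feed into it, where sql is the Spark test-suite helper and the table and database names are placeholders:

    val segmentId = sql("INSERT INTO test_table SELECT * FROM csv_table")
      .collect().head.getString(0) // e.g. "0", per the behaviour proposed in this PR
    // Restrict subsequent reads to that single segment:
    sql(s"SET carbon.input.segments.default.test_table = $segmentId")
    sql("SELECT count(*) FROM test_table").show()
    // Reset to read all segments again:
    sql("SET carbon.input.segments.default.test_table = *")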
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
ajantha-bhat commented on a change in pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#discussion_r568528970

## File path: integration/spark/src/test/scala/org/apache/spark/util/CarbonCommandSuite.scala

@@ -82,6 +83,43 @@ class CarbonCommandSuite extends QueryTest with BeforeAndAfterAll {
     """.stripMargin)
   }

+  protected def createTestTable(tableName: String): Unit = {
+    sql(
+      s"""

Review comment: Instead of adding a new testcase, just add a validation for the segment id in one of the existing testcases for **loading, insert into, partition table loading, partition table insert into**.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala

@@ -276,7 +280,15 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
       }
       throw ex
     }
-    Seq.empty
+    if(loadResultForReturn!=null && loadResultForReturn.getLoadName!=null) {
+      Seq(Row(loadResultForReturn.getLoadName))
+    } else {
+      rowsForReturn

Review comment: Why are you returning the number of rows instead of the segment id here? When will the code enter this branch if the load is a success? Can you add some comments?

Review comment (same snippet): I think the code is not formatted; we follow a space after `if` and around `!=`.

## File path: integration/spark/src/test/scala/org/apache/spark/util/CarbonCommandSuite.scala

@@ -100,6 +138,56 @@ class CarbonCommandSuite extends QueryTest with BeforeAndAfterAll {

   private lazy val location = CarbonProperties.getStorePath()

+  test("Return segment ID after load and insert") {
+    val tableName = "test_table"
+    val inputTableName = "csv_table"
+    val inputPath = s"$resourcesPath/data_alltypes.csv"
+    dropTable(tableName)
+    dropTable(inputTableName)
+    createAndLoadInputTable(inputTableName, inputPath)
+    createTestTable(tableName)
+    checkAnswer(sql(
+      s"""
+         | INSERT INTO TABLE $tableName
+         | SELECT shortField, intField, bigintField, doubleField, stringField,
+         | from_unixtime(unix_timestamp(timestampField,'/M/dd')) timestampField, decimalField,
+         | cast(to_date(from_unixtime(unix_timestamp(dateField,'/M/dd'))) as date), charField
+         | FROM $inputTableName
+      """.stripMargin), Seq(Row("0")))
+    checkAnswer(sql(
+      s"LOAD DATA LOCAL INPATH '$inputPath'" +
+      s" INTO TABLE $tableName" +
+      " OPTIONS('FILEHEADER'=" +
+      "'shortField,intField,bigintField,doubleField,stringField," +
+      "timestampField,decimalField,dateField,charField')"), Seq(Row("1")))

Review comment: Possible to return text like "Successfully loaded to segment id : 1" instead of returning just "1"?

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala

@@ -191,7 +196,15 @@ case class CarbonLoadDataCommand(databaseNameOp: Option[String],
       }
       throw ex
     }
-    Seq.empty
+    if(loadResultForReturn!=null && loadResultForReturn.getLoadName!=null) {

Review comment: @QiangCai, @ydvpankaj99: why is our checkstyle not catching these format issues?
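To make the return-value change under review concrete, here is a minimal hedged sketch. It is not the PR's code: buildReturnRows and its parameters are invented stand-ins for the diff's loadResultForReturn handling.

    import org.apache.spark.sql.Row

    // loadName models LoadMetadataDetails.getLoadName from the diff; fallbackRows
    // models the rowsForReturn value the reviewer is asking about.
    def buildReturnRows(loadName: Option[String], fallbackRows: Seq[Row]): Seq[Row] =
      loadName match {
        case Some(name) => Seq(Row(name)) // e.g. Row("1") when segment 1 was loaded
        case None       => fallbackRows   // flows that finish without a load name
      }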
[GitHub] [carbondata] asfgit closed pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
asfgit closed pull request #4084: URL: https://github.com/apache/carbondata/pull/4084
[GitHub] [carbondata] Karan980 commented on pull request #4083: [CARBONDATA-4112] Data mismatch issue in SI global sort merge scenario.
Karan980 commented on pull request #4083: URL: https://github.com/apache/carbondata/pull/4083#issuecomment-771551639

> The title of the PR can be more specific like "**Data mismatch issue in SI global sort merge scenario**"

Done
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4076: [CARBONDATA-4107] Added related MV tables Map to fact table and added lock while touchMDTFile
akashrn5 commented on a change in pull request #4076: URL: https://github.com/apache/carbondata/pull/4076#discussion_r568501151

## File path: docs/mv-guide.md

@@ -241,6 +242,10 @@ The current information includes:
 | Refresh Mode | FULL / INCREMENTAL refresh to MV |
 | Refresh Trigger Mode | ON_COMMIT / ON_MANUAL refresh to MV provided by user |
 | Properties | Table properties of the materialized view |
+
+**NOTE**: For materialized views created
+before [CARBONDATA-4107](https://issues.apache.org/jira/browse/CARBONDATA-4107) issue fix, run
+refresh mv command to add mv name to fact table property and to enable it.

Review comment: Also, please add what happens if the user doesn't run refresh.
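For readers of that doc note, a small example of the refresh step it refers to, using CarbonData's documented MV syntax; mv_sales and sales are placeholder names and sql is the Spark helper:

    sql("REFRESH MATERIALIZED VIEW mv_sales")
    // Verify that the MV is now registered on the fact table and enabled:
    sql("SHOW MATERIALIZED VIEWS ON TABLE sales").show()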
[GitHub] [carbondata] ShreelekhyaG commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
ShreelekhyaG commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771481950

retest this please
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver
akashrn5 commented on a change in pull request #4080: URL: https://github.com/apache/carbondata/pull/4080#discussion_r568509426

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithAddSegment.scala

@@ -86,8 +86,8 @@ class TestSIWithAddSegment extends QueryTest with BeforeAndAfterAll {
     sql(s"alter table maintable1 add segment options('path'='${ newSegmentPath }', " +
         s"'format'='carbon')")
     sql("CREATE INDEX maintable1_si on table maintable1 (c) as 'carbondata'")
-    assert(sql("show segments for table maintable1_si").collect().length ==
-      sql("show segments for table maintable1").collect().length)
+    assert(sql("show segments for table maintable1_si").collect().length == 2)
+    assert(sql("show segments for table maintable1").collect().length == 3)

Review comment: Also add an assert checking that the SI table is disabled and that the query doesn't hit the SI.

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java

@@ -221,7 +223,13 @@ public void deserializeFields(DataInput in, String[] locations, String tablePath
       indexUniqueId = in.readUTF();
     }
     String filePath = getPath();
-    if (filePath.startsWith(File.separator)) {
+    boolean isLocalFile = FileFactory.getCarbonFile(filePath) instanceof LocalCarbonFile;
+    // If it is external segment path, table path need not be appended to filePath
+    // Example filepath: hdfs://hacluster/opt/newsegmentpath/
+    // filePath value would start with hdfs:// or s3:// . If it is local
+    // ubuntu storage, it starts with File separator, so check if given path exists or not.
+    if ((!isLocalFile && filePath.startsWith(File.separator)) || (isLocalFile && !FileFactory

Review comment: The comment is not clear; please rewrite it with a better example and scenarios.
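A hedged sketch of the extra asserts being requested. The SHOW INDEXES column positions used here (0 = index name, 4 = status) are assumptions for illustration, not verified against the actual output schema:

    val indexRows = sql("SHOW INDEXES ON TABLE maintable1").collect()
    // The SI should be in disabled state after the external segment was added:
    assert(indexRows.exists(row =>
      row.getString(0).equalsIgnoreCase("maintable1_si") &&
        row.getString(4).equalsIgnoreCase("disabled")))
    // And the filter query plan should not reference the SI table:
    val plan = sql("EXPLAIN SELECT * FROM maintable1 WHERE c = 'x'")
      .collect().map(_.getString(0)).mkString
    assert(!plan.contains("maintable1_si"))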
[GitHub] [carbondata] areyouokfreejoe closed pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
areyouokfreejoe closed pull request #4086: URL: https://github.com/apache/carbondata/pull/4086
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
CarbonDataQA2 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771459205
[GitHub] [carbondata] kunal642 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
kunal642 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771412881
[GitHub] [carbondata] areyouokfreejoe commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
areyouokfreejoe commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771456273
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
CarbonDataQA2 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771318129
[GitHub] [carbondata] ydvpankaj99 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
ydvpankaj99 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771448246

retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
CarbonDataQA2 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771387700
[GitHub] [carbondata] nihal0107 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
nihal0107 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771385539
[GitHub] [carbondata] nihal0107 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
nihal0107 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771354445
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver
CarbonDataQA2 commented on pull request #4080: URL: https://github.com/apache/carbondata/pull/4080#issuecomment-771881699

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3652/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4080: [CARBONDATA-4111] Filter query having invalid results after add segment to table having SI with Indexserver
CarbonDataQA2 commented on pull request #4080: URL: https://github.com/apache/carbondata/pull/4080#issuecomment-771878716

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5413/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4076: [CARBONDATA-4107] Added related MV tables Map to fact table and added lock while touchMDTFile
CarbonDataQA2 commented on pull request #4076: URL: https://github.com/apache/carbondata/pull/4076#issuecomment-771647871

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3651/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
CarbonDataQA2 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771619559

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3650/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
CarbonDataQA2 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771616575

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5409/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
CarbonDataQA2 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771610225

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5410/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
CarbonDataQA2 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771609869

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3649/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
ajantha-bhat commented on a change in pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#discussion_r568524277

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala

@@ -191,7 +196,15 @@ case class CarbonLoadDataCommand(databaseNameOp: Option[String],
       }
       throw ex
     }
-    Seq.empty
+    if(loadResultForReturn!=null && loadResultForReturn.getLoadName!=null) {

Review comment: @QiangCai, @ydvpankaj99: why is our checkstyle not catching these format issues?
[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.
kunal642 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r568534198

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala

@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment: Better to use CarbonFileFilter to list only the delete delta files.
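A sketch of that suggestion, pushing the extension filter into listFiles via CarbonData's CarbonFileFilter callback so only delete delta files are listed in the first place; treat it as illustrative rather than the eventual PR code:

    import org.apache.carbondata.core.constants.CarbonCommonConstants
    import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
    import org.apache.carbondata.core.datastore.impl.FileFactory

    def listDeleteDeltaFiles(segmentPath: String): Array[CarbonFile] =
      FileFactory.getCarbonFile(segmentPath).listFiles(new CarbonFileFilter {
        override def accept(file: CarbonFile): Boolean =
          file.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)
      })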
[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.
kunal642 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r568531096

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala

@@ -369,5 +426,64 @@ case class CarbonAddLoadCommand(
+  /**
+   * If there is more than one deleteDelta file present for a block, this method
+   * will pick the deltaFile with the highest timestamp, because the default threshold
+   * for horizontal compaction is 1. It is assumed that the threshold for horizontal
+   * compaction is not changed from the default value, so there will always be only one
+   * valid delete delta file present for a block. It also sets the number of deleted
+   * rows for a segment.
+   */
+  def setValidDeltaFileAndDeletedRowCount(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+  ) : Unit = {
+    var maxDeltaStamp : Long = -1
+    var deletedRowsCount : Long = 0
+    var validDeltaFile : CarbonFile = null
+    deleteDeltaFiles.foreach { deltaFile =>
+      val currentFileTimestamp = CarbonTablePath.DataFileUtil
+        .getTimeStampFromDeleteDeltaFile(deltaFile._2)
+      if (currentFileTimestamp.toLong > maxDeltaStamp) {
+        maxDeltaStamp = currentFileTimestamp.toLong
+        validDeltaFile = deltaFile._1
+      }
+    }
+    val blockDetails =
+      new CarbonDeleteDeltaFileReaderImpl(validDeltaFile.getAbsolutePath).readJson()
+    blockDetails.getBlockletDetails.asScala.foreach { blocklet =>
+      deletedRowsCount = deletedRowsCount + blocklet.getDeletedRows.size()
+    }
+    segmentUpdateDetails.setDeleteDeltaStartTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeleteDeltaEndTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeletedRowsInBlock(deletedRowsCount.toString)
+  }
+
+  /**
+   * As horizontal compaction is not supported for SDK segments, all delta files are valid.
+   */
+  def readAllDeltaFiles(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+  ) : Unit = {

Review comment: Please fix this formatting (move the closing parenthesis to the line above). Check the rest of the code for the same as well.
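The "pick the delta file with the highest timestamp" rule in that docstring can be expressed compactly; a hedged equivalent of the var-based loop in the diff, with the same inputs and result:

    import scala.collection.mutable.ListBuffer

    import org.apache.carbondata.core.datastore.filesystem.CarbonFile
    import org.apache.carbondata.core.util.path.CarbonTablePath

    // deltaFiles holds (file, fileName) pairs as built by the add load command.
    def latestDeltaFile(deltaFiles: ListBuffer[(CarbonFile, String)]): CarbonFile =
      deltaFiles.maxBy { case (_, name) =>
        CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(name).toLong
      }._1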
[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.
kunal642 commented on a change in pull request #4070: URL: https://github.com/apache/carbondata/pull/4070#discussion_r568530696

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala

@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }

+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment: Can we pass a flag like forceWrite into updateSegmentStatus to avoid validating the segment against the tablestatus file? This flag would be true in the add load command when a delete delta is present. This way you can avoid writing twice.
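To illustrate what such a forceWrite flag would change, a self-contained toy sketch; this is not CarbonUpdateUtil's actual signature, it only models the skip-validation behaviour being proposed:

    final case class SegmentUpdateDetail(segmentName: String)

    def updateSegmentStatus(details: Seq[SegmentUpdateDetail],
        segmentsInTableStatus: Set[String],
        forceWrite: Boolean): Seq[SegmentUpdateDetail] = {
      // When the caller (add load) is writing the segment entry in the same
      // operation, validating against the tablestatus file would reject it,
      // so the check is skipped.
      if (forceWrite) details
      else details.filter(d => segmentsInTableStatus.contains(d.segmentName))
    }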
[GitHub] [carbondata] ajantha-bhat commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
ajantha-bhat commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771560807

retest this please
[GitHub] [carbondata] ajantha-bhat commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
ajantha-bhat commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771560715

@nihal0107: As per me, showing the original segment id from before compaction is ok, because this is the load command and not the compaction command. So when the load command finishes we can return the segment id that was loaded, and the user can run show segments before querying that segment to see whether it has undergone compaction or not.
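A hedged walk-through of that behaviour, where t and csvPath are placeholders, sql is the Spark helper, and carbon.enable.auto.load.merge is the property behind ENABLE_AUTO_LOAD_MERGE:

    val csvPath = "/tmp/data.csv" // placeholder input file
    sql("SET carbon.enable.auto.load.merge = true")
    val loaded = sql(s"LOAD DATA LOCAL INPATH '$csvPath' INTO TABLE t")
      .collect().head.getString(0) // id of the segment as loaded, e.g. "4"
    // SHOW SEGMENTS reveals whether that segment has since been compacted:
    // its status becomes Compacted and a merged segment such as 4.1 appears.
    sql("SHOW SEGMENTS FOR TABLE t").show()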
[jira] [Resolved] (CARBONDATA-4113) Partition query results invalid when carbon.read.partition.hive.direct is disabled
[ https://issues.apache.org/jira/browse/CARBONDATA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor resolved CARBONDATA-4113.
    Fix Version/s: 2.1.1
    Resolution: Fixed

> Partition query results invalid when carbon.read.partition.hive.direct is disabled
>
> Key: CARBONDATA-4113
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4113
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Set 'carbon.read.partition.hive.direct' to false.
> Queries to execute:
> create table partition_cache(a string) partitioned by(b int) stored as carbondata
> insert into partition_cache select 'k',1;
> insert into partition_cache select 'k',1;
> insert into partition_cache select 'k',2;
> insert into partition_cache select 'k',2;
> alter table partition_cache compact 'minor';
> select * from partition_cache; => no results
[GitHub] [carbondata] kunal642 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
kunal642 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771559408

LGTM
[jira] [Resolved] (CARBONDATA-4112) Data mismatch issue in SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajantha Bhat resolved CARBONDATA-4112.
    Fix Version/s: (was: 2.1.0) 2.2.0
    Resolution: Fixed

> Data mismatch issue in SI
>
> Key: CARBONDATA-4112
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4112
> Project: CarbonData
> Issue Type: Bug
> Components: core
> Affects Versions: 2.1.0
> Reporter: Karan
> Priority: Major
> Fix For: 2.2.0
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> When the data files of an SI segment are merged, it gives more rows in the SI table than in the main table.
[GitHub] [carbondata] ajantha-bhat commented on pull request #4083: [CARBONDATA-4112] Data mismatch issue in SI.
ajantha-bhat commented on pull request #4083: URL: https://github.com/apache/carbondata/pull/4083#issuecomment-771551095

The title of the PR can be more specific like "**Data mismatch issue in SI global sort merge scenario**"
[GitHub] [carbondata] ajantha-bhat commented on pull request #4083: [CARBONDATA-4112] Data mismatch issue in SI.
ajantha-bhat commented on pull request #4083: URL: https://github.com/apache/carbondata/pull/4083#issuecomment-771548605

LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
CarbonDataQA2 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771547238

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3647/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
CarbonDataQA2 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771547087

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5407/
[jira] [Created] (CARBONDATA-4117) Test cg index query with Index server fails with NPE
SHREELEKHYA GAMPA created CARBONDATA-4117:

Summary: Test cg index query with Index server fails with NPE
Key: CARBONDATA-4117
URL: https://issues.apache.org/jira/browse/CARBONDATA-4117
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Test queries to execute:

spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT');
spark-sql> create index cgindex on table index_test_cg (name) as 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory';
LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg OPTIONS('header'='false')
spark-sql> select * from index_test_cg where name='n502670';

2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting splits using index server. Initiating Fallback to embedded mode | org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454)
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy69.getSplits(Unknown Source)
    at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85)
    at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59)
    at org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769)
    at org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58)
    at org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555)
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159)
    at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:384)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:988)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
    at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:383)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:277)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDrive
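For reference, the NPE above is only hit when blocklet pruning is delegated to the Index Server; the log shows the automatic fallback to embedded mode once that call fails. A minimal sketch of the session setup that would precede the failing query, assuming the documented carbon.enable.index.server session property (the index server host and port themselves are configured in carbon.properties, not shown here):

spark-sql> SET carbon.enable.index.server=true;                  -- route pruning through the index server
spark-sql> select * from index_test_cg where name='n502670';     -- same query as above, now pruned remotely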
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
CarbonDataQA2 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771518487
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5405/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
CarbonDataQA2 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771518388
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3646/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
CarbonDataQA2 commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771515497
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3645/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
CarbonDataQA2 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771513934
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5406/
[GitHub] [carbondata] ShreelekhyaG commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
ShreelekhyaG commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771481950
retest this please
[GitHub] [carbondata] nihal0107 edited a comment on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
nihal0107 edited a comment on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771464559
> If we enable the property `ENABLE_AUTO_LOAD_MERGE` then which segment id are we planning to show, the segment generated after compaction or before compaction? Better to add a test case for that scenario also.

> If we enable 'AUTO_LOAD_MERGE', then we return and show the segment id before compaction since the user would focus on his load operation. Test case has been added. Please review.

We are showing the segment id because it is helpful when we need to query a specific segment. But if we show the segment id from before compaction and then query that segment, the operation will fail. Better to take the opinion of the community.
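A sketch of the failure mode nihal0107 is pointing at, assuming the documented carbon.input.segments.<db>.<table> session property and auto load merge with its default threshold of four segments; the table name, file path, and segment ids are illustrative, not taken from the PR:

spark-sql> SET carbon.enable.auto.load.merge=true;
spark-sql> LOAD DATA LOCAL INPATH '/tmp/part4.csv' INTO TABLE t;   -- suppose this fourth load reports segment 3
-- auto load merge now compacts segments 0,1,2,3 into 0.1 and marks the originals Compacted
spark-sql> SET carbon.input.segments.default.t=3;                  -- pin queries to the reported segment
spark-sql> select count(*) from t;                                 -- segment 3 no longer serves queries, hence the concern above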
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
CarbonDataQA2 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771465576
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5402/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4084: [CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled.
CarbonDataQA2 commented on pull request #4084: URL: https://github.com/apache/carbondata/pull/4084#issuecomment-771459205
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3642/
[jira] [Created] (CARBONDATA-4116) Concurrent Data Loading Issue
suyash yadav created CARBONDATA-4116:

Summary: Concurrent Data Loading Issue
Key: CARBONDATA-4116
URL: https://issues.apache.org/jira/browse/CARBONDATA-4116
Project: CarbonData
Issue Type: Improvement
Components: core
Affects Versions: 1.6.1, 2.0.1
Environment: Apache carbondata 2.0.1
Reporter: suyash yadav

Even though Carbon claims to support concurrent data loading together with table compaction, in practice it cannot. We hit data-inconsistency issues in Carbon 1.6.1 when loading data into a table concurrently with compaction. That is why we implemented table locking for the load data, compact, and clean files commands. All of this stems from concurrent manipulation of the table's metadata file, i.e. tablestatus.

We are facing data inconsistency when loading data concurrently with compaction (see the sketch of the conflicting operations below). What is the way to fix this, given that we want concurrent loading together with compaction?

-- This message was sent by Atlassian Jira (v8.3.4#803005)
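For context, a sketch of the three operations whose interleaving the reporter describes, each issued from a separate session against the same table (the table name and path are illustrative); all three rewrite the table's tablestatus metadata file, which is where the inconsistency arises:

spark-sql> LOAD DATA LOCAL INPATH '/path/batch1.csv' INTO TABLE sensor_data;   -- session 1: repeated loads
spark-sql> ALTER TABLE sensor_data COMPACT 'MINOR';                            -- session 2: concurrent compaction
spark-sql> CLEAN FILES FOR TABLE sensor_data;                                  -- session 3: concurrent cleanup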
[GitHub] [carbondata] areyouokfreejoe commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
areyouokfreejoe commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771458076
> If we enable the property `ENABLE_AUTO_LOAD_MERGE` then which segment id are we planning to show, the segment generated after compaction or before compaction? Better to add a test case for that scenario also.
[GitHub] [carbondata] areyouokfreejoe closed pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
areyouokfreejoe closed pull request #4086: URL: https://github.com/apache/carbondata/pull/4086
[GitHub] [carbondata] areyouokfreejoe commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID
areyouokfreejoe commented on pull request #4086: URL: https://github.com/apache/carbondata/pull/4086#issuecomment-771456273
> If we enable the property `ENABLE_AUTO_LOAD_MERGE` then which segment id are we planning to show, the segment generated after compaction or before compaction? Better to add a test case for that scenario also.

If we enable 'AUTO_LOAD_MERGE', then we return and show the segment id from before compaction, since the user is focused on the load operation. A test case has been added. Please review.
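For reference, a minimal illustration of the behaviour under discussion, assuming this PR makes a successful LOAD/INSERT report the id of the segment it created; the table name, file paths, and reported ids are illustrative:

spark-sql> SET carbon.enable.auto.load.merge=true;              -- documented property behind ENABLE_AUTO_LOAD_MERGE
spark-sql> LOAD DATA LOCAL INPATH '/tmp/d1.csv' INTO TABLE t;   -- would report segment 0
spark-sql> LOAD DATA LOCAL INPATH '/tmp/d4.csv' INTO TABLE t;   -- the fourth load would still report segment 3,
-- even though, with the default compaction threshold of 4, auto load merge immediately folds 0-3 into 0.1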
[GitHub] [carbondata] ydvpankaj99 commented on pull request #4071: [CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
ydvpankaj99 commented on pull request #4071: URL: https://github.com/apache/carbondata/pull/4071#issuecomment-771448246
retest this please