[GitHub] [carbondata] akashrn5 commented on a change in pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC intermediate files processing
akashrn5 commented on a change in pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#discussion_r443361855

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala ##
@@ -269,11 +271,10 @@ case class CarbonMergeDataSetCommand(
       new SparkCarbonFileFormat().prepareWrite(sparkSession, job, Map(), schema)
     val config = SparkSQLUtil.broadCastHadoopConf(sparkSession.sparkContext, job.getConfiguration)
-    (frame.rdd.coalesce(DistributionUtil.getConfiguredExecutors(sparkSession.sparkContext)).
-      mapPartitionsWithIndex { case (index, iter) =>
+    (frame.rdd.mapPartitionsWithIndex { case (index, iter) =>
         CarbonProperties.getInstance().addProperty(CarbonLoadOptionConstants
           .ENABLE_CARBON_LOAD_DIRECT_WRITE_TO_STORE_PATH, "true")
-        val confB = config.value.value
+        val confB = new Configuration(config.value.value)

Review comment:
I think adding a new conf for it is not correct; we need to analyze this properly. Maybe you can revert these changes and we can handle it during other CDC optimizations.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
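The last changed line in the diff above swaps a shared configuration object for a per-task copy. As a rough illustration of what such a copy changes (not a statement on whether it is the right fix), the sketch below uses `java.util.Properties` as a stand-in for the Hadoop `Configuration`; the class name, method name, and the key are hypothetical.

```java
import java.util.Properties;

// Stand-in sketch: java.util.Properties plays the role of the shared configuration.
final class ConfCopySketch {
  // Returns an independent copy so that per-task mutations do not leak
  // into the instance that other tasks still read.
  static Properties copyOf(Properties shared) {
    Properties copy = new Properties();
    copy.putAll(shared);
    return copy;
  }

  public static void main(String[] args) {
    Properties shared = new Properties();
    shared.setProperty("write.path", "/store");          // hypothetical key
    Properties perTask = copyOf(shared);
    perTask.setProperty("write.path", "/store/task-0");  // only the copy changes
    System.out.println(shared.getProperty("write.path"));
    System.out.println(perTask.getProperty("write.path"));
  }
}
```

With the copy, a task can adjust its own settings without affecting the shared instance; without it, every task sees every other task's changes.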
[jira] [Updated] (CARBONDATA-3858) Check CDC deltafiles count in the testcase
[ https://issues.apache.org/jira/browse/CARBONDATA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3858: Description: Currently there is no deltafiles count check in the testcase; one should be added. (was: In the CDC flow, the parallelism of deltafiles processing is the same as the number of executors, which heavily reduces parallelism. The insufficient parallelism limits CPU utilization and hampers CDC performance.) Summary: Check CDC deltafiles count in the testcase (was: Increase the parallelism of CDC intermediate files processing) > Check CDC deltafiles count in the testcase > -- > > Key: CARBONDATA-3858 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3858 > Project: CarbonData > Issue Type: Improvement >Reporter: Xingjun Hao >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently there is no deltafiles count check in the testcase; one should be > added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] niuge01 closed pull request #3797: [WIP] Support show segment information
niuge01 closed pull request #3797: URL: https://github.com/apache/carbondata/pull/3797
[GitHub] [carbondata] niuge01 commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
niuge01 commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443412925

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala ##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
     output.asScala
   }
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+      executorService: ExecutorService,
+      stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+    stageFiles.map { files =>
+      executorService.submit(new Callable[Boolean] {
+        override def call(): Boolean = {
+          val stageLoadingFile =
+            FileFactory.getCarbonFile(files._1.getAbsolutePath +
+              CarbonTablePath.LOADING_FILE_SUBFIX);
+          if (!stageLoadingFile.exists()) {
+            stageLoadingFile.createNewFile();
+          } else {
+            stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+          }
+        }
+      })
+    }.filter { future =>
+      future.get()
+    }
+    stageFiles
+  }
+
+  /**
+   * create '.loading' file with retry
+   */
+  private def createStageLoadingFilesWithRetry(
+      executorService: ExecutorService,
+      stageFiles: Array[(CarbonFile, CarbonFile)]): Unit = {
+    val startTime = System.currentTimeMillis()
+    var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+    while (createStageLoadingFiles(executorService, stageFiles).length > 0 && retry > 0) {

Review comment:
Please check this loop condition: if createStageLoadingFiles(executorService, stageFiles).length > 0, should the loop continue?
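The loop condition questioned above follows a common "retry while something still fails" shape. A minimal sketch of that shape, assuming the attempt reports how many items failed; `RetrySketch` and `retryWhileFailing` are hypothetical names, not CarbonData APIs:

```java
import java.util.function.IntSupplier;

final class RetrySketch {
  // Runs the attempt once, then repeats it while it still reports failures
  // and retries remain. Returns the failure count of the last attempt
  // (0 means everything eventually succeeded).
  static int retryWhileFailing(IntSupplier attempt, int maxRetries) {
    int failures = attempt.getAsInt();
    int retriesLeft = maxRetries;
    while (failures > 0 && retriesLeft > 0) {
      retriesLeft--;
      failures = attempt.getAsInt();
    }
    return failures;
  }
}
```

Under this reading, a positive failure count is exactly the case in which the loop should continue; the bounded retry counter keeps it from spinning forever.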
## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
     }
   }
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+    String timeout = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+    if (timeout == null) {
+      return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+    } else {
+      try {
+        long configuredValue = Long.parseLong(timeout);
+        if (configuredValue < 0) {
+          return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;

Review comment:
Log a warning for an illegal configuration value.

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
     }
   }
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+    String timeout = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+    if (timeout == null) {
+      return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+    } else {
+      try {
+        long configuredValue = Long.parseLong(timeout);
+        if (configuredValue < 0) {
+          return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+        } else {
+          return configuredValue;
+        }
+      } catch (Exception ex) {

Review comment:
Catch NumberFormatException. Log a warning for the exception.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala ##
@@ -148,10 +149,19 @@ case class CarbonInsertFromStageCommand(
       return Seq.empty
     }
-    // 2) read all stage files to collect input files for data loading
-    // create a thread pool to read them
+    // We add a tag 'loading' to the stages in process.
+    // different insertstage processes can load different data separately
+    // by choose the stages without 'loading' tag or stages loaded timeout.
+    // which avoid loading the same data between concurrent insertstage processes.
+    // The 'loading' tag is actually an empty file with
+    // '.loading' suffix filename
     val numThreads = Math.min(Math.max(stageFiles.length, 1), 10)
     val executorService = Executors.newFixedThreadPool(numThreads)
+    createStageLoadingFilesWithRetry(executorService, stageFiles)
+    lock.unlock()

Review comment:
Remove this line; the lock will be released in the finally block.

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##
@@ -1521,6 +1521,10 @@ private CarbonCommonConstants() {
   public static final String CARBON_QUERY_STAG
[GitHub] [carbondata] Indhumathi27 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
Indhumathi27 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647390472 Retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase
CarbonDataQA1 commented on pull request #3793: URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647406601 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3188/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase
CarbonDataQA1 commented on pull request #3793: URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647407337 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1462/
[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443450793

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
     }
   }
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+    String timeout = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+    if (timeout == null) {
+      return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+    } else {
+      try {
+        long configuredValue = Long.parseLong(timeout);
+        if (configuredValue < 0) {
+          return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+        } else {
+          return configuredValue;
+        }
+      } catch (Exception ex) {

Review comment:
Modified.

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
     }
   }
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+    String timeout = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+    if (timeout == null) {
+      return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+    } else {
+      try {
+        long configuredValue = Long.parseLong(timeout);
+        if (configuredValue < 0) {
+          return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;

Review comment:
Modified.
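The two review requests on this method (catch `NumberFormatException` specifically, and log a warning when falling back to the default) could look roughly like the sketch below. It is a standalone approximation, not the merged code: `StageTimeoutSketch`, `parseTimeout`, and the use of `java.util.logging` are assumptions, and the default of 2880 is taken from the constant quoted elsewhere in this thread.

```java
import java.util.logging.Logger;

final class StageTimeoutSketch {
  private static final Logger LOG = Logger.getLogger(StageTimeoutSketch.class.getName());

  // Default value as quoted in the CarbonCommonConstants diff in this thread.
  static final long DEFAULT_TIMEOUT = 2880L;

  // Parses the configured timeout; falls back to the default (with a warning)
  // when the value is negative or not a number.
  static long parseTimeout(String raw) {
    if (raw == null) {
      return DEFAULT_TIMEOUT;
    }
    try {
      long configured = Long.parseLong(raw);
      if (configured < 0) {
        LOG.warning("Negative insert stage timeout '" + raw + "', using default " + DEFAULT_TIMEOUT);
        return DEFAULT_TIMEOUT;
      }
      return configured;
    } catch (NumberFormatException ex) {
      LOG.warning("Invalid insert stage timeout '" + raw + "', using default " + DEFAULT_TIMEOUT);
      return DEFAULT_TIMEOUT;
    }
  }
}
```

Catching only `NumberFormatException` keeps unrelated failures visible instead of silently mapping them to the default.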
[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443451334

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala ##
@@ -148,10 +149,19 @@ case class CarbonInsertFromStageCommand(
       return Seq.empty
     }
-    // 2) read all stage files to collect input files for data loading
-    // create a thread pool to read them
+    // We add a tag 'loading' to the stages in process.
+    // different insertstage processes can load different data separately
+    // by choose the stages without 'loading' tag or stages loaded timeout.
+    // which avoid loading the same data between concurrent insertstage processes.
+    // The 'loading' tag is actually an empty file with
+    // '.loading' suffix filename
     val numThreads = Math.min(Math.max(stageFiles.length, 1), 10)
     val executorService = Executors.newFixedThreadPool(numThreads)
+    createStageLoadingFilesWithRetry(executorService, stageFiles)
+    lock.unlock()

Review comment:
It can't be removed: we aim to release the ingest lock as soon as the chosen stages have been tagged 'loading'.

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##
@@ -1521,6 +1521,10 @@ private CarbonCommonConstants() {
   public static final String CARBON_QUERY_STAGE_INPUT_DEFAULT = "false";
+  public static final String CARBON_INSERT_STAGE_TIMEOUT = "carbon.insert.stage.timeout";
+
+  public static final long CARBON_INSERT_STAGE_TIMEOUT_DEFAULT = 2880;

Review comment:
Modified.
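The disagreement above is about releasing a lock early while a finally block also releases it. One way to reconcile the two, sketched here with `java.util.concurrent.locks.ReentrantLock` rather than CarbonData's own lock type (whose behavior on a second unlock is not shown in this thread), is to track whether the lock is still held. `EarlyUnlockSketch` and its parameters are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

final class EarlyUnlockSketch {
  // Holds the lock only for the tagging step, then releases it before the
  // long-running load. The finally block releases the lock only if the
  // early release did not happen (for example because tagging threw).
  static void run(ReentrantLock lock, Runnable tagStages, Runnable loadData) {
    lock.lock();
    boolean held = true;
    try {
      tagStages.run();
      lock.unlock();   // early release: other processes may now pick stages
      held = false;
      loadData.run();
    } finally {
      if (held) {
        lock.unlock();
      }
    }
  }
}
```

With `ReentrantLock` the guard matters, because unlocking a lock that is not held throws `IllegalMonitorStateException`.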
[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443451443

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala ##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
     output.asScala
   }
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+      executorService: ExecutorService,
+      stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+    stageFiles.map { files =>
+      executorService.submit(new Callable[Boolean] {
+        override def call(): Boolean = {
+          val stageLoadingFile =
+            FileFactory.getCarbonFile(files._1.getAbsolutePath +
+              CarbonTablePath.LOADING_FILE_SUBFIX);
+          if (!stageLoadingFile.exists()) {

Review comment:
Modified.
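The quoted code tags a stage by creating an empty '.loading' file, or refreshing its modification time if it already exists. A minimal sketch of that create-or-touch idea on a local file system, using `java.nio.file` instead of CarbonData's `FileFactory`; it swaps the `exists()` check from the diff for `Files.createFile`, which fails atomically if the file is already there. `LoadingMarkerSketch` and `tagAsLoading` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

final class LoadingMarkerSketch {
  static final String LOADING_SUFFIX = ".loading";

  // Creates the empty marker next to the stage file, or refreshes its
  // modification time when the marker already exists. Returns the marker path.
  static Path tagAsLoading(Path stageFile) throws IOException {
    Path marker = stageFile.resolveSibling(stageFile.getFileName() + LOADING_SUFFIX);
    try {
      Files.createFile(marker);
    } catch (FileAlreadyExistsException e) {
      Files.setLastModifiedTime(marker, FileTime.fromMillis(System.currentTimeMillis()));
    }
    return marker;
  }
}
```

Refreshing the timestamp is what lets a timeout check decide later whether a tagged stage has been abandoned.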
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647418441 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1464/
[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443452016

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala ##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
     output.asScala
   }
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+      executorService: ExecutorService,
+      stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+    stageFiles.map { files =>
+      executorService.submit(new Callable[Boolean] {
+        override def call(): Boolean = {
+          val stageLoadingFile =
+            FileFactory.getCarbonFile(files._1.getAbsolutePath +
+              CarbonTablePath.LOADING_FILE_SUBFIX);
+          if (!stageLoadingFile.exists()) {
+            stageLoadingFile.createNewFile();
+          } else {
+            stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+          }
+        }
+      })
+    }.filter { future =>
+      future.get()
+    }
+    stageFiles
+  }
+
+  /**
+   * create '.loading' file with retry
+   */
+  private def createStageLoadingFilesWithRetry(
+      executorService: ExecutorService,
+      stageFiles: Array[(CarbonFile, CarbonFile)]): Unit = {
+    val startTime = System.currentTimeMillis()
+    var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+    while (createStageLoadingFiles(executorService, stageFiles).length > 0 && retry > 0) {

Review comment:
Checked; the loop should continue. createStageLoadingFiles(executorService, stageFiles).length equals the number of stages that failed to be tagged 'loading'. If the length is > 0, the loop continues and retries tagging them.
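The reply above describes the return value as "the stages that failed to be tagged". A minimal sketch of a helper with that contract, where each task returns true on success and the helper returns only the items whose task failed; `FailedTasksSketch`, `failedItems`, and the success-returns-true convention are assumptions made for illustration, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class FailedTasksSketch {
  // Submits one task per item and returns the items whose task reported
  // failure (returned false) or threw an exception.
  static <T> List<T> failedItems(ExecutorService pool, List<T> items, Predicate<T> task) {
    List<Future<Boolean>> futures = new ArrayList<>();
    for (T item : items) {
      Callable<Boolean> call = () -> task.test(item);
      futures.add(pool.submit(call));
    }
    List<T> failed = new ArrayList<>();
    for (int i = 0; i < items.size(); i++) {
      try {
        if (!futures.get(i).get()) {
          failed.add(items.get(i));
        }
      } catch (ExecutionException e) {
        failed.add(items.get(i));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for tasks", e);
      }
    }
    return failed;
  }
}
```

A caller can then retry with only the returned items until the list is empty or the retry budget runs out.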
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647418962 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3190/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
CarbonDataQA1 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647456554 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1463/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
CarbonDataQA1 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647457682 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3189/
[GitHub] [carbondata] Indhumathi27 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
Indhumathi27 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647458097 LGTM
[GitHub] [carbondata] Indhumathi27 closed pull request #3789: [WIP] Store Size Optimization
Indhumathi27 closed pull request #3789: URL: https://github.com/apache/carbondata/pull/3789
[jira] [Created] (CARBONDATA-3864) Store Size Optimization
Indhumathi Muthumurugesh created CARBONDATA-3864: Summary: Store Size Optimization Key: CARBONDATA-3864 URL: https://issues.apache.org/jira/browse/CARBONDATA-3864 Project: CarbonData Issue Type: Improvement Reporter: Indhumathi Muthumurugesh
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647491662 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3191/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647492250 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1465/
[jira] [Closed] (CARBONDATA-3857) Implement delete and update feature in carbondata SDK.
[ https://issues.apache.org/jira/browse/CARBONDATA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karanpreet Singh closed CARBONDATA-3857. Resolution: Invalid > Implement delete and update feature in carbondata SDK. > -- > > Key: CARBONDATA-3857 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3857 > Project: CarbonData > Issue Type: New Feature >Reporter: Karanpreet Singh >Priority: Major > Attachments: Implement delete and update feature in carbondata SDK.pdf > > > Please find the design document attached.
[jira] [Created] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.
Karanpreet Singh created CARBONDATA-3865: Summary: Implement delete and update feature in carbondata SDK. Key: CARBONDATA-3865 URL: https://issues.apache.org/jira/browse/CARBONDATA-3865 Project: CarbonData Issue Type: New Feature Reporter: Karanpreet Singh Attachments: Implement delete and update feature in carbondata SDK.pdf Please find the design document attached.
[jira] [Updated] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.
[ https://issues.apache.org/jira/browse/CARBONDATA-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karanpreet Singh updated CARBONDATA-3865: - Attachment: Implement delete and update feature in carbondata SDK.pdf > Implement delete and update feature in carbondata SDK. > -- > > Key: CARBONDATA-3865 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3865 > Project: CarbonData > Issue Type: New Feature >Reporter: Karanpreet Singh >Priority: Major > Attachments: Implement delete and update feature in carbondata SDK.pdf > > > Please find the design document attached.
[jira] [Updated] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.
[ https://issues.apache.org/jira/browse/CARBONDATA-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karanpreet Singh updated CARBONDATA-3865: - Attachment: (was: Implement delete and update feature in carbondata SDK.pdf) > Implement delete and update feature in carbondata SDK. > -- > > Key: CARBONDATA-3865 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3865 > Project: CarbonData > Issue Type: New Feature >Reporter: Karanpreet Singh >Priority: Major > Attachments: Implement delete and update feature in carbondata SDK.pdf > > > Please find the design document attached.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [WIP] Lock to read tablestatus
CarbonDataQA1 commented on pull request #3800: URL: https://github.com/apache/carbondata/pull/3800#issuecomment-647689389 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3192/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [WIP] Lock to read tablestatus
CarbonDataQA1 commented on pull request #3800: URL: https://github.com/apache/carbondata/pull/3800#issuecomment-647689995 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1466/
[GitHub] [carbondata] xubo245 commented on a change in pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
xubo245 commented on a change in pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#discussion_r443917288

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletIndexFactory.java ##
@@ -352,9 +352,12 @@ private void modifyColumnSchemaForSortColumn(ColumnSchema columnSchema, boolean
       throws IOException {
     SegmentBlockIndexInfo segmentBlockIndexInfo = segmentMap.get(segment.getSegmentNo());
     Set tableBlockIndexUniqueIdentifiers = null;
-    if (null != segmentBlockIndexInfo && null != segmentBlockIndexInfo.getSegmentMetaDataInfo()) {
-      segment.setSegmentMetaDataInfo(
-          segmentMap.get(segment.getSegmentNo()).getSegmentMetaDataInfo());
+    if (null != segmentBlockIndexInfo
+        && segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers().size() > 0) {

Review comment:
I suggest using CollectionUtils.isNotEmpty() to check segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers(); it additionally checks for null.
[GitHub] [carbondata] xubo245 commented on a change in pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
xubo245 commented on a change in pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#discussion_r443917288

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletIndexFactory.java ##
@@ -352,9 +352,12 @@ private void modifyColumnSchemaForSortColumn(ColumnSchema columnSchema, boolean
       throws IOException {
     SegmentBlockIndexInfo segmentBlockIndexInfo = segmentMap.get(segment.getSegmentNo());
     Set tableBlockIndexUniqueIdentifiers = null;
-    if (null != segmentBlockIndexInfo && null != segmentBlockIndexInfo.getSegmentMetaDataInfo()) {
-      segment.setSegmentMetaDataInfo(
-          segmentMap.get(segment.getSegmentNo()).getSegmentMetaDataInfo());
+    if (null != segmentBlockIndexInfo
+        && segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers().size() > 0) {

Review comment:
Suggestion: use CollectionUtils.isNotEmpty() to check segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers(); isNotEmpty also covers the null check.
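The difference the reviewer points at is that `.size() > 0` throws a `NullPointerException` when the collection itself is null, while a null-safe emptiness check does not. The sketch below uses a small local helper with the same null-and-empty semantics so that it runs without Apache Commons Collections on the classpath; `NullSafeCheckSketch` is a hypothetical name.

```java
import java.util.Collection;

final class NullSafeCheckSketch {
  // Local helper with the semantics the reviewer describes:
  // false for null, false for empty, true otherwise.
  static boolean isNotEmpty(Collection<?> c) {
    return c != null && !c.isEmpty();
  }

  // The pattern from the diff: throws NullPointerException when c is null.
  static boolean sizeCheck(Collection<?> c) {
    return c.size() > 0;
  }
}
```

In the condition under review, the helper form would guard against `getTableBlockIndexUniqueIdentifiers()` returning null as well as an empty set.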
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
CarbonDataQA1 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647874762 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1467/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
CarbonDataQA1 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647875150 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3193/
[GitHub] [carbondata] akashrn5 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase
akashrn5 commented on pull request #3793: URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647900779 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647923116 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3194/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
CarbonDataQA1 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647927390
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation
CarbonDataQA1 commented on pull request #3799: URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647934951 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1468/
[GitHub] [carbondata] kunal642 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly
kunal642 commented on pull request #3795: URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647937900 LGTM
[jira] [Created] (CARBONDATA-3866) Code comment with issue numbers
ackelcn created CARBONDATA-3866:
Summary: Code comment with issue numbers
Key: CARBONDATA-3866
URL: https://issues.apache.org/jira/browse/CARBONDATA-3866
Project: CarbonData
Issue Type: Improvement
Reporter: ackelcn

When I read the code of carbondata, I find several comments with issue numbers. One of them comes from CSVCarbonWriterTest.java:

{code:java}
// [CARBONDATA-3688]: compressor name is added in data file name
@Test
public void testFileName() throws IOException {
  String path = "./testWriteFiles";
  FileUtils.deleteDirectory(new File(path));
  Field[] fields = new Field[2];
  fields[0] = new Field("name", DataTypes.STRING);
  fields[1] = new Field("age", DataTypes.INT);
  TestUtil.writeFilesAndVerify(new Schema(fields), path);
  ...
}{code}

These comments are quite useful for other programmers and me to understand the code, but I notice that not all issue numbers are written in code comments. It can already be quite tedious to write them into commit messages :)

To handle the problem, I implemented a tool that automatically inserts issue numbers into code comments. I tried my tool on activemq, and the instrumented version is [https://github.com/ackelcn/carbondatawithissuecomment]

To avoid confusion, if a code comment already contains an issue number, my tool skips that issue number. All my generated comments start with //IC, so it is easy to find them.

Could you please give some feedback on my tool? Please feel free to merge my generated comments into your code if you feel that some are useful.