[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246317595 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -575,19 +575,23 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden } // calculate the average expected size for each node -long sizePerNode = 0; +long numberOfBlocksPerNode = 0; +if (blockInfos.size() > 0) { + numberOfBlocksPerNode = blockInfos.size() / numOfNodes; +} +numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : numberOfBlocksPerNode; +long dataSizePerNode = 0; long totalFileSize = 0; +for (Distributable blockInfo : uniqueBlocks) { + totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength(); +} +dataSizePerNode = totalFileSize / numOfNodes; +long sizePerNode = 0; if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == blockAssignmentStrategy) { - if (blockInfos.size() > 0) { -sizePerNode = blockInfos.size() / numOfNodes; - } - sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode; + sizePerNode = numberOfBlocksPerNode; --- End diff -- I think this modification is OK when using the BLOCK_NUM_FIRST block assignment strategy. ---
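For reference, the arithmetic the diff above factors out (average block count per node clamped to at least 1, and total unique-block bytes divided by the node count) can be sketched as standalone methods. The class and method names below are hypothetical, not the actual CarbonLoaderUtil API:

```java
// Minimal sketch of the per-node averaging logic in the diff above.
// Assumptions: blockCount is blockInfos.size(), blockLengths holds
// TableBlockInfo.getBlockLength() for each unique block, numOfNodes > 0.
public class LoadAverageSketch {

    // Average number of blocks per node, clamped to a minimum of 1
    // (used by the BLOCK_NUM_FIRST strategy).
    static long numberOfBlocksPerNode(int blockCount, int numOfNodes) {
        long perNode = 0;
        if (blockCount > 0) {
            perNode = blockCount / numOfNodes;
        }
        return perNode <= 0 ? 1 : perNode;
    }

    // Average data size per node: sum of unique block lengths / node count.
    static long dataSizePerNode(long[] blockLengths, int numOfNodes) {
        long totalFileSize = 0;
        for (long length : blockLengths) {
            totalFileSize += length;
        }
        return totalFileSize / numOfNodes;
    }
}
```

Note how the clamp guarantees each node is assigned at least one block even when there are fewer blocks than nodes.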
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246299802 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -1164,4 +1156,35 @@ private static void deleteFiles(List filesToBeDeleted) throws IOException FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath)); } } + + /** + * This method will calculate the average expected size for each node + * + * @param blockInfos blocks + * @param uniqueBlocks unique blocks + * @param numOfNodes if number of nodes has to be decided + * based on block location information + * @param blockAssignmentStrategy strategy used to assign blocks + * @return the average expected size for each node + */ + private static long calcAvgLoadSizePerNode(List blockInfos, --- End diff -- OK, I will modify it. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246299700 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -609,6 +597,10 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST; } else { blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST; + // fall back to BLOCK_NUM_FIRST strategy need to recalculate + // the average expected size for each node + sizePerNode = calcAvgLoadSizePerNode(blockInfos,uniqueBlocks, --- End diff -- OK, I will modify it. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/3059 [HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Test OK in local env - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata fix_load_min_size_bug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3059.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3059 commit 04d6bff55a5c9120ae8d5c4899a82bc63f1e2e37 Author: ndwangsen Date: 2019-01-09T07:10:21Z [HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy. ---
[GitHub] carbondata issue #2864: [CARBONDATA-3041] Optimize load minimum size strateg...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2864 retest this please ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228792020 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,21 +1171,25 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() -val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data -val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() -val blockAssignStrategy = if (skewedDataOptimization) { - CarbonLoaderUtil.BlockAssignmentStrategy.BLOCK_SIZE_FIRST -} else if (loadMinSizeOptimization) { +val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var loadMinSize = carbonLoadModel.getLoadMinSize() +if (loadMinSize == "0" ) { --- End diff -- OK, I will modify it. ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228791974 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -833,4 +833,32 @@ object CommonUtil { }) } } + + /** + * This method will validate single node minimum load data volume of table specified by the user + * + * @param tableProperties table property specified by user + * @param propertyName property name + */ + def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = { +var size: Integer = 0 +if (tableProperties.get(propertyName).isDefined) { + val loadSizeStr: String = +parsePropertyValueStringInMB(tableProperties(propertyName)) + try { +size = Integer.parseInt(loadSizeStr) + } catch { +case e: NumberFormatException => + throw new MalformedCarbonCommandException(s"Invalid $propertyName value found: " + +s"$loadSizeStr, only int value greater " + +s"than 0 is supported.") + } + // if the value is negative, set the value is 0 + if(size > 0) { +tableProperties.put(propertyName, loadSizeStr) + } else { +tableProperties.put(propertyName, "0") --- End diff -- OK, I will modify it. ---
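The Scala validation in the diff above can be mirrored in Java as a simplified sketch. It assumes the value has already been normalized to a plain integer string (the real code first runs it through parsePropertyValueStringInMB), and it throws IllegalArgumentException where the original throws MalformedCarbonCommandException:

```java
import java.util.Map;

public class MinSizeValidationSketch {

    // Validate the per-node minimum load size table property: non-numeric
    // values are rejected, and zero or negative values are stored as "0".
    static void validateLoadMinSize(Map<String, String> tableProperties, String propertyName) {
        String loadSizeStr = tableProperties.get(propertyName);
        if (loadSizeStr == null) {
            return;  // property not specified, nothing to validate
        }
        int size;
        try {
            size = Integer.parseInt(loadSizeStr);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid " + propertyName + " value found: "
                + loadSizeStr + ", only int value greater than 0 is supported.");
        }
        // If the value is not positive, fall back to "0" (strategy disabled).
        tableProperties.put(propertyName, size > 0 ? loadSizeStr : "0");
    }
}
```

Storing "0" rather than rejecting negative values matches the reviewed diff: "0" later signals that the minimum-size strategy is not in effect.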
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228791841 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,19 @@ CarbonData DDL statements are documented here,which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - # Load minimum data size --- End diff -- OK, I will modify it. ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708281 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() -val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data -val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() +val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708277 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - # Load minimum data size + This property determines whether to enable node minumun input data size allocation strategy + for data loading.It will make sure that the node load the minimum amount of data there by + reducing number of carbondata files. This property is useful if the size of the input data + files are very small, like 1MB to 256MB. And This property can also be specified + in the load option, the property value only int value is supported. + + ``` + TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256 MB') --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708250 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala --- @@ -123,6 +123,12 @@ private[sql] case class CarbonDescribeFormattedCommand( tblProps.get(CarbonCommonConstants.LONG_STRING_COLUMNS), "")) } +// load min size info +if (tblProps.containsKey(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB)) { + results ++= Seq(("Single node load min data size", --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708257 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -833,4 +833,26 @@ object CommonUtil { }) } } + + /** + * This method will validate single node minimum load data volume of table specified by the user + * + * @param tableProperties table property specified by user + * @param propertyName property name + */ + def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = { +var size: Integer = 0 +if (tableProperties.get(propertyName).isDefined) { + val loadSizeStr: String = +parsePropertyValueStringInMB(tableProperties(propertyName)) + try { +size = Integer.parseInt(loadSizeStr) + } catch { +case e: NumberFormatException => --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708260 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala --- @@ -748,4 +752,18 @@ object AlterTableUtil { false } } + + private def validateLoadMinSizeProperties(carbonTable: CarbonTable, + propertiesMap: mutable.Map[String, String]): Unit = { +// validate load min size property +if (propertiesMap.get(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB).isDefined) { + // Cache level is not allowed for child tables and dataMaps --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708269 --- Diff: docs/ddl-of-carbondata.md --- @@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which includes: be later viewed in table description for reference. ``` - TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'') + TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords') + ``` + + - # Load minimum data size + This property determines whether to enable node minumun input data size allocation strategy --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708254 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -833,4 +833,26 @@ object CommonUtil { }) } } + + /** + * This method will validate single node minimum load data volume of table specified by the user + * + * @param tableProperties table property specified by user + * @param propertyName property name + */ + def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = { +var size: Integer = 0 +if (tableProperties.get(propertyName).isDefined) { + val loadSizeStr: String = +parsePropertyValueStringInMB(tableProperties(propertyName)) + try { +size = Integer.parseInt(loadSizeStr) --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708258 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory { .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext) val skewedDataOptimization = CarbonProperties.getInstance() .isLoadSkewedDataOptimizationEnabled() -val loadMinSizeOptimization = CarbonProperties.getInstance() - .isLoadMinSizeOptimizationEnabled() // get user ddl input the node loads the smallest amount of data -val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() +val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "") +var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize() --- End diff -- Has been modified based on the review ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2864#discussion_r228708265 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java --- @@ -186,8 +186,7 @@ optionsFinal.put("sort_scope", "local_sort"); optionsFinal.put("sort_column_bounds", Maps.getOrDefault(options, "sort_column_bounds", "")); optionsFinal.put(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, - Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, -CarbonCommonConstants.CARBON_LOAD_MIN_NODE_SIZE_INMB_DEFAULT)); + Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")); --- End diff -- ok ---
[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2864 [CARBONDATA-3041] Optimize load minimum size strategy for data loading This PR modifies the following points: 1. Delete the system property carbon.load.min.size.enabled, change load_min_size_inmb to a table property, and allow it to also be specified in the load option. 2. Support ALTER TABLE xxx SET TBLPROPERTIES('load_min_size_inmb'='256'). 3. If a table is created with the load_min_size_inmb property, display it via the DESC FORMATTED command. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? YES - [ ] Testing done Test OK in our test env - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata fix_load_min Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2864.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2864 commit bbbe70d04cef85b2c7ab50d3f697e0d1e35efc95 Author: ndwangsen Date: 2018-10-27T02:38:48Z [CARBONDATA-3041] Optimize load minimum size strategy for data loading ---
[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2843#discussion_r227626868 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -23,86 +23,26 @@ import org.apache.carbondata.core.util.CarbonProperty; public final class CarbonCommonConstants { - /** - * surrogate value of null - */ - public static final int DICT_VALUE_NULL = 1; - /** - * surrogate value of null for direct dictionary - */ - public static final int DIRECT_DICT_VALUE_NULL = 1; - /** - * integer size in bytes - */ - public static final int INT_SIZE_IN_BYTE = 4; - /** - * short size in bytes - */ - public static final int SHORT_SIZE_IN_BYTE = 2; - /** - * DOUBLE size in bytes - */ - public static final int DOUBLE_SIZE_IN_BYTE = 8; - /** - * LONG size in bytes - */ - public static final int LONG_SIZE_IN_BYTE = 8; - /** - * byte to KB conversion factor - */ - public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024; - /** - * BYTE_ENCODING - */ - public static final String BYTE_ENCODING = "ISO-8859-1"; - /** - * measure meta data file name - */ - public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_"; - - /** - * set the segment ids to query from the table - */ - public static final String CARBON_INPUT_SEGMENTS = "carbon.input.segments."; - - /** - * key prefix for set command. 'carbon.datamap.visible.dbName.tableName.dmName = false' means - * that the query on 'dbName.table' will not use the datamap 'dmName' - */ - @InterfaceStability.Unstable - public static final String CARBON_DATAMAP_VISIBLE = "carbon.datamap.visible."; - - /** - * Fetch and validate the segments. - * Used for aggregate table load as segment validation is not required. - */ - public static final String VALIDATE_CARBON_INPUT_SEGMENTS = "validate.carbon.input.segments."; + private CarbonCommonConstants() { + } /** --- End diff -- OK, I will modify it. ---
[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2843 [CARBONDATA-3034] Carding parameters,Organized by parameter category. This PR is mainly combing parameters, organized by parameter category. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Test in local env. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata parameter_comb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2843.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2843 commit 21dc71ba986ab1c2cbdd2cfaa5418a2d629bc34a Author: ndwangsen Date: 2018-10-23T03:35:17Z [CARBONDATA-3034] Carding parameters,Organized by parameter category. ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2627 retest sdv please ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210787370 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -237,6 +237,21 @@ object CarbonEnv { getCarbonTable(tableIdentifier.database, tableIdentifier.table)(sparkSession) } + /** + * This method returns corresponding CarbonTable, it will return None if it's not a CarbonTable + */ + def getCarbonTableOption( --- End diff -- getCarbonTable throws an exception when it is given a non-CarbonData table. ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2627 retest sdv please ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2627 retest sdv please ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210241324 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => + val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database, +selectTable.identifier.table)(sparkSession) + + if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink ) { +throw new MalformedCarbonCommandException(s"Streaming table does not support creating " + +s"MV datamap") + } + selectTable --- End diff -- OK, I will remove it. ---
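The guard being reviewed above (reject MV creation when any source table resolves to a streaming sink) reduces to a small check over optional lookups. A hedged Java sketch, with Optional&lt;Boolean&gt; standing in for the Option[CarbonTable] lookups and IllegalStateException standing in for MalformedCarbonCommandException:

```java
import java.util.List;
import java.util.Optional;

public class MvStreamingGuardSketch {

    // Throws if any resolved source table is a streaming sink; empty
    // Optionals model non-CarbonData tables, which are simply skipped.
    static void checkNoStreamingSource(List<Optional<Boolean>> isStreamingSink) {
        for (Optional<Boolean> flag : isStreamingSink) {
            if (flag.isPresent() && flag.get()) {
                throw new IllegalStateException(
                    "Streaming table does not support creating MV datamap");
            }
        }
    }
}
```

Skipping empty lookups is what getCarbonTableOption enables: non-CarbonData source tables no longer abort the check with an exception.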
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210241115 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -73,13 +73,8 @@ case class CarbonCreateDataMapCommand( } } -if (mainTable != null && -mainTable.isStreamingSink && - !(dmProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString) || dmProviderName.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) { - throw new MalformedCarbonCommandException(s"Streaming table does not support creating " + -s"$dmProviderName datamap") -} +// delete this code because streaming table only does not support creating MV datamap, --- End diff -- OK, I will delete the comment here. ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210239499 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => --- End diff -- ok ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210239381 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => + val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database, +selectTable.identifier.table)(sparkSession) + + if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink ) { +throw new MalformedCarbonCommandException(s"Streaming table does not support creating " + --- End diff -- OK, I will modify it. ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2627 retest this please ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2627 [CARBONDATA-2835] [MVDataMap] Block MV datamap on streaming table This PR block creating MV datamap on streaming table and also block setting streaming property for table which has MV datamap. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Add test case and test pass - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata block_stream_mv Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2627.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2627 commit e7494f6390226e475a0ab9d6d894eafe2c45bed9 Author: ndwangsen Date: 2018-08-10T01:32:59Z [CARBONDATA-2835] Block MV datamap on streaming table ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2601 retest this please ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2601 retest this please ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2601 retest this please ---
[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2601#discussion_r207430989 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion getFormatVersion(CarbonTable carbonTable) } storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo()); } - -CarbonFile[] carbonFiles = FileFactory -.getCarbonFile(storePath) -.listFiles(new CarbonFileFilter() { - @Override - public boolean accept(CarbonFile file) { -if (file == null) { - return false; -} -return file.getName().endsWith("carbondata"); - } -}); -if (carbonFiles == null || carbonFiles.length < 1) { - return CarbonProperties.getInstance().getFormatVersion(); +// get the carbon index file header +FileFactory.FileType fileType = FileFactory.getFileType(storePath); +ColumnarFormatVersion version = null; +if (FileFactory.isFileExist(storePath, fileType)) { + SegmentIndexFileStore fileStore = new SegmentIndexFileStore(); + fileStore.readAllIIndexOfSegment(storePath); + Map carbonIndexMap = fileStore.getCarbonIndexMap(); + if (carbonIndexMap.size() == 0) { +version = CarbonProperties.getInstance().getFormatVersion(); + } + CarbonIndexFileReader indexReader = new CarbonIndexFileReader(); + for (byte[] fileData : carbonIndexMap.values()) { +indexReader.openThriftReader(fileData); +IndexHeader indexHeader = indexReader.readIndexHeader(); +version = ColumnarFormatVersion.valueOf((short)indexHeader.getVersion()); +break; --- End diff -- modified according to review comments ---
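The control flow of the fixed code above — take the version from the first index-file header if any exist, otherwise fall back to the configured default — can be sketched without the Carbon thrift reader. The map of raw index-file bytes is replaced here by a hypothetical map of already-decoded header versions:

```java
import java.util.Map;

public class FormatVersionSketch {

    // Returns the version recorded in the first index-file header, or the
    // configured default when the segment has no index files. Reading one
    // header is enough, mirroring the break in the reviewed loop, on the
    // assumption that all index files of a segment share a format version.
    static short resolveFormatVersion(Map<String, Short> indexHeaderVersions,
                                      short defaultVersion) {
        for (short version : indexHeaderVersions.values()) {
            return version;
        }
        return defaultVersion;
    }
}
```

This replaces the old approach of listing `.carbondata` files: the version now comes from the index-file header, which older V1/V2 stores also carry.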
[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2601#discussion_r207430964 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion getFormatVersion(CarbonTable carbonTable) } storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo()); } - -CarbonFile[] carbonFiles = FileFactory -.getCarbonFile(storePath) -.listFiles(new CarbonFileFilter() { - @Override - public boolean accept(CarbonFile file) { -if (file == null) { - return false; -} -return file.getName().endsWith("carbondata"); - } -}); -if (carbonFiles == null || carbonFiles.length < 1) { - return CarbonProperties.getInstance().getFormatVersion(); +// get the carbon index file header +FileFactory.FileType fileType = FileFactory.getFileType(storePath); +ColumnarFormatVersion version = null; +if (FileFactory.isFileExist(storePath, fileType)) { --- End diff -- modified according to review comments ---
[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2601#discussion_r207430996 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion getFormatVersion(CarbonTable carbonTable) } storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo()); } - -CarbonFile[] carbonFiles = FileFactory -.getCarbonFile(storePath) -.listFiles(new CarbonFileFilter() { - @Override - public boolean accept(CarbonFile file) { -if (file == null) { - return false; -} -return file.getName().endsWith("carbondata"); - } -}); -if (carbonFiles == null || carbonFiles.length < 1) { - return CarbonProperties.getInstance().getFormatVersion(); +// get the carbon index file header +FileFactory.FileType fileType = FileFactory.getFileType(storePath); +ColumnarFormatVersion version = null; +if (FileFactory.isFileExist(storePath, fileType)) { + SegmentIndexFileStore fileStore = new SegmentIndexFileStore(); + fileStore.readAllIIndexOfSegment(storePath); + Map carbonIndexMap = fileStore.getCarbonIndexMap(); + if (carbonIndexMap.size() == 0) { +version = CarbonProperties.getInstance().getFormatVersion(); + } + CarbonIndexFileReader indexReader = new CarbonIndexFileReader(); + for (byte[] fileData : carbonIndexMap.values()) { +indexReader.openThriftReader(fileData); +IndexHeader indexHeader = indexReader.readIndexHeader(); +version = ColumnarFormatVersion.valueOf((short)indexHeader.getVersion()); +break; + } +} else { + version = CarbonProperties.getInstance().getFormatVersion(); --- End diff -- modified according to review comments ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2601 retest this please ---
[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2601 [CARBONDATA-2804][DataMap] fix the bug when bloom filter or preaggregate datamap tried to be created on older V1-V2 version stores [CARBONDATA-2804] fix the bug when bloom filter or preaggregate datamap tried to be created on older V1-V2 version store fix the bug of reading the carbondata format version from the carbondata file header of older V1-V2 version stores; the version field is moved to FileHeader Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. test pass in test environment - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata fix_block_dm_v1_v2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2601.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2601 commit 921c436b5d19421d68dc7085c9155608dfdb81e3 Author: ndwangsen Date: 2018-08-02T08:21:22Z [CARBONDATA-2804] fix the bug when bloom filter or preaggregate datamap tried to be created on older V1-V2 version stores ---
[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2520#discussion_r204973033 --- Diff: docs/data-management-on-carbondata.md --- @@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES ('streaming'='true') ``` + - **Local Dictionary Configuration** + + Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + 1. Getting more compression on dimension columns with less cardinality. + 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. + 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. + + By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. --- End diff -- By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. - Is the data loading performance ok? ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest sdv please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata pull request #2488: [CARBONDATA-2724][DataMap]Unsupported create ...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2488#discussion_r202878487 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -3231,4 +3231,42 @@ private static int unsetLocalDictForComplexColumns(List allColumns } return columnLocalDictGenMap; } + + /** + * This method get the carbon file format version + * + * @param carbonTable + * carbon Table + */ + public static ColumnarFormatVersion getFormatVersion(CarbonTable carbonTable) throws IOException + { +String tablePath = carbonTable.getTablePath(); +CarbonFile[] carbonFiles = FileFactory +.getCarbonFile(tablePath) +.listFiles(new CarbonFileFilter() { + @Override + public boolean accept(CarbonFile file) { +if (file == null) { + return false; +} +return file.getName().endsWith("carbonindex"); --- End diff -- I have modified the comments to get version from the data file ---
[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2452 retest this please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest sdv please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest sdv please ---
[GitHub] carbondata issue #2483: [CARBONDATA-2719][DataMap]Table update/delete is nee...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2483 retest this please ---
[GitHub] carbondata issue #2483: [CARBONDATA-2719][DataMap]Table update/delete is nee...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2483 modified according to comments ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest sdv please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2488 retest this please ---
[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2452 retest this please ---
[GitHub] carbondata pull request #2488: [CARBONDATA-2724][DataMap]Unsupported create ...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2488 [CARBONDATA-2724][DataMap]Unsupported create datamap on table with V1 or V2 format data [CARBONDATA-2724]Unsupported create datamap on table with V1 or V2 format data Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. test pass in environment - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata dm_block_v1_v2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2488.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2488 commit c16331b78837d9e7ddb15497b9cf8acad4517d91 Author: ndwangsen Date: 2018-07-11T09:41:25Z [CARBONDATA-2724]Unsupported create datamap on table with V1 or V2 format data ---
[GitHub] carbondata pull request #2483: [CARBONDATA-2719]Table update/delete is neede...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2483 [CARBONDATA-2719]Table update/delete is needed block on table having datamaps [CARBONDATA-2719]Table update/delete is needed block on table having datamaps Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Add Test case, and test pass in environment - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata tb_dm_update_del Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2483.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2483 commit 96d3bab7d2a88ee59d5b3fd25bda058aa1e13751 Author: ndwangsen Date: 2018-07-11T03:52:09Z [CARBONDATA-2719]Table update/delete is needed block on table having datamaps ---
[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2452#discussion_r200910733 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java --- @@ -172,6 +174,48 @@ public void dropDataMapSchema(String dataMapName) throws IOException { provider.dropSchema(dataMapName); } + /** + * Update the datamap schema to storage by table rename --- End diff -- ok, I am modifying it. ---
[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2452#discussion_r200910555 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2814,6 +2817,22 @@ public static boolean hasAggregationDataMap(CarbonTable carbonTable) { return false; } + /** + * Utility function to check whether table has mv datamap or not + * @param carbonTable + * @return timeseries data map present --- End diff -- ok, I am modifying it ---
[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2452#discussion_r200910090 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2814,6 +2817,22 @@ public static boolean hasAggregationDataMap(CarbonTable carbonTable) { return false; } + /** + * Utility function to check whether table has mv datamap or not + * @param carbonTable + * @return timeseries data map present --- End diff -- ok, I will change it ---
[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2452 retest this please ---
[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2452 retest this please ---
[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2452 [CARBONDATA-2693][BloomDataMap]Fix bug for alter rename is renaming the existing table on which bloomfilter datamap exists Fix bug for alter rename is renaming the existing table on which bloomfilter datamap exists Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. add a test case, test passes in environment - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_rename_dm_table Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2452.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2452 commit d96c8ed235cab72c2ab65384473ccf2dce65444e Author: ndwangsen Date: 2018-07-05T08:51:39Z [CARBONDATA-2693]Fix bug for alter rename is renameing the existing table on which bloomfilter datamp exists ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2397 @chenliang613 This parameter controls how much of the sort temp data is merged in memory ---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2397 retest this case please ---
[GitHub] carbondata pull request #2414: [CARBONDATA-2658][DataLoad]No difference in m...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2414#discussion_r199112802 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/merger/UnsafeIntermediateMerger.java --- @@ -88,13 +88,25 @@ public UnsafeIntermediateMerger(SortParameters parameters) { CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE, CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE_DEFAULT); spillPercentage = Integer.valueOf(spillPercentageStr); + if (spillPercentage > 100 || spillPercentage < 0) { --- End diff -- yes, this PR is based on #2397 ---
[GitHub] carbondata pull request #2414: [CARBONDATA-2658][DataLoad]No difference in m...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2414 [CARBONDATA-2658][DataLoad]No difference in memory spilled to disk for any value of carbon.load.sortMemory.spill.percentage The parameter carbon.load.sortMemory.spill.percentage accepts values in the range 0-100; according to the configuration, in-memory pages are merged and spilled to disk. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Test pass in environment - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_09939 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2414.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2414 commit 736c571ce911374d8cde16f9d6f64b310984f5e9 Author: ndwangsen Date: 2018-06-22T04:02:36Z ADD carbon.load.sortMemory.spill.percentage parameter invalid value check commit 1671f3b114d229c81c4ea7b8023d80334e512df0 Author: ndwangsen Date: 2018-06-25T06:24:56Z Update CarbonLoadOptionConstants.java add a space commit a976783f6787ec1401389a3427820aa1b572a5dc Author: ndwangsen Date: 2018-06-26T12:19:22Z the parameter carbon.load.sortMemory.spill.percentage configured the value range 0-100,according to configuration merge and spill in-memory pages to disk ---
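The invalid-value check this PR adds can be sketched as follows; the helper name and the fall-back-to-default behavior are illustrative assumptions, not the exact CarbonData code:

```java
public class SpillPercentage {
  // Sanitize a configured spill percentage: values outside 0-100 (or
  // non-numeric input) fall back to the supplied default, mirroring the
  // bounds check `spillPercentage > 100 || spillPercentage < 0` in the diff.
  static int sanitize(String configured, int defaultValue) {
    int pct;
    try {
      pct = Integer.parseInt(configured);
    } catch (NumberFormatException e) {
      return defaultValue;  // non-numeric input: keep the default
    }
    if (pct > 100 || pct < 0) {
      return defaultValue;  // out of the documented 0-100 range
    }
    return pct;
  }

  public static void main(String[] args) {
    System.out.println(sanitize("-1", 0));   // invalid, falls back
    System.out.println(sanitize("50", 0));   // valid, kept
  }
}
```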
[GitHub] carbondata pull request #2407: [CARBONDATA-2646][DataLoad]change the log lev...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2407 [CARBONDATA-2646][DataLoad]change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks. Change the log level while loading data into a table with the 'sort_column_bounds' property: the 'ERROR' flag is changed to 'WARN' for some expected tasks. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NO - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Test in environment and check the log displayed - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts2018062011034 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2407.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2407 ---- commit eef16725b04c339dcf6ed948e6f08ba83ad5e025 Author: ndwangsen Date: 2018-06-25T08:50:18Z Change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' ---
[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2397#discussion_r197688967 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithUnsafeMemory.scala --- @@ -64,6 +67,8 @@ class TestLoadDataWithUnsafeMemory extends QueryTest .addProperty(CarbonCommonConstants.UNSAFE_WORKING_MEMORY_IN_MB, "512") CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.OFFHEAP_SORT_CHUNK_SIZE_IN_MB, "512") +CarbonProperties.getInstance() + .addProperty(CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE, "-1") --- End diff -- test the range of 0-100 ---
[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2397#discussion_r197688936 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java --- @@ -144,8 +144,8 @@ * If the sort memory is insufficient, spill inmemory pages to disk. * The total amount of pages is at most the specified percentage of total sort memory. Default * value 0 means that no pages will be spilled and the newly incoming pages will be spilled, - * whereas value 1 means that all pages will be spilled and newly incoming pages will be loaded - * into sort memory. + * whereas value 100 means that all pages will be spilled and newly incoming pages will be loaded + * into sort memory,Other percentage values range 0-100. --- End diff -- ok ---
[GitHub] carbondata pull request #2397: [CARBONDATA-2644][Dataload]ADD carbon.load.so...
GitHub user ndwangsen reopened a pull request: https://github.com/apache/carbondata/pull/2397 [CARBONDATA-2644][Dataload]ADD carbon.load.sortMemory.spill.percentage parameter invalid value check Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? NO - [x] Document update required? NO - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? Testing done UT - How it is tested? Please attach test report. add example for it - Is it a performance related change? Please attach the performance test report. NA - Any additional information to help reviewers in testing this change. NA - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts12160 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2397 commit 736c571ce911374d8cde16f9d6f64b310984f5e9 Author: ndwangsen Date: 2018-06-22T04:02:36Z ADD carbon.load.sortMemory.spill.percentage parameter invalid value check ---
[GitHub] carbondata pull request #2397: [HOTFIX]ADD carbon.load.sortMemory.spill.perc...
Github user ndwangsen closed the pull request at: https://github.com/apache/carbondata/pull/2397 ---
[GitHub] carbondata pull request #2397: [HOTFIX]ADD carbon.load.sortMemory.spill.perc...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2397 [HOTFIX]ADD carbon.load.sortMemory.spill.percentage parameter invalid value check Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NO - [ ] Document update required? NO - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? Testing done UT - How it is tested? Please attach test report. add example for it - Is it a performance related change? Please attach the performance test report. NA - Any additional information to help reviewers in testing this change. NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts12160 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2397 commit 736c571ce911374d8cde16f9d6f64b310984f5e9 Author: ndwangsen Date: 2018-06-22T04:02:36Z ADD carbon.load.sortMemory.spill.percentage parameter invalid value check ---
[GitHub] carbondata issue #2371: [HOTFIX] fix java style errors
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2371 I was just about to modify it and found that you had already modified it ^-^. ---
[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2314#discussion_r190776632 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -575,11 +577,12 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden * @param noOfNodesInput -1 if number of nodes has to be decided * based on block location information * @param blockAssignmentStrategy strategy used to assign blocks + * @param loadMinSize the property load_min_size_inmb specified by the user * @return a map that maps node to blocks */ public static Map> nodeBlockMapping( List blockInfos, int noOfNodesInput, List activeNodes, - BlockAssignmentStrategy blockAssignmentStrategy) { + BlockAssignmentStrategy blockAssignmentStrategy, String loadMinSize ) { --- End diff -- ok, I will modify it according to your review comments, thanks. ---
[GitHub] carbondata issue #2314: [CARBONDATA-2309][DataLoad] Add strategy to generate...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2314 @kumarvishal09 If the user-specified (or default) minimum data load per node is less than the average data amount per node, the existing strategy is used to handle it ---
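The fallback rule described in this comment — keep the existing block assignment strategy unless the configured minimum per-node load exceeds the average data each node would receive anyway — can be sketched as follows; the class and method names are hypothetical, not the actual CarbonLoaderUtil code:

```java
public class MinSizeStrategy {
  // Decide whether the min-size-per-node strategy should take effect.
  // If the configured minimum does not exceed the per-node average,
  // the existing (locality-based) strategy is used instead.
  static boolean useMinSizeStrategy(long loadMinSizeInBytes, long totalSize, int numOfNodes) {
    long avgSizePerNode = numOfNodes > 0 ? totalSize / numOfNodes : 0;
    return loadMinSizeInBytes > avgSizePerNode;
  }

  public static void main(String[] args) {
    // 2 nodes, 1000 bytes total -> 500 bytes average per node
    System.out.println(useMinSizeStrategy(100L, 1000L, 2));  // min below average
    System.out.println(useMinSizeStrategy(600L, 1000L, 2));  // min above average
  }
}
```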
[GitHub] carbondata issue #2314: [CARBONDATA-2309][DataLoad] Add strategy to generate...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2314 @kumarvishal09 Yeah, I have modified it in accordance with xuchuanyin's proposal, adding a strategy. This strategy targets loading a small amount of input data, to avoid generating a large number of small files. ---
[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...
Github user ndwangsen closed the pull request at: https://github.com/apache/carbondata/pull/2314 ---
[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...
GitHub user ndwangsen reopened a pull request: https://github.com/apache/carbondata/pull/2314 [CARBONDATA-2309][DataLoad] Add strategy to generate bigger carbondata files in case of small amo… In some scenarios, the input amount of loading data is small, but carbondata still distributes it to all executors (nodes) to do local-sort, thus resulting in small carbondata files generated by each executor. In some extreme conditions, if the cluster is big enough or if the amount of data is small enough, the carbondata file contains only one blocklet or page. I think a new strategy should be introduced to solve the above problem. The new strategy should: be able to control the minimum amount of input data for each node; ignore data locality, otherwise it may always choose a small portion of particular nodes. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NO - [ ] Any backward compatibility impacted? NO - [ ] Document update required? YES - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? YES - How it is tested? Please attach test report. Tested in local - Is it a performance related change? Please attach the performance test report. After this PR, performance is as we expected - Any additional information to help reviewers in testing this change. NO - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata load_min Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2314.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2314 commit 987921ef4d1c16e01b5c46384b8b1c356e3abe8a Author: ndwangsen Date: 2018-05-17T09:26:00Z Add strategy to generate bigger carbondata files in case of small amount of input data ---
[GitHub] carbondata pull request #2314: [CarbonData-2309][DataLoad] Add strategy to g...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2314 [CarbonData-2309][DataLoad] Add strategy to generate bigger carbondata files in case of small amo… In some scenarios, the input amount of loading data is small, but carbondata still distributes it to all executors (nodes) to do local-sort, thus resulting in small carbondata files generated by each executor. In some extreme conditions, if the cluster is big enough or if the amount of data is small enough, the carbondata file contains only one blocklet or page. I think a new strategy should be introduced to solve the above problem. The new strategy should: be able to control the minimum amount of input data for each node; ignore data locality, otherwise it may always choose a small portion of particular nodes. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NO - [ ] Any backward compatibility impacted? NO - [ ] Document update required? YES - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? YES - How it is tested? Please attach test report. Tested in local - Is it a performance related change? Please attach the performance test report. After this PR, performance is as we expected - Any additional information to help reviewers in testing this change. NO - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata load_min Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2314.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2314 commit 987921ef4d1c16e01b5c46384b8b1c356e3abe8a Author: ndwangsen Date: 2018-05-17T09:26:00Z Add strategy to generate bigger carbondata files in case of small amount of input data ---
[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/1559 nice job, loading performance is improved obviously. ---