[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246317595
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-    sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

I think this modification is OK when using the BLOCK_NUM_FIRST block 
assignment strategy.

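To make the two averages in the diff above concrete, here is a minimal, self-contained Java sketch. It is not CarbonData code: `blocksPerNode` and `dataSizePerNode` are hypothetical stand-ins for the `numberOfBlocksPerNode` and `dataSizePerNode` calculations in `CarbonLoaderUtil`, using plain counts and lengths instead of `Distributable` blocks.

```java
// Illustrative sketch of the per-node averaging shown in the diff; names and
// signatures are assumptions, not the real CarbonLoaderUtil API.
public class NodeSizeSketch {

    // BLOCK_NUM_FIRST: average number of blocks per node, floored to at least 1,
    // mirroring the "<= 0 ? 1 : ..." guard in the diff.
    static long blocksPerNode(int numberOfBlocks, int numOfNodes) {
        long perNode = numberOfBlocks > 0 ? numberOfBlocks / numOfNodes : 0;
        return perNode <= 0 ? 1 : perNode;
    }

    // BLOCK_SIZE_FIRST / NODE_MIN_SIZE_FIRST: average data size per node,
    // i.e. total block length divided by the node count.
    static long dataSizePerNode(long[] blockLengths, int numOfNodes) {
        long totalFileSize = 0;
        for (long length : blockLengths) {
            totalFileSize += length;
        }
        return totalFileSize / numOfNodes;
    }

    public static void main(String[] args) {
        System.out.println(blocksPerNode(10, 3)); // 3 blocks per node
        System.out.println(blocksPerNode(2, 3));  // floored to 1
        System.out.println(dataSizePerNode(new long[]{100, 100, 100}, 3)); // 100
    }
}
```

Note how `blocksPerNode` floors the average to at least one block, which is exactly the role of the ternary guard in the diff.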

---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246299802
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -1164,4 +1156,35 @@ private static void deleteFiles(List 
filesToBeDeleted) throws IOExceptio
   FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath));
 }
   }
+
+  /**
+   * This method will calculate the average expected size for each node
+   *
+   * @param blockInfos blocks
+   * @param uniqueBlocks unique blocks
+   * @param numOfNodes if number of nodes has to be decided
+   *   based on block location information
+   * @param blockAssignmentStrategy strategy used to assign blocks
+   * @return the average expected size for each node
+   */
+  private static long calcAvgLoadSizePerNode(List blockInfos,
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246299700
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +597,10 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // falling back to BLOCK_NUM_FIRST strategy, need to recalculate
+  // the average expected size for each node
+  sizePerNode = calcAvgLoadSizePerNode(blockInfos, uniqueBlocks,
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-08 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/3059

[HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST block 
assignment strategy


fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment 
strategy

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
 Test OK in local env
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata 
fix_load_min_size_bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3059


commit 04d6bff55a5c9120ae8d5c4899a82bc63f1e2e37
Author: ndwangsen 
Date:   2019-01-09T07:10:21Z

[HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST

block assignment strategy.




---


[GitHub] carbondata issue #2864: [CARBONDATA-3041] Optimize load minimum size strateg...

2018-10-28 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2864
  
retest this please



---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-28 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228792020
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,21 +1171,25 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
-val blockAssignStrategy = if (skewedDataOptimization) {
-  CarbonLoaderUtil.BlockAssignmentStrategy.BLOCK_SIZE_FIRST
-} else if (loadMinSizeOptimization) {
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var loadMinSize = carbonLoadModel.getLoadMinSize()
+if (loadMinSize == "0") {
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-28 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228791974
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,32 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = {
+    var size: Integer = 0
+    if (tableProperties.get(propertyName).isDefined) {
+      val loadSizeStr: String =
+        parsePropertyValueStringInMB(tableProperties(propertyName))
+      try {
+        size = Integer.parseInt(loadSizeStr)
+      } catch {
+        case e: NumberFormatException =>
+          throw new MalformedCarbonCommandException(s"Invalid $propertyName value found: " +
+            s"$loadSizeStr, only int value greater " +
+            s"than 0 is supported.")
+      }
+      // if the value is not positive, set it to 0
+      if (size > 0) {
+        tableProperties.put(propertyName, loadSizeStr)
+      } else {
+        tableProperties.put(propertyName, "0")
--- End diff --

OK, I will modify it.

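The validation above can be sketched in Java as follows. This is a hedged illustration, not the actual Scala method: `validateLoadMinSize` here takes the already-parsed string directly, and `IllegalArgumentException` stands in for `MalformedCarbonCommandException`.

```java
// Illustrative stand-in for the Scala validateLoadMinSize logic: parse the
// property as an int, reject non-numeric input, and clamp non-positive values
// to "0" (which disables the strategy). Names are assumptions.
public class LoadMinSizeValidator {

    // Returns the normalized property value, or throws on a malformed one.
    static String validateLoadMinSize(String loadSizeStr) {
        int size;
        try {
            size = Integer.parseInt(loadSizeStr);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid load_min_size_inmb value found: " + loadSizeStr
                + ", only int value greater than 0 is supported.");
        }
        // non-positive values disable the strategy
        return size > 0 ? loadSizeStr : "0";
    }

    public static void main(String[] args) {
        System.out.println(validateLoadMinSize("256")); // 256
        System.out.println(validateLoadMinSize("-5"));  // 0
    }
}
```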

---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-28 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228791841
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,19 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708281
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708277
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable the node minimum input data 
+ size allocation strategy for data loading. It makes sure that each node 
+ loads at least the minimum amount of data, thereby reducing the number of 
+ carbondata files. This property is useful if the size of the input data 
+ files is very small, like 1MB to 256MB. This property can also be 
+ specified in the load option; only int values are supported.
+
+ ```
+   TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256 MB')
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708250
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -123,6 +123,12 @@ private[sql] case class CarbonDescribeFormattedCommand(
 tblProps.get(CarbonCommonConstants.LONG_STRING_COLUMNS), ""))
 }
 
+// load min size info
+if (tblProps.containsKey(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB)) {
+  results ++= Seq(("Single node load min data size",
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708257
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = {
+    var size: Integer = 0
+    if (tableProperties.get(propertyName).isDefined) {
+      val loadSizeStr: String =
+        parsePropertyValueStringInMB(tableProperties(propertyName))
+      try {
+        size = Integer.parseInt(loadSizeStr)
+      } catch {
+        case e: NumberFormatException =>
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708260
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -748,4 +752,18 @@ object AlterTableUtil {
   false
 }
   }
+
+  private def validateLoadMinSizeProperties(carbonTable: CarbonTable,
+      propertiesMap: mutable.Map[String, String]): Unit = {
+    // validate load min size property
+    if (propertiesMap.get(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB).isDefined) {
+  // Cache level is not allowed for child tables and dataMaps
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708269
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable the node minimum input data 
+ size allocation strategy 
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708254
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], propertyName: String): Unit = {
+    var size: Integer = 0
+    if (tableProperties.get(propertyName).isDefined) {
+      val loadSizeStr: String =
+        parsePropertyValueStringInMB(tableProperties(propertyName))
+      try {
+        size = Integer.parseInt(loadSizeStr)
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708258
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")
+var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708265
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java
 ---
@@ -186,8 +186,7 @@
 optionsFinal.put("sort_scope", "local_sort");
 optionsFinal.put("sort_column_bounds", Maps.getOrDefault(options, "sort_column_bounds", ""));
 optionsFinal.put(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-    Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-        CarbonCommonConstants.CARBON_LOAD_MIN_NODE_SIZE_INMB_DEFAULT));
+    Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, ""));
--- End diff --

ok


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-26 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2864

[CARBONDATA-3041] Optimize load minimum size strategy for data loading

this PR modifies the following points:

1. Deleted the system property carbon.load.min.size.enabled, changed the 
load_min_size_inmb property to a table property, and made it possible to 
specify this property in the load option as well.

2. Supported alter table xxx set TBLPROPERTIES('load_min_size_inmb'='256').

3. If a table is created with the load_min_size_inmb property, it is 
displayed via the desc formatted command.

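Assuming the precedence the description implies (a LOAD option overrides the table property, and an absent or empty value falls back to "0", which disables the strategy), the resolution could be sketched as below. `resolveLoadMinSize` is a hypothetical helper for illustration, not CarbonData's actual resolution code.

```java
// Illustrative precedence sketch: LOAD option > table property > "0" (off).
// The method name and map-based signature are assumptions.
import java.util.Map;

public class LoadMinSizeResolution {

    static String resolveLoadMinSize(Map<String, String> loadOptions,
                                     Map<String, String> tableProperties) {
        String fromOption = loadOptions.getOrDefault("load_min_size_inmb", "");
        if (!fromOption.isEmpty()) {
            return fromOption;  // highest precedence: the LOAD option
        }
        // fall back to the table property; "0" disables the strategy
        return tableProperties.getOrDefault("load_min_size_inmb", "0");
    }

    public static void main(String[] args) {
        System.out.println(resolveLoadMinSize(
            Map.of("load_min_size_inmb", "512"),
            Map.of("load_min_size_inmb", "256"))); // 512
        System.out.println(resolveLoadMinSize(
            Map.of(),
            Map.of("load_min_size_inmb", "256"))); // 256
        System.out.println(resolveLoadMinSize(Map.of(), Map.of())); // 0
    }
}
```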
Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
 YES
 - [ ] Testing done
Test ok in our test env
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata fix_load_min

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2864


commit bbbe70d04cef85b2c7ab50d3f697e0d1e35efc95
Author: ndwangsen 
Date:   2018-10-27T02:38:48Z

[CARBONDATA-3041]Optimize load minimum size strategy for data loading




---


[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...

2018-10-23 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2843#discussion_r227626868
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -23,86 +23,26 @@
 import org.apache.carbondata.core.util.CarbonProperty;
 
 public final class CarbonCommonConstants {
-  /**
-   * surrogate value of null
-   */
-  public static final int DICT_VALUE_NULL = 1;
-  /**
-   * surrogate value of null for direct dictionary
-   */
-  public static final int DIRECT_DICT_VALUE_NULL = 1;
-  /**
-   * integer size in bytes
-   */
-  public static final int INT_SIZE_IN_BYTE = 4;
-  /**
-   * short size in bytes
-   */
-  public static final int SHORT_SIZE_IN_BYTE = 2;
-  /**
-   * DOUBLE size in bytes
-   */
-  public static final int DOUBLE_SIZE_IN_BYTE = 8;
-  /**
-   * LONG size in bytes
-   */
-  public static final int LONG_SIZE_IN_BYTE = 8;
-  /**
-   * byte to KB conversion factor
-   */
-  public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024;
-  /**
-   * BYTE_ENCODING
-   */
-  public static final String BYTE_ENCODING = "ISO-8859-1";
-  /**
-   * measure meta data file name
-   */
-  public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_";
-
-  /**
-   * set the segment ids to query from the table
-   */
-  public static final String CARBON_INPUT_SEGMENTS = "carbon.input.segments.";
-
-  /**
-   * key prefix for set command. 'carbon.datamap.visible.dbName.tableName.dmName = false' means
-   * that the query on 'dbName.table' will not use the datamap 'dmName'
-   */
-  @InterfaceStability.Unstable
-  public static final String CARBON_DATAMAP_VISIBLE = "carbon.datamap.visible.";
-
-  /**
-   * Fetch and validate the segments.
-   * Used for aggregate table load as segment validation is not required.
-   */
-  public static final String VALIDATE_CARBON_INPUT_SEGMENTS = "validate.carbon.input.segments.";
 
+  private CarbonCommonConstants() {
+  }
   /**
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...

2018-10-22 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2843

[CARBONDATA-3034] Carding parameters,Organized by parameter category.

This PR mainly reorganizes the parameters by category. 

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
 Test in local env.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata parameter_comb

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2843.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2843


commit 21dc71ba986ab1c2cbdd2cfaa5418a2d629bc34a
Author: ndwangsen 
Date:   2018-10-23T03:35:17Z

[CARBONDATA-3034] Carding parameters,Organized by parameter category.




---


[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...

2018-08-29 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2627
  
retest sdv please


---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-16 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2627#discussion_r210787370
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -237,6 +237,21 @@ object CarbonEnv {
 getCarbonTable(tableIdentifier.database, tableIdentifier.table)(sparkSession)
   }
 
+  /**
+   * This method returns corresponding CarbonTable, it will return None if 
it's not a CarbonTable
+   */
+  def getCarbonTableOption(
--- End diff --

getCarbonTable will throw an exception when getting a non-CarbonData table


---


[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...

2018-08-16 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2627
  
retest sdv please


---


[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...

2018-08-15 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2627
  
retest sdv please


---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-15 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2627#discussion_r210241324
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala 
---
@@ -80,6 +81,16 @@ object MVHelper {
 dmProperties.foreach(t => tableProperties.put(t._1, t._2))
 
 val selectTables = getTables(logicalPlan)
+selectTables.map { selectTable =>
+  val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database,
+    selectTable.identifier.table)(sparkSession)
+
+  if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink) {
+    throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
+      s"MV datamap")
+  }
+  selectTable
--- End diff --

OK, I will remove it.

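The check discussed above can be sketched as a simple guard over the MV's source tables. This is an illustrative Java stand-in, not the MVHelper code: `Table` replaces `CarbonTable`, and `IllegalArgumentException` replaces `MalformedCarbonCommandException`.

```java
// Illustrative guard: reject MV datamap creation when any source table is a
// streaming sink, mirroring the check added in MVHelper. Types are stand-ins.
import java.util.List;

public class MvStreamingGuard {

    // Simplified stand-in for CarbonTable: only the streaming-sink flag matters.
    static class Table {
        final boolean streamingSink;
        Table(boolean streamingSink) { this.streamingSink = streamingSink; }
    }

    // Throws when any source table of the MV is a streaming sink.
    static void checkMvOnStreamingTable(List<Table> selectTables) {
        for (Table table : selectTables) {
            if (table.streamingSink) {
                throw new IllegalArgumentException(
                    "Streaming table does not support creating MV datamap");
            }
        }
    }

    public static void main(String[] args) {
        checkMvOnStreamingTable(List.of(new Table(false))); // passes silently
        try {
            checkMvOnStreamingTable(List.of(new Table(false), new Table(true)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```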

---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-15 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2627#discussion_r210241115
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala
 ---
@@ -73,13 +73,8 @@ case class CarbonCreateDataMapCommand(
   }
 }
 
-if (mainTable != null &&
-    mainTable.isStreamingSink &&
-    !(dmProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString)
-      || dmProviderName.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) {
-  throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
-    s"$dmProviderName datamap")
-}
+// delete this code because streaming table only does not support creating MV datamap,
--- End diff --

I will delete the comments here.


---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-15 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2627#discussion_r210239499
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala 
---
@@ -80,6 +81,16 @@ object MVHelper {
 dmProperties.foreach(t => tableProperties.put(t._1, t._2))
 
 val selectTables = getTables(logicalPlan)
+selectTables.map { selectTable =>
--- End diff --

ok


---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-15 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2627#discussion_r210239381
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala 
---
@@ -80,6 +81,16 @@ object MVHelper {
 dmProperties.foreach(t => tableProperties.put(t._1, t._2))
 
 val selectTables = getTables(logicalPlan)
+selectTables.map { selectTable =>
+  val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database,
+    selectTable.identifier.table)(sparkSession)
+
+  if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink) {
+    throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...

2018-08-09 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2627
  
retest this please



---


[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...

2018-08-09 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2627

[CARBONDATA-2835] [MVDataMap] Block MV datamap on streaming table

This PR blocks creating an MV datamap on a streaming table and also blocks 
setting the streaming property on a table which has an MV datamap.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
  Add test case and test pass  
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata block_stream_mv

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2627


commit e7494f6390226e475a0ab9d6d894eafe2c45bed9
Author: ndwangsen 
Date:   2018-08-10T01:32:59Z

[CARBONDATA-2835] Block MV datamap on streaming table




---


[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2601
  
retest this please


---


[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2601
  
retest this please


---


[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2601
  
retest this please


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207430989
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable)
   }
   storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-    .getCarbonFile(storePath)
-    .listFiles(new CarbonFileFilter() {
-      @Override
-      public boolean accept(CarbonFile file) {
-        if (file == null) {
-          return false;
-        }
-        return file.getName().endsWith("carbondata");
-      }
-    });
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
+  SegmentIndexFileStore fileStore = new SegmentIndexFileStore();
+  fileStore.readAllIIndexOfSegment(storePath);
+  Map carbonIndexMap = fileStore.getCarbonIndexMap();
+  if (carbonIndexMap.size() == 0) {
+    version = CarbonProperties.getInstance().getFormatVersion();
+  }
+  CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
+  for (byte[] fileData : carbonIndexMap.values()) {
+    indexReader.openThriftReader(fileData);
+    IndexHeader indexHeader = indexReader.readIndexHeader();
+    version = ColumnarFormatVersion.valueOf((short) indexHeader.getVersion());
+    break;
--- End diff --

Modified according to the review comments.

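The fallback pattern in the diff above (take the version from the first readable index header, otherwise use the configured default) can be sketched like this. It is a hedged illustration: the byte-array map mimics `SegmentIndexFileStore.getCarbonIndexMap()`, and the first byte of each value plays the role of `IndexHeader.getVersion()`.

```java
// Illustrative fallback sketch for reading the columnar format version from
// index files; the map and byte handling are simplified stand-ins.
import java.util.LinkedHashMap;
import java.util.Map;

public class FormatVersionSketch {

    // Stand-in for CarbonProperties.getInstance().getFormatVersion().
    static final short DEFAULT_VERSION = 3;

    // Returns the version carried by the first index entry, or the default
    // when the segment has no index entries (mirroring the diff's structure).
    static short getFormatVersion(Map<String, byte[]> carbonIndexMap) {
        if (carbonIndexMap.isEmpty()) {
            return DEFAULT_VERSION;
        }
        for (byte[] fileData : carbonIndexMap.values()) {
            return fileData[0];  // first index file decides, as in the diff
        }
        return DEFAULT_VERSION;
    }

    public static void main(String[] args) {
        Map<String, byte[]> indexMap = new LinkedHashMap<>();
        System.out.println(getFormatVersion(indexMap)); // 3 (default)
        indexMap.put("part0.carbonindex", new byte[]{2});
        System.out.println(getFormatVersion(indexMap)); // 2
    }
}
```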

---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207430964
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable)
   }
   storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-    .getCarbonFile(storePath)
-    .listFiles(new CarbonFileFilter() {
-      @Override
-      public boolean accept(CarbonFile file) {
-        if (file == null) {
-          return false;
-        }
-        return file.getName().endsWith("carbondata");
-      }
-    });
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
--- End diff --

Modified according to the review comments.


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207430996
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion getFormatVersion(CarbonTable carbonTable)
   }
   storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-.getCarbonFile(storePath)
-.listFiles(new CarbonFileFilter() {
-  @Override
-  public boolean accept(CarbonFile file) {
-if (file == null) {
-  return false;
-}
-return file.getName().endsWith("carbondata");
-  }
-});
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
+  SegmentIndexFileStore fileStore = new SegmentIndexFileStore();
+  fileStore.readAllIIndexOfSegment(storePath);
+  Map<String, byte[]> carbonIndexMap = fileStore.getCarbonIndexMap();
+  if (carbonIndexMap.size() == 0) {
+version = CarbonProperties.getInstance().getFormatVersion();
+  }
+  CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
+  for (byte[] fileData : carbonIndexMap.values()) {
+indexReader.openThriftReader(fileData);
+IndexHeader indexHeader = indexReader.readIndexHeader();
+version = ColumnarFormatVersion.valueOf((short)indexHeader.getVersion());
+break;
+  }
+} else {
+  version = CarbonProperties.getInstance().getFormatVersion();
--- End diff --

 modified according to review comments


---


[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...

2018-08-02 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2601
  
retest this please


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2601

[CARBONDATA-2804][DataMap] fix the bug when bloom filter or preaggregate 
datamap tried to be created on older V1-V2 version stores

[CARBONDATA-2804] fix the bug when bloom filter or preaggregate datamap 
tried to be created on older V1-V2 version store

Fix the bug in reading the carbondata format version from the carbondata
file header of older V1-V2 version stores; the version field has been
moved to FileHeader.
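For illustration, the fallback order the fix implements can be sketched as follows (class and method names here are made up for the sketch, not CarbonData's actual API): prefer the version recorded in the first readable carbonindex file header, otherwise fall back to the configured default.

```java
import java.util.List;

public class FormatVersionSketch {
    // Return the version from the first available index file header,
    // mirroring the early "break" in the fix; when the segment has no
    // index files, fall back to the configured default version.
    static short resolveVersion(List<Short> indexHeaderVersions, short configuredDefault) {
        for (short v : indexHeaderVersions) {
            return v; // first readable header wins
        }
        return configuredDefault;
    }
}
```

The point of the sketch is only the ordering: header first, configured default second.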

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   test pass in test environment
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata 
fix_block_dm_v1_v2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2601


commit 921c436b5d19421d68dc7085c9155608dfdb81e3
Author: ndwangsen 
Date:   2018-08-02T08:21:22Z

[CARBONDATA-2804] fix the bug when bloom filter or preaggregate datamap

tried to be created on older V1-V2 version stores




---


[GitHub] carbondata pull request #2520: [CARBONDATA-2750] Added Documentation for Loc...

2018-07-24 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2520#discussion_r204973033
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -122,6 +122,45 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
  TBLPROPERTIES ('streaming'='true')
  ```
 
+  - **Local Dictionary Configuration**
+  
+  Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+  1. Getting more compression on dimension columns with less cardinality.
+  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
+  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
+
+   By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
--- End diff --

Regarding "By default, Local Dictionary will be enabled and generated for
all no-dictionary string/varchar datatype columns": is the data loading
performance OK?
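As background for the performance question: a local dictionary replaces repeated string values with small integer ids, so the load path pays one dictionary lookup per row in exchange for smaller stored pages. A rough sketch of the idea (illustrative only, not CarbonData's actual implementation):

```java
import java.util.Map;

public class LocalDictionarySketch {
    // Encode one column page: each distinct string gets the next free id,
    // and the page stores compact ids instead of repeating string bytes.
    static int[] encodePage(String[] values, Map<String, Integer> dictionary) {
        int[] encoded = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // this lookup/insert is the extra per-row cost paid at load time
            encoded[i] = dictionary.computeIfAbsent(values[i], k -> dictionary.size());
        }
        return encoded;
    }
}
```

With low-cardinality data the dictionary stays small and the lookup is cheap, which is why the loading overhead is usually modest in that case.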


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-17 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-16 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-16 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-16 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest sdv please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-16 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please



---


[GitHub] carbondata pull request #2488: [CARBONDATA-2724][DataMap]Unsupported create ...

2018-07-16 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2488#discussion_r202878487
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3231,4 +3231,42 @@ private static int 
unsetLocalDictForComplexColumns(List allColumns
 }
 return columnLocalDictGenMap;
   }
+
+  /**
+   * This method get the carbon file format version
+   *
+   * @param carbonTable
+   * carbon Table
+   */
+  public static ColumnarFormatVersion getFormatVersion(CarbonTable 
carbonTable) throws IOException
+  {
+String tablePath = carbonTable.getTablePath();
+CarbonFile[] carbonFiles = FileFactory
+.getCarbonFile(tablePath)
+.listFiles(new CarbonFileFilter() {
+  @Override
+  public boolean accept(CarbonFile file) {
+if (file == null) {
+  return false;
+}
+return file.getName().endsWith("carbonindex");
--- End diff --

I have modified it according to the comments to get the version from the data file.


---


[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...

2018-07-13 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2452
  
retest this please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-13 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest sdv please



---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest sdv please


---


[GitHub] carbondata issue #2483: [CARBONDATA-2719][DataMap]Table update/delete is nee...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2483
  
retest this please


---


[GitHub] carbondata issue #2483: [CARBONDATA-2719][DataMap]Table update/delete is nee...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2483
  
modified according to comments


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest sdv please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please


---


[GitHub] carbondata issue #2488: [CARBONDATA-2724][DataMap]Unsupported create datamap...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2488
  
retest this please


---


[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...

2018-07-12 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2452
  
retest this please


---


[GitHub] carbondata pull request #2488: [CARBONDATA-2724][DataMap]Unsupported create ...

2018-07-11 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2488

[CARBONDATA-2724][DataMap]Unsupported create datamap on table with V1 or V2 
format data

[CARBONDATA-2724]Unsupported create datamap on table with V1 or V2 format 
data

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   test pass in environment
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata dm_block_v1_v2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2488


commit c16331b78837d9e7ddb15497b9cf8acad4517d91
Author: ndwangsen 
Date:   2018-07-11T09:41:25Z

[CARBONDATA-2724]Unsupported create datamap on table with V1 or V2

format data




---


[GitHub] carbondata pull request #2483: [CARBONDATA-2719]Table update/delete is neede...

2018-07-10 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2483

[CARBONDATA-2719]Table update/delete needs to be blocked on tables having
datamaps

[CARBONDATA-2719]Table update/delete needs to be blocked on tables having
datamaps


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
 Add Test case, and test pass in environment 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata 
tb_dm_update_del

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2483


commit 96d3bab7d2a88ee59d5b3fd25bda058aa1e13751
Author: ndwangsen 
Date:   2018-07-11T03:52:09Z

[CARBONDATA-2719]Table update/delete is needed block on table having

datamaps




---


[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...

2018-07-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2452#discussion_r200910733
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -172,6 +174,48 @@ public void dropDataMapSchema(String dataMapName) 
throws IOException {
 provider.dropSchema(dataMapName);
   }
 
+  /**
+   * Update the datamap schema to storage by table rename
--- End diff --

OK, I am modifying it.


---


[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...

2018-07-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2452#discussion_r200910555
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2814,6 +2817,22 @@ public static boolean 
hasAggregationDataMap(CarbonTable carbonTable) {
 return false;
   }
 
+  /**
+   * Utility function to check whether table has mv datamap or not
+   * @param carbonTable
+   * @return timeseries data map present
--- End diff --

OK, I am modifying it.


---


[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...

2018-07-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2452#discussion_r200910090
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2814,6 +2817,22 @@ public static boolean 
hasAggregationDataMap(CarbonTable carbonTable) {
 return false;
   }
 
+  /**
+   * Utility function to check whether table has mv datamap or not
+   * @param carbonTable
+   * @return timeseries data map present
--- End diff --

OK, I will change it.


---


[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...

2018-07-08 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2452
  
retest this please



---


[GitHub] carbondata issue #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for alter ren...

2018-07-06 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2452
  
retest this please



---


[GitHub] carbondata pull request #2452: [CARBONDATA-2693][BloomDataMap]Fix bug for al...

2018-07-05 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2452

[CARBONDATA-2693][BloomDataMap]Fix bug for alter rename renaming the
existing table on which a bloomfilter datamap exists

Fix bug for alter rename renaming the existing table on which a
bloomfilter datamap exists

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
  add a test case ,test pass in environment
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata 
bugfix_rename_dm_table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2452


commit d96c8ed235cab72c2ab65384473ccf2dce65444e
Author: ndwangsen 
Date:   2018-07-05T08:51:39Z

[CARBONDATA-2693]Fix bug for alter rename is renameing the existing

table on which bloomfilter datamp exists




---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-30 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
@chenliang613 This parameter controls how much sort temp file data is
merged in memory.


---


[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...

2018-06-29 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2397
  
retest this case please


---


[GitHub] carbondata pull request #2414: [CARBONDATA-2658][DataLoad]No difference in m...

2018-06-29 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2414#discussion_r199112802
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/merger/UnsafeIntermediateMerger.java
 ---
@@ -88,13 +88,25 @@ public UnsafeIntermediateMerger(SortParameters 
parameters) {
   
CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE,
   
CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE_DEFAULT);
   spillPercentage = Integer.valueOf(spillPercentageStr);
+  if (spillPercentage > 100 || spillPercentage < 0) {
--- End diff --

Yes, this PR is based on #2397.


---


[GitHub] carbondata pull request #2414: [CARBONDATA-2658][DataLoad]No difference in m...

2018-06-26 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2414

[CARBONDATA-2658][DataLoad]No difference in memory spilled to disk for any 
value of carbon.load.sortMemory.spill.percentage

The parameter carbon.load.sortMemory.spill.percentage accepts a value in the
range 0-100; in-memory pages are merged and spilled to disk according to
this configuration.
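The added check can be sketched like this (a simplified standalone version; the assumption that out-of-range or unparsable values fall back to the default is mine, and the real handling lives in UnsafeIntermediateMerger):

```java
public class SpillPercentageSketch {
    // Validate the configured spill percentage: values outside 0-100 and
    // unparsable strings fall back to the supplied default.
    static int validSpillPercentage(String configured, int defaultValue) {
        try {
            int p = Integer.parseInt(configured);
            return (p < 0 || p > 100) ? defaultValue : p;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```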

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   Test pass in environment
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_09939

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2414


commit 736c571ce911374d8cde16f9d6f64b310984f5e9
Author: ndwangsen 
Date:   2018-06-22T04:02:36Z

ADD carbon.load.sortMemory.spill.percentage parameter  invalid value
check

commit 1671f3b114d229c81c4ea7b8023d80334e512df0
Author: ndwangsen 
Date:   2018-06-25T06:24:56Z

Update CarbonLoadOptionConstants.java

add a space

commit a976783f6787ec1401389a3427820aa1b572a5dc
Author: ndwangsen 
Date:   2018-06-26T12:19:22Z

the parameter carbon.load.sortMemory.spill.percentage configured the
value range 0-100,according to configuration merge and spill in-memory
pages to disk




---


[GitHub] carbondata pull request #2407: [CARBONDATA-2646][DataLoad]change the log lev...

2018-06-25 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2407

[CARBONDATA-2646][DataLoad]Change the log level while loading data into a
table with the 'sort_column_bounds' property: the 'ERROR' flag is changed to
'WARN' for some expected tasks.

Change the log level while loading data into a table with the
'sort_column_bounds' property: the 'ERROR' flag is changed to 'WARN' for
some expected tasks.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NO
 - [ ] Document update required?
NA
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
  Test in environment and check the log displayed
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata 
bugfix_dts2018062011034

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2407.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2407

----
commit eef16725b04c339dcf6ed948e6f08ba83ad5e025
Author: ndwangsen 
Date:   2018-06-25T08:50:18Z

Change the log level while loading data into a table with
'sort_column_bounds' property,'ERROR' flag change to 'WARN'




---


[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...

2018-06-24 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2397#discussion_r197688967
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithUnsafeMemory.scala
 ---
@@ -64,6 +67,8 @@ class TestLoadDataWithUnsafeMemory extends QueryTest
   .addProperty(CarbonCommonConstants.UNSAFE_WORKING_MEMORY_IN_MB, 
"512")
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.OFFHEAP_SORT_CHUNK_SIZE_IN_MB, 
"512")
+CarbonProperties.getInstance()
+  
.addProperty(CarbonLoadOptionConstants.CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE,
 "-1")
--- End diff --

This tests a value outside the valid 0-100 range.


---


[GitHub] carbondata pull request #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.so...

2018-06-24 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2397#discussion_r197688936
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
 ---
@@ -144,8 +144,8 @@
* If the sort memory is insufficient, spill inmemory pages to disk.
* The total amount of pages is at most the specified percentage of 
total sort memory. Default
* value 0 means that no pages will be spilled and the newly incoming 
pages will be spilled,
-   * whereas value 1 means that all pages will be spilled and newly 
incoming pages will be loaded
-   * into sort memory.
+   * whereas value 100 means that all pages will be spilled and newly 
incoming pages will be loaded
+   * into sort memory,Other percentage values range 0-100.
--- End diff --

ok


---


[GitHub] carbondata pull request #2397: [CARBONDATA-2644][Dataload]ADD carbon.load.so...

2018-06-24 Thread ndwangsen
GitHub user ndwangsen reopened a pull request:

https://github.com/apache/carbondata/pull/2397

[CARBONDATA-2644][Dataload]ADD carbon.load.sortMemory.spill.percentage 
parameter  invalid value check

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 NA
 - [x] Any backward compatibility impacted?
 NO
 - [x] Document update required?
NO
 - [x] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
Testing done UT 
- How it is tested? Please attach test report.
add example for it
- Is it a performance related change? Please attach the performance 
test report.
  NA
- Any additional information to help reviewers in testing this 
change.
NA
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   NO


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts12160

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2397


commit 736c571ce911374d8cde16f9d6f64b310984f5e9
Author: ndwangsen 
Date:   2018-06-22T04:02:36Z

ADD carbon.load.sortMemory.spill.percentage parameter  invalid value
check




---


[GitHub] carbondata pull request #2397: [HOTFIX]ADD carbon.load.sortMemory.spill.perc...

2018-06-24 Thread ndwangsen
Github user ndwangsen closed the pull request at:

https://github.com/apache/carbondata/pull/2397


---


[GitHub] carbondata pull request #2397: [HOTFIX]ADD carbon.load.sortMemory.spill.perc...

2018-06-21 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2397

[HOTFIX]ADD carbon.load.sortMemory.spill.percentage parameter  invalid 
value check

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 NA
 - [ ] Any backward compatibility impacted?
 NO
 - [ ] Document update required?
NO
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
Testing done UT 
- How it is tested? Please attach test report.
add example for it
- Is it a performance related change? Please attach the performance 
test report.
  NA
- Any additional information to help reviewers in testing this 
change.
NA
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   NO


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts12160

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2397


commit 736c571ce911374d8cde16f9d6f64b310984f5e9
Author: ndwangsen 
Date:   2018-06-22T04:02:36Z

ADD carbon.load.sortMemory.spill.percentage parameter  invalid value
check




---


[GitHub] carbondata issue #2371: [HOTFIX] fix java style errors

2018-06-13 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2371
  
I was just about to modify it and found that you had already done so ^-^.


---


[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...

2018-05-24 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2314#discussion_r190776632
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,11 +577,12 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
* @param noOfNodesInput -1 if number of nodes has to be decided
*   based on block location information
* @param blockAssignmentStrategy strategy used to assign blocks
+   * @param loadMinSize the property load_min_size_inmb specified by the 
user
* @return a map that maps node to blocks
*/
   public static Map<String, List<Distributable>> nodeBlockMapping(
   List<Distributable> blockInfos, int noOfNodesInput, List<String> activeNodes,
-  BlockAssignmentStrategy blockAssignmentStrategy) {
+  BlockAssignmentStrategy blockAssignmentStrategy, String loadMinSize ) {
--- End diff --

OK, I will modify it according to your review comments, thanks.


---


[GitHub] carbondata issue #2314: [CARBONDATA-2309][DataLoad] Add strategy to generate...

2018-05-23 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2314
  
@kumarvishal09 If the user-specified or default minimum data load size per
node is less than the average data size per node, the existing strategy is
used.
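That selection rule can be sketched as follows (names are illustrative and sizes are in bytes; this is not the actual CarbonLoaderUtil code):

```java
public class MinLoadSizeSketch {
    // Decide whether the min-size strategy applies: only when the
    // user-specified (or default) minimum per-node load size exceeds the
    // average data size per node; otherwise the existing strategy is kept.
    static boolean useMinSizeStrategy(long totalBytes, int numNodes, long loadMinSizeBytes) {
        long averagePerNode = totalBytes / numNodes;
        return loadMinSizeBytes > averagePerNode;
    }
}
```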


---


[GitHub] carbondata issue #2314: [CARBONDATA-2309][DataLoad] Add strategy to generate...

2018-05-22 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/2314
  
@kumarvishal09 Yeah, I Has been modified in accordance with xuchuanyin's 
proposal, adding a strategy,this strategy targets to loading small amount of 
input data, Avoid generating a large number of small files. 


---


[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...

2018-05-17 Thread ndwangsen
Github user ndwangsen closed the pull request at:

https://github.com/apache/carbondata/pull/2314


---


[GitHub] carbondata pull request #2314: [CARBONDATA-2309][DataLoad] Add strategy to g...

2018-05-17 Thread ndwangsen
GitHub user ndwangsen reopened a pull request:

https://github.com/apache/carbondata/pull/2314

[CARBONDATA-2309][DataLoad] Add strategy to generate bigger carbondata 
files in case of small amo…

In some scenarios, the input amount of loading data is small, but carbondata
still distributes it to each executor (node) to do a local sort, thus
resulting in small carbondata files generated by each executor.

In some extreme conditions, if the cluster is big enough or the amount of
data is small enough, each carbondata file contains only one blocklet or
page.

I think a new strategy should be introduced to solve the above problem.

The new strategy should:

- be able to control the minimum amount of input data for each node
- ignore data locality, otherwise it may always choose a small portion of
particular nodes
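Under these requirements, the number of nodes actually used could be derived roughly as follows (a sketch under the assumption that each chosen node should receive at least the configured minimum size; method and parameter names are made up):

```java
public class NodeCountSketch {
    // Use only as many nodes as needed so every chosen node receives at
    // least loadMinSize bytes, capped by the cluster size. Data locality
    // is deliberately ignored in this mode.
    static int nodesToUse(long totalInputBytes, int clusterNodes, long loadMinSizeBytes) {
        long needed = totalInputBytes / loadMinSizeBytes;
        if (totalInputBytes % loadMinSizeBytes != 0) {
            needed++; // ceiling division
        }
        return (int) Math.max(1, Math.min(clusterNodes, needed));
    }
}
```

With small input this collapses to one node, which is exactly what produces bigger carbondata files instead of many tiny ones.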

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
   NO
 - [ ] Any backward compatibility impacted?
   NO
 - [ ] Document update required?
   YES
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
   YES
- How it is tested? Please attach test report.
  Tested in local
- Is it a performance related change? Please attach the performance 
test report.
  After this PR, performance is as we expected
- Any additional information to help reviewers in testing this 
change.
  NO
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
  NO


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ndwangsen/incubator-carbondata load_min

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2314


commit 987921ef4d1c16e01b5c46384b8b1c356e3abe8a
Author: ndwangsen 
Date:   2018-05-17T09:26:00Z

Add strategy to generate bigger carbondata files in case of small amount of 
input data




---


[GitHub] carbondata pull request #2314: [CarbonData-2309][DataLoad] Add strategy to g...

2018-05-17 Thread ndwangsen
GitHub user ndwangsen opened a pull request:

https://github.com/apache/carbondata/pull/2314

[CarbonData-2309][DataLoad] Add strategy to generate bigger carbondata 
files in case of small amo…

(Same description and commit as the reopened pull request above.)
---


[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...

2017-11-23 Thread ndwangsen
Github user ndwangsen commented on the issue:

https://github.com/apache/carbondata/pull/1559
  
Nice job, the loading performance is clearly improved.


---