[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1321 ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r139301875 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -433,10 +442,23 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) { +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + } else { +tableProperties.getOrDefault("sort_scope", + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) + } + --- End diff -- ok, already update. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r139300959 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -433,10 +442,23 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) { +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + } else { +tableProperties.getOrDefault("sort_scope", + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) + } + --- End diff -- Don't hardcode the sort scope here, you should get from carbon properties. Please use the code like below. ``` val sortScopeDefault = CarbonProperties.getInstance(). getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, CarbonProperties.getInstance().getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)) val sortScope = if (null == tableProperties) { sortScopeDefault } else { tableProperties.getOrDefault("sort_scope", sortScopeDefault) } ``` ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138853538 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -432,7 +440,9 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) null else tableProperties.get("sort_scope") --- End diff -- I use CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT as its default value. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138852185 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -344,6 +345,13 @@ case class CreateTable(cm: TableModel, createDSTable: Boolean = true) extends Ru val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { --- End diff -- ok, I will provide default value instead of null, then I will remove this ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138852303 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends RunnableCommand { val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { + throw new InvalidConfigurationException("The sort scope " + sortScope --- End diff -- For this, I just keep same with error message which is already exists. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138832604 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -874,6 +884,8 @@ private[sql] case class DescribeCommandFormatted( results ++= Seq(("CARBON Store Path: ", relation.tableMeta.storePath, "")) val carbonTable = relation.tableMeta.carbonTable results ++= Seq(("Table Block Size : ", carbonTable.getBlockSizeInMB + " MB", "")) +results ++= Seq(("SORT_SCOPE", carbonTable.getTableInfo.getFactTable + .getTableProperties.getOrDefault("sort_scope", ""), "")) --- End diff -- Should give a default sort_scope instead of "" ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138832339 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -432,7 +440,9 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) null else tableProperties.get("sort_scope") --- End diff -- If tableProperties is null, it need to use default sort scope, right? ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138831288 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends RunnableCommand { val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { + throw new InvalidConfigurationException("The sort scope " + sortScope --- End diff -- suggest to change to ``` s"Passing invalid sort scope '$sortScope', valid sort scopes are 'NO_SORT', 'BATCH_SORT', 'LOCAL_SORT' or 'GLOBAL_SORT' " ``` ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138830621 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends RunnableCommand { val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { --- End diff -- `null != sortScope` can be removed, it is checked inside `CarbonUtil.isValidSortOption` ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138830655 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -344,6 +345,13 @@ case class CreateTable(cm: TableModel, createDSTable: Boolean = true) extends Ru val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { --- End diff -- null != sortScope can be removed, it is checked inside CarbonUtil.isValidSortOption ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137442307 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala --- @@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo | charField CHAR(5), | floatField FLOAT | ) - | STORED BY 'org.apache.carbondata.format' + | STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT') """.stripMargin) sql( s""" | LOAD DATA LOCAL INPATH '$path' INTO TABLE carbon_globalsort_difftypes - | OPTIONS('SORT_SCOPE'='GLOBAL_SORT', + | OPTIONS( | 'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField') """.stripMargin) --- End diff -- ok ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137441759 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -639,6 +639,23 @@ case class LoadTable( val carbonProperty: CarbonProperties = CarbonProperties.getInstance() carbonProperty.addProperty("zookeeper.enable.lock", "false") val optionsFinal = getFinalOptions(carbonProperty) +val tableProperties = relation.tableMeta.carbonTable.getTableInfo + .getFactTable.getTableProperties + +optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + +optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb", --- End diff -- Yes, this is only needed for batch sort, but I think if users specify this parameter in global sort, it is better to ignore it. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137441815 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -639,6 +639,23 @@ case class LoadTable( val carbonProperty: CarbonProperties = CarbonProperties.getInstance() carbonProperty.addProperty("zookeeper.enable.lock", "false") val optionsFinal = getFinalOptions(carbonProperty) +val tableProperties = relation.tableMeta.carbonTable.getTableInfo + .getFactTable.getTableProperties + +optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + +optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB, + CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT + +optionsFinal.put("global_sort_partitions", tableProperties.getOrDefault("global_sort_partitions", --- End diff -- Same as batch sort size I think. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r136830112 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala --- @@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo | charField CHAR(5), | floatField FLOAT | ) - | STORED BY 'org.apache.carbondata.format' + | STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT') """.stripMargin) sql( s""" | LOAD DATA LOCAL INPATH '$path' INTO TABLE carbon_globalsort_difftypes - | OPTIONS('SORT_SCOPE'='GLOBAL_SORT', + | OPTIONS( | 'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField') """.stripMargin) --- End diff -- Add a new testcase to test using SORT_SCOPE in LOAD DATA statement, it should fail ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r136829772 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -639,6 +639,23 @@ case class LoadTable( val carbonProperty: CarbonProperties = CarbonProperties.getInstance() carbonProperty.addProperty("zookeeper.enable.lock", "false") val optionsFinal = getFinalOptions(carbonProperty) +val tableProperties = relation.tableMeta.carbonTable.getTableInfo + .getFactTable.getTableProperties + +optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + +optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB, + CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT + +optionsFinal.put("global_sort_partitions", tableProperties.getOrDefault("global_sort_partitions", --- End diff -- This is needed only if using global sort ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r136829860 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -556,21 +556,21 @@ case class LoadTable( carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT, CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT))) -optionsFinal.put("global_sort_partitions", options.getOrElse("global_sort_partitions", - carbonProperty - .getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS, null))) +//optionsFinal.put("global_sort_partitions", options.getOrElse("global_sort_partitions", --- End diff -- delete unused code ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r136829721 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -639,6 +639,23 @@ case class LoadTable( val carbonProperty: CarbonProperties = CarbonProperties.getInstance() carbonProperty.addProperty("zookeeper.enable.lock", "false") val optionsFinal = getFinalOptions(carbonProperty) +val tableProperties = relation.tableMeta.carbonTable.getTableInfo + .getFactTable.getTableProperties + +optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, + carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + +optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb", --- End diff -- This is needed only if using batch sort ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r136828909 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala --- @@ -109,23 +118,22 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo } // --- Configuration Validity --- - test("Don't support GLOBAL_SORT on partitioned table") { + ignore("Don't support GLOBAL_SORT on partitioned table") { --- End diff -- why ignore this testcase? ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1321 [CARBONDATA-1438] Unify the sort column and sort scope in create table command Background: In order to improve the ease of usage for users, unify the sort column and sort scope in create table command. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata pr-1438 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1321.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1321 commit c20caf973a70e6c6dc94d3cb26bf22404646e8eb Author: chenerlu Date: 2017-09-04T12:54:55Z Unify the sort column and sort scope in create table command ---