[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3053 > Does this PR fix two problems? > If it is yes, better to separate it into two. > the one line change of rowId to rowId + 1 is coupled with this, when i removed the compress method in unSafeFixLengthColumnPage, i got this issue and fixed in this, so this is required in this PR only ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3053 @kumarvishal09 i have tested the fallback scenario by changing code, it is even failing with that also and i have raised discussion in snappy community also [https://groups.google.com/forum/#!topic/snappy-compression/4noNVKCMBqM](url) ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3053 > i think the performance of rawCompress is better than compressLong,compressInt, can we find the root cause of JVM crashï¼ i dont think there is much difference we get with timing, but problem is JVM crash happens randomly, since we get maxsizefor compression from snapy itself, we even allocated that memory sucessfully and passed the address to snappy, after that JVM crashed. This is very random. So better to remove that. ---
[GitHub] carbondata pull request #3053: [CARBONDATA-3233]Fix JVM crash issue in snapp...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3053#discussion_r245890379 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java --- @@ -369,7 +367,7 @@ public BigDecimal getDecimal(int rowId) { @Override public double[] getDoublePage() { -double[] data = new double[getPageSize()]; +double[] data = new double[getEndLoop()]; --- End diff -- during complex type enhancement, to convert value and to get the value, pageSize was changed and new method called getEndLoop added, this was missed for double, so when double datatype is there in complex type data miss match happens, this is handled here. you can refer #2417 ---
[GitHub] carbondata pull request #3053: [WIP]JVM crash issue in snappy compressor
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/3053 [WIP]JVM crash issue in snappy compressor Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata jvmcrash Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3053.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3053 commit b50d1c8b9c69565231cabf1d5dd507a006312a19 Author: akashrn5 Date: 2019-01-07T11:04:48Z JVM crash issue in snappy compressor ---
[GitHub] carbondata issue #3044: [CARBONDATA-3202]Documentation for alter table colum...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3044 @SteNicholas i did not get your point correctly, basicaly you are telling that, 1. column coment after column rename ? 2. whether spark 2.1 supports column rename? am i right? if not, please can you explain it corectly, so that i can clear your doubt ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3202]Documentation for alter tabl...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/3044 [CARBONDATA-3202]Documentation for alter table column rename Add coumentation for alter table column rename Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? NA - [x] Document update required? YES - [x] Testing done NA Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata alter_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3044 commit 0c5df2ef11d3e7376e7a71ef79af56d6311b822f Author: akashrn5 Date: 2019-01-02T06:39:05Z Documentation for alter table column rename ---
[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3027 retest this please ---
[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3027 retest this please ---
[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/3027 @manishgupta88 handled review coments, please review ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r244092819 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableWithSortScope.scala --- @@ -32,25 +32,6 @@ class TestCreateTableWithSortScope extends QueryTest with BeforeAndAfterAll { sql("DROP TABLE IF EXISTS tableWithBatchSort") sql("DROP TABLE IF EXISTS tableWithNoSort") sql("DROP TABLE IF EXISTS tableWithUnsupportSortScope") -sql("DROP TABLE IF EXISTS tableLoadWithSortScope") - } - - test("Do not support load data with specify sort scope") { -sql( -s""" - | CREATE TABLE tableLoadWithSortScope( - | intField INT, - | stringField STRING - | ) - | STORED BY 'carbondata' - | TBLPROPERTIES('SORT_COLUMNS'='stringField') - """.stripMargin) - -val exception_loaddata_sortscope: Exception = intercept[Exception] { - sql("LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE tableLoadWithSortScope " + - "OPTIONS('SORT_SCOPE'='GLOBAL_SORT')") -} -assert(exception_loaddata_sortscope.getMessage.contains("Error: Invalid option(s): sort_scope")) --- End diff -- please add a test case and you can give different sort scope in create tavble, load and property and check the sort scope in describe formatted. ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r244092650 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -191,10 +191,17 @@ case class CarbonLoadDataCommand( optionsFinal .put("complex_delimiter_level_4", ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value()) -optionsFinal.put("sort_scope", tableProperties.asScala.getOrElse("sort_scope", - carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, -carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, - CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT +optionsFinal.put( + "sort_scope", + options.getOrElse( +"sort_scope", +tableProperties.asScala.getOrElse( + "sort_scope", + carbonProperty.getProperty( +CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, +carbonProperty.getProperty( + CarbonCommonConstants.LOAD_SORT_SCOPE, + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) --- End diff -- please handle for SDK and file format also ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r244092283 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -191,10 +191,17 @@ case class CarbonLoadDataCommand( optionsFinal .put("complex_delimiter_level_4", ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value()) -optionsFinal.put("sort_scope", tableProperties.asScala.getOrElse("sort_scope", - carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, -carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, - CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT +optionsFinal.put( --- End diff -- please format the code ---
[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/3027 [CARBONDATA-3202]update the schema to session catalog after add column, drop column and column rename ### Why this PR? **Problem:**For alter table rename, once we change the table name in carbon, we fire alter table rename DDL using hive client. But for add, drop and column rename Spark does not support there features, but hive supports. so after rename, or add or drop column, the new updated schema is not updated in catalog. **Solution:**We can directly call the spark API **alterTableDataSchema** by passing the updated schema, which in turn updates the shema in sessioncatalog Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? YES - [x] Any backward compatibility impacted? NA - [x] Document update required? NA - [x] Testing done tested in three node cluster for various spark versions Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata addcolumn Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3027.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3027 commit ee78136e55b309a4521914fdee57adf59cda8531 Author: akashrn5 Date: 2018-12-27T06:01:44Z update the schema to session catalog after add column, drop column and column rename ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2990 @jackylk thank you ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2990 @jackylk handled comments, please review ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243261978 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableDataTypeChangeModel, DataTypeInfo, + MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, + AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +abstract class CarbonAlterTableColumnRenameCommand(oldColumnName: String, newColumnName: String) + extends MetadataCommand { + + protected def validColumnsForRenaming(carbonColumns: mutable.Buffer[CarbonColumn], + oldCarbonColumn: CarbonColumn, + carbonTable: CarbonTable): Unit = { +// check whether new column name is already an existing column name +if (carbonColumns.exists(_.getColName.equalsIgnoreCase(newColumnName))) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. New " + +s"column name $newColumnName already exists" + +s" in table ${ carbonTable.getTableName }") +} + +// if the column rename is for complex column, block the operation +if (oldCarbonColumn.isComplex) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. Rename " + +s"column is unsupported for complex datatype " + +s"column ${ oldCarbonColumn.getColName }") +} + +// if column rename operation is on partition column, then fail the rename operation +if (null != carbonTable.getPartitionInfo) { + val partitionColumns = carbonTable.getPartitionInfo.getColumnSchemaList + partitionColumns.asScala.foreach { +col => + if (col.getColumnName.equalsIgnoreCase(oldColumnName)) { +throw new MalformedCarbonCommandException( + s"Column Rename Operation failed. Renaming " + + s"the partition column $newColumnName is not " + + s"allowed") + } + } +} + + } +} + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableDataTypeChangeModel, +childTableColumnRename: Boolean = false) + extends CarbonAlterTableColumnRenameCommand(alterTableColRenameAndDataTypeChangeModel.columnName, +alterTableColRenameAndDataTypeChangeModel.newColu
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243260747 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1511,7 +1514,16 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } DataTypeInfo("decimal", precision, scale) case _ => -throw new MalformedCarbonCommandException("Data type provided is invalid.") +if (isColumnRename) { + dataType match { --- End diff -- ok, refactored, if rename operation then just return the datatypeInfo object, else throw invalid datatype exception as before ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2990 @manishgupta88 handled, please review ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2990 @manishgupta88 handled comments, please review ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242030254 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1464,31 +1464,46 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { * @param values * @return */ - def parseDataType(dataType: String, values: Option[List[(Int, Int)]]): DataTypeInfo = { + def parseDataType( + dataType: String, + values: Option[List[(Int, Int)]], + isColumnRename: Boolean): DataTypeInfo = { +def validateAndGetDecimalDatatype: DataTypeInfo = { + var precision: Int = 0 + var scale: Int = 0 + if (values.isDefined) { +precision = values.get(0)._1 +scale = values.get(0)._2 + } else { +throw new MalformedCarbonCommandException("Decimal format provided is invalid") + } + // precision should be > 0 and <= 38 and scale should be >= 0 and <= 38 + if (precision < 1 || precision > 38) { --- End diff -- since the number is fixed, i havent changed the old code, no need to define constants, so i kept teh same as old ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242026814 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,345 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel, DataTypeInfo, MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableColRenameAndDataTypeChangeModel) + extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = alterTableColRenameAndDataTypeChangeModel.tableName +val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName + .getOrElse(sparkSession.catalog.currentDatabase) +var isColumnRenameOnly = false +var isDataTypeChangeOnly = false +var isBothColRenameAndDataTypeChange = false +setAuditTable(dbName, tableName) +setAuditInfo(Map( + "column" -> alterTableColRenameAndDataTypeChangeModel.columnName, + "newColumn" -> alterTableColRenameAndDataTypeChangeModel.newColumnName, + "newType" -> alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType)) +val locksToBeAcquired = List(LockUsage.METADATA_LOCK, LockUsage.COMPACTION_LOCK) +var locks = List.empty[ICarbonLock] +// get the latest carbon table and check for column existence +var carbonTable: CarbonTable = null +var timeStamp = 0L +try { + locks = AlterTableUtil +.validateTableAndAcquireLock(dbName, tableName, locksToBeAcquired)(sparkSession) + val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore + carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) + if (!carbonTable.canAllow(carbonTable, TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE, +alterTableColRenameAndDataTypeChangeModel.columnName)) { +throw new MalformedCarbonCommandException( + "alter table change datatype or column rename is not supported for index datamap") + } + val operationContext = new OperationContext + val alterTableColRenameAndDataTypeChangePreEvent = +AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, carbonTable, + alterTableColRenameAndDataTypeChangeModel) + OperationListenerBus.getInstance() +.fireEvent(alterTableColRenameAndDa
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242026928 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -286,12 +286,16 @@ class CarbonFileMetastore extends CarbonMetaStore { newTableIdentifier: CarbonTableIdentifier, oldTableIdentifier: CarbonTableIdentifier, thriftTableInfo: org.apache.carbondata.format.TableInfo, - schemaEvolutionEntry: SchemaEvolutionEntry, + schemaEvolutionEntry: List[SchemaEvolutionEntry], --- End diff -- yes, can change to schemaEvolutionEntryList ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r242026723 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,345 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel, DataTypeInfo, MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableColRenameAndDataTypeChangeModel) + extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = alterTableColRenameAndDataTypeChangeModel.tableName +val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName + .getOrElse(sparkSession.catalog.currentDatabase) +var isColumnRenameOnly = false +var isDataTypeChangeOnly = false +var isBothColRenameAndDataTypeChange = false +setAuditTable(dbName, tableName) +setAuditInfo(Map( + "column" -> alterTableColRenameAndDataTypeChangeModel.columnName, + "newColumn" -> alterTableColRenameAndDataTypeChangeModel.newColumnName, + "newType" -> alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType)) +val locksToBeAcquired = List(LockUsage.METADATA_LOCK, LockUsage.COMPACTION_LOCK) +var locks = List.empty[ICarbonLock] +// get the latest carbon table and check for column existence +var carbonTable: CarbonTable = null +var timeStamp = 0L +try { + locks = AlterTableUtil +.validateTableAndAcquireLock(dbName, tableName, locksToBeAcquired)(sparkSession) + val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore + carbonTable = CarbonEnv.getCarbonTable(Some(dbName), tableName)(sparkSession) + if (!carbonTable.canAllow(carbonTable, TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE, +alterTableColRenameAndDataTypeChangeModel.columnName)) { +throw new MalformedCarbonCommandException( + "alter table change datatype or column rename is not supported for index datamap") + } + val operationContext = new OperationContext + val alterTableColRenameAndDataTypeChangePreEvent = +AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, carbonTable, + alterTableColRenameAndDataTypeChangeModel) + OperationListenerBus.getInstance() +.fireEvent(alterTableColRenameAndDa
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2990 [CARBONDATA-3149]Support alter table column rename ### Why this PR? This PR is to support column rename feature in carbondata. Carbon already supports datatype change, alter table add column and drop column. This PR uses same DDL as datatype change and supports the column rename. Any column canbe Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata alter_rename Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2990.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2990 commit e84b09dd244803746fcacd3e9c6eda4105dd7bef Author: akashrn5 Date: 2018-12-14T11:20:09Z Support alter table column rename ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r238658593 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and --- End diff -- change the comment as starndard doc ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r238657955 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); + } finally { +gzipCompressorOutputStream.close(); + } +} catch (IOException e) { + throw new RuntimeException("Error during Compression step " + e.getMessage()); +} + +return byteArrayOutputStream.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data); +ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #2967: [CARBONDATA-3140]Block create like table comm...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2967 [CARBONDATA-3140]Block create like table command in carbon ### Why this PR? when like table is created using carbon table as source table, and the new table is dropped, it deletes the source table in spark-2.1 and works fine in other. Blocking the create like command for carbon table Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata like Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2967.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2967 commit cc8dbbf337976bae9e3a4a43b317c853b783fcff Author: akashrn5 Date: 2018-11-30T10:31:34Z Block create like table command in carbon ---
[GitHub] carbondata issue #2967: [CARBONDATA-3140]Block create like table command in ...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2967 @KanakaKumar @ravipesala please review ---
[GitHub] carbondata pull request #2967: [CARBONDATA-3140]Block create like table comm...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2967#discussion_r237817147 --- Diff: core/src/main/java/org/apache/carbondata/core/memory/UnsafeMemoryManager.java --- @@ -129,7 +129,7 @@ private synchronized MemoryBlock allocateMemory(MemoryType memoryType, String ta memoryBlock = MemoryAllocator.HEAP.allocate(memoryRequested); if (LOGGER.isDebugEnabled()) { LOGGER.debug(String -.format("Creating onheap working Memory block (%d) with size: ", memoryBlock.size())); +.format("Creating onheap working Memory block with size: (%d)", memoryBlock.size())); --- End diff -- corrected the error message here ---
[GitHub] carbondata pull request #2791: [WIP][HOTFIX]correct the exception handling i...
Github user akashrn5 closed the pull request at: https://github.com/apache/carbondata/pull/2791 ---
[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2953#discussion_r236646439 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala --- @@ -401,7 +401,11 @@ class CarbonMergerRDD[K, V]( .add(new CarbonInputSplitTaskInfo(entry._1, entry._2).asInstanceOf[Distributable]) ) -val nodeBlockMap = CarbonLoaderUtil.nodeBlockMapping(taskInfoList, -1) +// get all the active nodes of cluster and prepare the nodeBlockMap based on these nodes +val activeNodes = DistributionUtil + .ensureExecutorsAndGetNodeList(taskInfoList.asScala, sparkContext) + +val nodeBlockMap = CarbonLoaderUtil.nodeBlockMapping(taskInfoList, -1, activeNodes.asJava) --- End diff -- done ---
[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2953#discussion_r236552294 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -536,7 +537,8 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden */ public static Map> nodeBlockMapping(List blockInfos) { // -1 if number of nodes has to be decided based on block location information -return nodeBlockMapping(blockInfos, -1); +return nodeBlockMapping(blockInfos, -1, null, --- End diff -- here i just did the refactoring ---
[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2953#discussion_r236551537 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -536,7 +537,8 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden */ public static Map> nodeBlockMapping(List blockInfos) { // -1 if number of nodes has to be decided based on block location information -return nodeBlockMapping(blockInfos, -1); +return nodeBlockMapping(blockInfos, -1, null, --- End diff -- yes, in compaction case `BlockAssignmentStrategy.BLOCK_NUM_FIRST` is default , same as before ---
[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2953#discussion_r236537747 --- Diff: tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java --- @@ -205,7 +206,7 @@ public void testSummaryOutputAll() { expectedOutput = buildLines( "## version Details", "written_by Version ", -"TestUtil1.6.0-SNAPSHOT "); +"TestUtil"+ CarbonVersionConstants.CARBONDATA_VERSION+" "); --- End diff -- removed the harded value here ---
[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2953 [CARBONDATA-3132]Correct the task disrtibution in case of compaction when the actual block nodes and active nodes are different ### Why This PR? There is an unequal distribution of tasks during compaction ex: When the load is done using replication factor 2 and all nodes are active and during compaction one node is down, basically it is not active executor, so the task distribution should take care to distribute the tasks equally among all the active executors instead of giving more tasks to single executor and less to other executor. But sometimes the unequal distribution happens and the compaction becomes sow. ### Solution Currently we are not getting active executors before the node block mapping and sending the list of active executors as null, which will lead to the above problem sometimes. so get the active executors and send for node block mapping logic which will handle to distribute equally. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? NA - [x] Document update required? NA - [x] Testing done tested using 3 node and 6 node cluster Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata distribute Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2953.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2953 commit caaa044b461a64f28be6107e55e8ab2a1ae2bf4d Author: brijoobopanna Date: 2018-11-27T06:38:15Z Correct the task disrtibution in case of compaction when the actual block nodes and active nodes are different ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2925#discussion_r235015512 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala --- @@ -157,8 +157,8 @@ object S3Example { } def getSparkMaster(args: Array[String]): String = { - if (args.length == 5) args(4) - else if (args(3).contains("spark:") || args(3).contains("mesos:")) args(3) - else "local" -} --- End diff -- revert the unnecessary changes ---
[GitHub] carbondata issue #2928: [CARBONDATA-3106] Written_by_APPNAME not serialized ...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2928 In `CarbonFactDataWriterImplV3.java` , `writeFootrToFile` , if appName after getting from carbonProperties is null,you can fail the dataload with dataload exception, please handle that also ---
[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2872#discussion_r234233366 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -119,24 +120,34 @@ public ColumnPage decode(byte[] input, int offset, int length) return LazyColumnPage.newPage(page, converter); } - @Override - public void decodeAndFillVector(byte[] input, int offset, int length, - ColumnVectorInfo vectorInfo, BitSet nullBits, boolean isLVEncoded) - throws MemoryException, IOException { -ColumnPage page = null; -if (DataTypes.isDecimal(meta.getSchemaDataType())) { - page = ColumnPage.decompressDecimalPage(meta, input, offset, length); - vectorInfo.decimalConverter = ((DecimalColumnPage) page).getDecimalConverter(); + @Override public void decodeAndFillVector(byte[] input, int offset, int length, + ColumnVectorInfo vectorInfo, BitSet nullBits, boolean isLVEncoded, int pageSize, + ReusableDataBuffer reusableDataBuffer) throws MemoryException, IOException { +Compressor compressor = + CompressorFactory.getInstance().getCompressor(meta.getCompressorName()); +byte[] unCompressData; +if (null != reusableDataBuffer && compressor.supportReusableBuffer()) { + int uncompressedLength = compressor.unCompressedLength(input, offset, length); + unCompressData = reusableDataBuffer.getDataBuffer(uncompressedLength); } else { - page = ColumnPage.decompress(meta, input, offset, length, isLVEncoded); + unCompressData = compressor.unCompressByte(input, offset, length); } -page.setNullBits(nullBits); -converter.decodeAndFillVector(page, vectorInfo); +compressor.rawUncompress(input, offset, length, unCompressData); --- End diff -- i think, when ZSTD compressor is configured, it will throw unsupported exception, plese check and handle for `AdaptiveIntegralCodec` and `AdaptiveDeltaIntegralCodec` ---
[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2922 @jackylk please review ---
[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2922 @jackylk CARBONDATA-3103 is different issue ---
[GitHub] carbondata pull request #2922: [WIP]s3 lock file fix
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2922 [WIP]s3 lock file fix Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata s3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2922.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2922 commit a34f191f8e9026c5d6950ea86962ead0282d957f Author: akashrn5 Date: 2018-11-14T16:23:39Z s3 lock file fix ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 retest this please ---
[GitHub] carbondata issue #2903: [CARBONDATA-3084]dataload failure fix when float val...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2903 retest this please ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 @ravipesala please review the documentation commit added ---
[GitHub] carbondata pull request #2903: [CARBONDATA-3084]dataload failure fix when fl...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2903 [CARBONDATA-3084]dataload failure fix when float value exceeds the limit ### Problem: when the float value exceeds the range and we try to insert that data, data load fails. ### Analysis: when the value exceeds the range, the max is set as `Infinity`, so the decimal count of that value will be 0, so when decimal count is zero we go for CodecByAlgorithmForIntegral, so it fails ### Solution: when the value exceeds, and decimal count is zero , source datatype is float, then select DirectCompressCodec ### How this tested: test cases are added to validate the load and data Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata float Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2903.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2903 commit 798592441f77300dd89b1dec4dd51c15bc6b2c07 Author: akashrn5 Date: 2018-11-06T07:06:24Z dataload failure fix when float value exceeds the limit ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 2.2.1 and 2.3.1 failed test cases, which are not related and these are random ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 retest this please ---
[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2886#discussion_r230684559 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala --- @@ -289,6 +292,44 @@ class TestNoInvertedIndexLoadAndQuery extends QueryTest with BeforeAndAfterAll { checkAnswer(sql("""select c2 from testNull where c2 is null"""), Seq(Row(null), Row(null), Row(null), Row(null), Row(null), Row(null))) } + test("inverted index with Dictionary_EXCLUDE and INVERTED_INDEX") { +sql("drop table if exists index1") +sql( + """ + CREATE TABLE IF NOT EXISTS index1 + (id Int, name String, city String) + STORED BY 'org.apache.carbondata.format' + TBLPROPERTIES('DICTIONARY_EXCLUDE'='city','INVERTED_INDEX'='city') + """) +sql( + s""" + LOAD DATA LOCAL INPATH '$testData1' into table index1 + """) +checkAnswer( + sql( +""" + SELECT * FROM index1 WHERE city = "Bangalore" +"""), + Seq(Row(19.0, "Emily", "Bangalore"))) --- End diff -- done ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 @ravipesala in existing flow, all the sort columns and dimensions will have inverted index and, in current implementation, if measure columns are given in inverted_index property, then i ll not set it as inverted index. Basically i wont throw any exception. Do we need to throw exception? and currently im allowing the inverted index for all dimension columns and not restricting to sort columns only. So basically, user can mention columns in inverted_index columns (dimensions), i am setting those as inverted index. it is correct right ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230637603 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -311,26 +309,39 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep max = new String(blockletMax, Charset.forName(DEFAULT_CHARSET)); } } else { - minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100); - maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100); + // for complex columns min and max and percentage + if (blocklet.getColumnChunk().column.getColumnName().contains(".val") || + blocklet.getColumnChunk().column.getColumnName().contains(".")) { +minPercent = "NA"; +maxPercent = "NA"; + } else { +minPercent = +String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100)); +maxPercent = +String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100)); + } DataFile.ColumnChunk columnChunk = blocklet.columnChunk; - if (columnChunk.column.isDimensionColumn() && DataTypeUtil + // need to consider no dictionary complex column + if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet + .getColumnChunk().column.getColumnName().contains(".val") || blocklet --- End diff -- can have a method which tells the column is complex column based on name, we already have method which tells the column by datatype, and for dictionary include and complex type, togetther no need to check, because again for child columns need to have other method, as we cant give child columns in dictionary ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230637639 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -311,26 +309,39 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep max = new String(blockletMax, Charset.forName(DEFAULT_CHARSET)); } } else { - minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100); - maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100); + // for complex columns min and max and percentage + if (blocklet.getColumnChunk().column.getColumnName().contains(".val") || + blocklet.getColumnChunk().column.getColumnName().contains(".")) { +minPercent = "NA"; +maxPercent = "NA"; + } else { +minPercent = +String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100)); +maxPercent = +String.format("%.1f", Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100)); + } DataFile.ColumnChunk columnChunk = blocklet.columnChunk; - if (columnChunk.column.isDimensionColumn() && DataTypeUtil + // need to consider no dictionary complex column + if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet + .getColumnChunk().column.getColumnName().contains(".val") || blocklet --- End diff -- handled, please review ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230637383 --- Diff: examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java --- @@ -61,7 +61,8 @@ public static void main(String[] args) { CarbonWriter writer = CarbonWriter.builder() .outputPath(path) .withLoadOptions(map) -.withCsvInput(new Schema(fields)).build(); +.withCsvInput(new Schema(fields)) +.writtenBy("CarbonReaderExample").build(); --- End diff -- done, actually initially it was like that, and im using carbon formatting only, but i do not know why it is formatting like this, need to once check the xml ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230340585 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java --- @@ -453,7 +455,16 @@ private double computePercentage(byte[] data, byte[] min, byte[] max, ColumnSche return dataValue.divide(factorValue).doubleValue(); } double dataValue, minValue, factorValue; - if (column.getDataType() == DataTypes.SHORT) { + if (columnChunk.column.isDimensionColumn() && DataTypeUtil --- End diff -- done ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230319095 --- Diff: examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java --- @@ -61,7 +61,7 @@ public static void main(String[] args) { CarbonWriter writer = CarbonWriter.builder() .outputPath(path) .withLoadOptions(map) -.withCsvInput(new Schema(fields)).build(); +.withCsvInput(new Schema(fields)).writtenBy("CarbonReaderExample").build(); --- End diff -- done ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r230319129 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -314,23 +312,26 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100); maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100); DataFile.ColumnChunk columnChunk = blocklet.columnChunk; - if (columnChunk.column.isDimensionColumn() && DataTypeUtil + if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet + .getColumnChunk().column.getColumnName().contains(".val") || blocklet --- End diff -- done ---
[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2886#discussion_r230309345 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -359,8 +359,13 @@ private CarbonCommonConstants() { public static final String TABLE_BLOCKSIZE = "table_blocksize"; // table blocklet size in MB public static final String TABLE_BLOCKLET_SIZE = "table_blocklet_size"; - // set in column level to disable inverted index + /** + * set in column level to disable inverted index + * @Deprecated :This property is deprecated, it is kep just for compatibility --- End diff -- done ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 @kevinjmh basically, with my changes the column level sorting will be skipped and only row level sorting will be done in sort step. ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 > @akashrn5 Please expose these properties from SDK and fileformat as well. @ravipesala handed for SDK and fileformat, please review ---
[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2886 @kevinjmh 1. when you set in INVERTED_INDEX , but not in SORT_COLUMNS, then data will not be sorted, only RLE will be applied on data. 2. `isInvertedIndex ` basically this boolean cannot say that, it is do RLE, basically RLE will be applied on both inverted and no inverted case, but after checking `isInvertedIndex ` it decides to sort based on isSort check ---
[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2888#discussion_r230272135 --- Diff: docs/sdk-guide.md --- @@ -429,6 +429,15 @@ public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema); public CarbonWriterBuilder withJsonInput(Schema carbonSchema); ``` +``` +/** +* To support writing the ApplicationName which is writing the carbondata file +* @param application name which is writing the carbondata files +* @return CarbonWriterBuilder --- End diff -- this is mandatory because SDK can be used by different application, one application can write and other can read this, so we store this info, during load case we take from spark application name, SDK needs to provide this ---
[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2888#discussion_r230270106 --- Diff: docs/sdk-guide.md --- @@ -124,7 +124,7 @@ public class TestSdkAvro { try { CarbonWriter writer = CarbonWriter.builder() .outputPath(path) - .withAvroInput(new org.apache.avro.Schema.Parser().parse(avroSchema)).build(); + .withAvroInput(new org.apache.avro.Schema.Parser().parse(avroSchema))..writtenBy("SDK").build(); --- End diff -- done ---
[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2888#discussion_r230270097 --- Diff: docs/sdk-guide.md --- @@ -686,6 +695,16 @@ Find example code at [CarbonReaderExample](https://github.com/apache/carbondata/ public static Schema readSchemaInIndexFile(String indexFilePath); ``` +``` + /** + * This method return the version details in formatted string by reading from carbondata file + * @param dataFilePath complete path including carbondata file name + * @return string with information of who has written this file in which carbondata project version --- End diff -- done ---
[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2888 [CARBONDATA-3066]add documentation for writtenBy and getVersionDetails APIs in SDK This PR adds the documentation for new APIs added in SDK builder API- writtenBy() reader API- getVersionDetails() Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata sdk_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2888.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2888 commit cf6b6ebe4b8f05695d8beea79ec677a6ce500044 Author: akashrn5 Date: 2018-10-31T14:43:02Z add documentation for writtenBy and getVersionDetails APIs in SDK ---
[GitHub] carbondata pull request #2886: [WIP]make inverted index false by defaut
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2886 [WIP]make inverted index false by defaut Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata disable_inverted_index Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2886.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2886 commit c8b434294ee52ccd1a5445ce694f99f738908b0c Author: akashrn5 Date: 2018-10-31T08:43:48Z make inverted index false by defaut ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r229300837 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -314,23 +312,26 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100); maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100); DataFile.ColumnChunk columnChunk = blocklet.columnChunk; - if (columnChunk.column.isDimensionColumn() && DataTypeUtil + if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet + .getColumnChunk().column.getColumnName().contains(".val") || blocklet --- End diff -- this will be for no dictionary colmplex column, for complex column min max can be shown as NA, that will be ok right ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r229297954 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java --- @@ -443,7 +444,8 @@ void computePercentage(byte[] shardMin, byte[] shardMax) { * @return result */ private double computePercentage(byte[] data, byte[] min, byte[] max, ColumnSchema column) { - if (column.getDataType() == DataTypes.STRING) { + if (column.getDataType() == DataTypes.STRING || column.getDataType() == DataTypes.BOOLEAN + || column.hasEncoding(Encoding.DICTIONARY)) { --- End diff -- yes, but min max will be surrogate keys right, showing min and max as dictionary value is not useful right ---
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229294323 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala --- @@ -175,6 +172,11 @@ with Serializable { dataSchema: StructType, context: TaskAttemptContext): OutputWriter = { val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration) +val appName = context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME) +if (null != appName) { --- End diff -- actually the appname will be always set ,spark will always set the appname, this check is added for one of the test cases , i will remove this ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2882 [CARBONDATA-3060]Improve the command for cli and fixed other issues ### Changes proposed in this PR: **1. improve the syntax for CLI DDL, now the command can be given as** `CarbonCli for table options('-cmd summary/benchmark-a -s -v -c -m')` the options will take one string, which is basically a command, which user can directly paste into command promt and run as java command Now user no nned to give -P also, internally when above commad is run we take table path into consideration in command line arguments **2. other issues are fixed in this PR are** 1. when numeric columns are included in dictionary 2. timestamp column's min and max details 3. help command was not working in beeline 4. for all the complex columns, like parent and child columns, min and max will be given as "NA" Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata improve Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2882.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2882 commit ca8061e1d6e0652c786c1231ad82efe17e504ee4 Author: akashrn5 Date: 2018-10-29T12:44:24Z Improve the command for cli and fixed other issues ---
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229235332 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala --- @@ -175,6 +174,12 @@ with Serializable { dataSchema: StructType, context: TaskAttemptContext): OutputWriter = { val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration) +val appName = context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME) +if (null != appName) { + CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, + appName) --- End diff -- done ---
[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2861 retest this please ---
[GitHub] carbondata pull request #2861: [HOTFIX]handle passing spark appname for part...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2861 [HOTFIX]handle passing spark appname for partition table and file format Dataload with partion table file format fails, as the appname is not in carbonproperties in executor. This PR sets the spark appname in carbon properties which will be written to carbondata footer. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata written Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2861.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2861 commit 92a97277a96453e321a120eec2e1dc0b6d327c17 Author: akashrn5 Date: 2018-10-26T14:04:04Z handle passing spark appname for partition table and file format ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2829 > @akashrn5 Instead of changing many classes to pass writtenBy and appName can't we set to CarbonProperties and in writer step we can get from the same and write to thrift?? handled ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228124274 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -371,8 +381,14 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput() " + "API based on input"); } +if (this.writtenByApp == null) { --- End diff -- added ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228124241 --- Diff: format/src/main/thrift/carbondata.thrift --- @@ -206,6 +206,7 @@ struct FileFooter3{ 4: optional list blocklet_info_list3; // Information about blocklets of all columns in this file for V3 format 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet local dictionary 6: optional bool is_sort; // True if the data is sorted in this file, it is used for compaction to decide whether to use merge sort or not +7: optional map extra_info; // written by is used to write who wrote the file, it can be Aplication name, or SDK etc and version in which this carbondata file is written etc --- End diff -- modified ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228124157 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/CarbonDataLoadConfiguration.java --- @@ -460,4 +464,20 @@ public String getColumnCompressor() { public void setColumnCompressor(String columnCompressor) { this.columnCompressor = columnCompressor; } + + public String getAppName() { --- End diff -- handled as carbon property ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228123974 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala --- @@ -56,7 +56,7 @@ class TestCarbonFileInputFormatWithExternalCarbonTable extends QueryTest with Be val builder = CarbonWriter.builder() val writer = builder.outputPath(writerPath + "/Fact/Part0/Segment_null") - .withCsvInput(Schema.parseJson(schema)).build() + .withCsvInput(Schema.parseJson(schema)).writtenBy("SDK").build() --- End diff -- added classname for writtenby ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228124063 --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala --- @@ -984,7 +984,7 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll { val writer = builder.outputPath(path) .uniqueIdentifier(System.nanoTime()).withBlockSize(2) - .withCsvInput(new Schema(structType)).build() + .withCsvInput(new Schema(structType)).writtenBy("DataSource").build() --- End diff -- ok ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228124017 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala --- @@ -139,13 +139,13 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { .sortBy(sortColumns.toArray) .uniqueIdentifier( System.currentTimeMillis).withBlockSize(2).withLoadOptions(options) -.withCsvInput(Schema.parseJson(schema)).build() + .withCsvInput(Schema.parseJson(schema)).writtenBy("SDK").build() --- End diff -- changed ---
[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2830#discussion_r228123785 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -226,25 +282,46 @@ private int getColumnIndex(String columnName) { private boolean collected = false; private void printColumnStats(String columnName) throws IOException, MemoryException { -out.println(); -out.println("## Column Statistics for '" + columnName + "'"); +outPuts.add(""); +outPuts.add("## Column Statistics for '" + columnName + "'"); collectStats(columnName); int columnIndex = getColumnIndex(columnName); String[] header = new String[]{"BLK", "BLKLT", "Meta Size", "Data Size", -"LocalDict", "DictEntries", "DictSize", "AvgPageSize", "Min%", "Max%"}; +"LocalDict", "DictEntries", "DictSize", "AvgPageSize", "Min%", "Max%", "Min", "Max"}; -ShardPrinter printer = new ShardPrinter(header); +ShardPrinter printer = new ShardPrinter(header, outPuts); for (Map.Entry entry : dataFiles.entrySet()) { DataFile file = entry.getValue(); for (DataFile.Blocklet blocklet : file.getAllBlocklets()) { -String min, max; +String min, max, minPercent, maxPercent; --- End diff -- handled ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228123905 --- Diff: format/src/main/thrift/carbondata.thrift --- @@ -206,6 +206,8 @@ struct FileFooter3{ 4: optional list blocklet_info_list3; // Information about blocklets of all columns in this file for V3 format 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet local dictionary 6: optional bool is_sort; // True if the data is sorted in this file, it is used for compaction to decide whether to use merge sort or not +7: optional string written_by; // written by is used to write who wrote the file, it can be LOAD, or SDK etc --- End diff -- added a map ---
[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2830#discussion_r228123653 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSummaryCommand.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.management + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} +import org.apache.spark.sql.execution.command.{Checker, DataCommand} +import org.apache.spark.sql.types.StringType + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.tool.CarbonCli + +case class CarbonShowSummaryCommand( --- End diff -- done ---
[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2830#discussion_r228123741 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/CarbonCli.java --- @@ -40,6 +41,10 @@ @InterfaceStability.Unstable public class CarbonCli { + private static ArrayList outPuts; + + private static boolean isPrintInConsole = true; --- End diff -- added ---
[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2830#discussion_r228123616 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala --- @@ -493,6 +493,23 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser { showHistory.isDefined) } + + protected lazy val cli: Parser[LogicalPlan] = +(SHOW ~> SUMMARY ~> FOR ~> TABLE) ~ (ident <~ ".").? ~ ident ~ --- End diff -- changed ---
[GitHub] carbondata pull request #2853: [CARBONDATA_3025]make cli compilable with jav...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2853 [CARBONDATA_3025]make cli compilable with java 1.7 This PR helps to compile the cli module to be compile with java 1.7 Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata cli1.7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2853.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2853 commit d0b4c86b6c6928030d63539796201d4b6f284ce6 Author: akashrn5 Date: 2018-10-24T12:05:00Z make cli compilable with java 1.7 ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r228034556 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1845,6 +1845,21 @@ public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10; public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000; + /** + * Written by detail to be written in carbondata footer for better maintanability + */ + public static final String CARBON_WRITTEN_BY_FOOTER_INFO = "written_by"; + + /** + * carbon version detail to be written in carbondata footer for better maintanability + */ + public static final String CARBON_VERSION_FOOTER_INFO = "version"; --- End diff -- yes, even this suits better ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2829 retest this please ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2829 retest this please ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r226722751 --- Diff: format/src/main/thrift/carbondata.thrift --- @@ -206,6 +206,7 @@ struct FileFooter3{ 4: optional list blocklet_info_list3; // Information about blocklets of all columns in this file for V3 format 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet local dictionary 6: optional bool is_sort; // True if the data is sorted in this file, it is used for compaction to decide whether to use merge sort or not +7: optional map extra_info; // written by is used to write who wrote the file, it can be Aplication name, or SDK etc and version in which this carbondata file is written etc --- End diff -- for all the extra info, create map, it didnt get much of it, currently, it is map, and this suits for adding extra meta, and about changing test case, since those are the api level changes, we need to change those test cases. ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r226233891 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -76,6 +80,17 @@ public static Schema readSchemaInDataFile(String dataFilePath) throws IOExceptio return new Schema(columnSchemaList); } + public static String getVersionDetails(String dataFilePath) throws IOException { --- End diff -- yes this info is from footer, in SDK we expose API is schema reader right, i thought may be i can expose one more API there only to get the version details, will this be ok? or need to write this in new class for footer info? please suggest ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2829#discussion_r226229213 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/CarbonDataLoadConfiguration.java --- @@ -460,4 +464,20 @@ public String getColumnCompressor() { public void setColumnCompressor(String columnCompressor) { this.columnCompressor = columnCompressor; } + + public String getAppName() { +return appName; + } + + public void setAppName(String appName) { +this.appName = appName; + } + + public String getVersion() { +return version; + } + + public void setVersion(String version) { --- End diff -- it is carbon version ---
[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2830 > @akashrn5 , I think it's better to support showing a specified segment info too, if users want to get the details of one segment but there are many blocks in this segment, it's difficult for user to use.. maybe can add this feature later. yes, segment level can be added later ---
[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2830 @zzcclp it shows all segment info, segment level filtering is not there, but we can send a specific block path to get only that block info ---
[GitHub] carbondata pull request #2828: [CARBONDATA-3025]Added DDL support for cli an...
Github user akashrn5 closed the pull request at: https://github.com/apache/carbondata/pull/2828 ---
[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2830 [CARBONDATA-3025]Added CLI enhancements Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata integrate2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2830.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2830 commit 9b398144f83bf5378b4378ba1ddb47f50bf7a6da Author: akashrn5 Date: 2018-10-17T14:26:49Z add more metadata in carbon file footer commit e31d757ce8757ac276233c62a8e9d1947553fc4a Author: akashrn5 Date: 2018-10-17T14:28:59Z Added CLI enhancements for better maintainability ---
[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2829 [CARBONDATA-3025]add more metadata in carbon file footer Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata integrate1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2829.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2829 commit 9b398144f83bf5378b4378ba1ddb47f50bf7a6da Author: akashrn5 Date: 2018-10-17T14:26:49Z add more metadata in carbon file footer ---
[GitHub] carbondata pull request #2828: Added DDL support for cli and added more info...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/2828 Added DDL support for cli and added more info in carbon file footer ### **Changes Proposed in this PR:** 1. Add more info to carbon file footer, like `written_by` (which will be spark application_name) in case of insert into and load command. To read this info one can use CLI For SDK this API will be exposed to write this info in footer and one API will exposed to read this info from SDK. 2. footer will have information about in which `version` of carbon the file is written, which will be helpful for getting details, for comaptibility etc Enhancement in CLI tool 3. a new option called "`-v`" is added to get the written_by and version details in addition to existing options 4. SQL support is added for CLI This is introduced so that user can get the details in beeline only and no need to execute separately. Currently the command is as below, based on comment we can change this DDL `Show summary for table options('command'='-cmd,summary,-a,-p,-b');` 5. when we get details for column statistics ,we will get the Min and Max percentage, if we add actual min and max values, then it will be helpful for the developer ``` BLK BLKLT Meta Size Data Size LocalDict DictEntries DictSize AvgPageSize Min% Max% Min Max 0 0 2.90KB 4.87MB false 0 0.0B 93.76KB 0.0 100.0 0290 0 1 2.90KB 2.29MB false 0 0.0B 93.76KB 0.0 100.0 1292 1 0 2.90KB 4.87MB false 0 0.0B 93.76KB 0.0 100.0 3293 1 1 2.90KB 2.29MB false 0 0.0B 93.76KB 0.0 100.0 4295 2 0 2.90KB 5.52MB false 0 0.0B 93.76KB 0.0 100.0 6297 2 1 2.90KB 2.94MB false 0 0.0B 93.76KB 0.0 100.0 8298 2 2 830.0B 586.81KB false 0 0.0B 83.71KB0.0 100.0 9299 ``` 6. Currently CLI tool get blocklet details for all the blockfiles, so if we have more number of carbondata files, then it will take lot of time to get the details, so limit is added to it, by default when we give option as "-b", only 4 outputs will be given, if we want more, we can pass the option value for b as limit number, Example: "`-b 30`" =>> Then limit will be increased to 30 7. one more is a new Option is added called "`-B`", which takes mandatory option value as a block path. This is added just to get the block detail like, number of blocklets, number of pages, rows and size Example: "`-B /home/ss/ss/part-0-0_batchno0-0-0-1539782855178.carbondata`" ``` ## Filtered Block Details for: part-0-0_batchno0-0-0-1539782855178.carbondata BLKT NumPages NumRows Size 04 20 10.0B ``` Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? NA - [x] Any backward compatibility impacted? Need o handle - [x] Document update required? Yes, will be raised in separate PR - [x] Testing done UTs are added Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata integrate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2828.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2828 commit b747030671155067bf729e032ebdc88b87033534 Author: akashrn5 Date: 2018-10-10T13:15:31Z added DDL support for cli and add more info in carbon file footer ---
[GitHub] carbondata issue #2791: [WIP][HOTFIX]correct the exception handling in looku...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2791 i am marking it as WIP, i will check and whitelost all the other exception which can be thrown from spark ---
[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2791#discussion_r222537606 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -23,6 +23,7 @@ import java.net.URI import scala.collection.mutable.ArrayBuffer import org.apache.hadoop.fs.permission.{FsAction, FsPermission} +import org.apache.hadoop.hive.ql.metadata.HiveException --- End diff -- scalastyle fails if i give the space, i have checked it ---
[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2791#discussion_r222537366 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -208,7 +209,10 @@ class CarbonFileMetastore extends CarbonMetaStore { try { lookupRelation(tableIdentifier)(sparkSession) } catch { - case _: Exception => + case ex: Exception => +if (ex.getCause.isInstanceOf[HiveException]) { + throw ex +} --- End diff -- here we arereturning false directly after catching exception, telling the table does not exists, this is wrong, i have check the flow , we might get hiveException(regarding permission) also if the user is not allowed to access the table, that time also we return false and we will not get the proper error. ---