[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1556 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1833/ ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1556 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1386/ ---
[GitHub] carbondata pull request #1556: [CARBONDATA-1770] Updated documentaion for da...
GitHub user vandana7 opened a pull request: https://github.com/apache/carbondata/pull/1556 [CARBONDATA-1770] Updated documentaion for data-management-on-carbondata.md and useful-tips-on-carbondata.md While reviewing PR #1534, I found that some changes still need to be fixed. I have made those fixes in this PR; please review. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vandana7/incubator-carbondata document_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1556.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1556 commit 6b24c4fccda9be8b089cff2e7d0fdcbcab00d557 Author: vandana Date: 2017-11-23T06:51:20Z some remaing fix ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152730693 --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala --- @@ -107,15 +110,14 @@ class CarbonSessionCatalog( carbonEnv.carbonMetastore. checkSchemasModifiedTimeAndReloadTables(storePath) -val tableMeta = carbonEnv.carbonMetastore - .getTableFromMetadataCache(carbonDatasourceHadoopRelation.carbonTable.getDatabaseName, -carbonDatasourceHadoopRelation.carbonTable.getFactTableName) -if (tableMeta.isEmpty || (tableMeta.isDefined && -tableMeta.get.carbonTable.getTableLastUpdatedTime != - carbonDatasourceHadoopRelation.carbonTable.getTableLastUpdatedTime)) { +val table = carbonEnv.carbonMetastore.getTableFromMetadataCache( + carbonDatasourceHadoopRelation.carbonTable.getDatabaseName, + carbonDatasourceHadoopRelation.carbonTable.getFactTableName) +if (table.isEmpty || (table.isDefined && + table.get.carbonTable.getTableLastUpdatedTime != --- End diff -- wrong indent ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152730664 --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala --- @@ -84,8 +87,8 @@ class CarbonSessionCatalog( var toRefreshRelation = false rtnRelation match { case SubqueryAlias(_, - LogicalRelation(carbonDatasourceHadoopRelation: CarbonDatasourceHadoopRelation, _, _), - _) => + LogicalRelation(carbonDatasourceHadoopRelation: CarbonDatasourceHadoopRelation, _, _), --- End diff -- wrong indent ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152730610 --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala --- @@ -16,6 +16,7 @@ */ package org.apache.spark.sql.hive + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152730468 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala --- @@ -38,7 +40,26 @@ import org.apache.carbondata.spark.util.CommonUtil */ class CarbonSparkSqlParser(conf: SQLConf, sparkSession: SparkSession) extends AbstractSqlParser { - val astBuilder = new CarbonSqlAstBuilder(conf) + val parser = new CarbonSpark2SqlParser + val astBuilder = getAstBuilder() + + def getAstBuilder(): AstBuilder = { +if (sparkSession.version.contains("2.1")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSqlAstBuilder") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val astBuilder = ctor.newInstance(conf, parser).asInstanceOf[AstBuilder] + astBuilder +} else if (sparkSession.version.contains("2.2")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSqlAstBuilder") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val astBuilder = ctor.newInstance(conf, parser).asInstanceOf[AstBuilder] + astBuilder --- End diff -- what's the difference between the code for 2.1 and 2.2 ---
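As quoted in the diff above, the 2.1 and 2.2 branches of getAstBuilder() appear to load the same class with the same constructor arguments. If they are meant to stay identical, a rough sketch of collapsing them into one reflective path could look like the following. DemoBuilder, newInstance and buildFor are hypothetical names used only for illustration (they are not CarbonData or Spark APIs), plain strings stand in for the real conf and parser objects, and the prefix-based version check is an assumption rather than what the PR does.

```scala
object ReflectiveFactorySketch {
  // Hypothetical stand-in for the version-specific builder class;
  // it is not a CarbonData or Spark type.
  class DemoBuilder(val conf: String, val parser: String)

  // Instantiate `className` through its first public constructor,
  // mirroring the getConstructors.head / newInstance pattern in the diff.
  def newInstance(className: String, args: AnyRef*): Any = {
    val ctor = Class.forName(className).getConstructors.head
    ctor.newInstance(args: _*)
  }

  // One supported-version check and one construction path instead of
  // two copy-pasted branches.
  def buildFor(version: String, conf: String, parser: String): DemoBuilder = {
    if (version.startsWith("2.1") || version.startsWith("2.2")) {
      newInstance(classOf[DemoBuilder].getName, conf, parser)
        .asInstanceOf[DemoBuilder]
    } else {
      throw new UnsupportedOperationException(s"Unsupported Spark version: $version")
    }
  }
}
```

If the two Spark versions later need different classes or argument lists, only the class name and arguments passed to newInstance would have to vary per version.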
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1385/ ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1554 LGTM ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152729141 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala --- @@ -38,7 +40,26 @@ import org.apache.carbondata.spark.util.CommonUtil */ class CarbonSparkSqlParser(conf: SQLConf, sparkSession: SparkSession) extends AbstractSqlParser { - val astBuilder = new CarbonSqlAstBuilder(conf) + val parser = new CarbonSpark2SqlParser + val astBuilder = getAstBuilder() + + def getAstBuilder(): AstBuilder = { +if (sparkSession.version.contains("2.1")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSqlAstBuilder") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val astBuilder = ctor.newInstance(conf, parser).asInstanceOf[AstBuilder] + astBuilder +} else if (sparkSession.version.contains("2.2")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152729132 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala --- @@ -38,7 +40,26 @@ import org.apache.carbondata.spark.util.CommonUtil */ class CarbonSparkSqlParser(conf: SQLConf, sparkSession: SparkSession) extends AbstractSqlParser { - val astBuilder = new CarbonSqlAstBuilder(conf) + val parser = new CarbonSpark2SqlParser + val astBuilder = getAstBuilder() + + def getAstBuilder(): AstBuilder = { +if (sparkSession.version.contains("2.1")) { --- End diff -- use startsWith instead of contains ---
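A small sketch of why the review asks for startsWith: a substring test on a version string can match the wrong release line, because a string such as "2.2.1" also contains "2.1". The object and method names below are hypothetical and the version strings are illustrative inputs, not values taken from the PR.

```scala
object VersionPrefixCheck {
  // Substring match, as in the diff under review.
  def isSpark21ByContains(version: String): Boolean = version.contains("2.1")

  // Prefix match, as the review comment suggests.
  def isSpark21ByPrefix(version: String): Boolean = version.startsWith("2.1")

  def main(args: Array[String]): Unit = {
    // "2.2.1" holds the substring "2.1" starting at its third character,
    // so the substring test would send a 2.2.x release down the 2.1 branch.
    println(isSpark21ByContains("2.2.1")) // true
    println(isSpark21ByPrefix("2.2.1"))   // false
    println(isSpark21ByPrefix("2.1.0"))   // true
  }
}
```

A prefix of "2.1." (with the trailing dot) would be stricter still, since "2.1" alone would also be a prefix of a hypothetical "2.10.0".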
[GitHub] carbondata pull request #1554: [CARBONDATA-1717] Fix issue of no sort when d...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1554 ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152729085 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala --- @@ -17,19 +17,23 @@ package org.apache.spark.sql.parser +import java.lang.reflect.InvocationTargetException + import scala.collection.mutable import scala.language.implicitConversions import org.apache.spark.sql.{DeleteRecords, ShowLoadsCommand, UpdateTable} -import org.apache.spark.sql.catalyst.CarbonDDLSqlParser +import org.apache.spark.sql.catalyst.{CarbonDDLSqlParser, TableIdentifier} import org.apache.spark.sql.catalyst.CarbonTableIdentifierImplicit._ -import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.execution.command._ import org.apache.spark.sql.execution.command.management.{AlterTableCompactionCommand, CleanFilesCommand, DeleteLoadByIdCommand, DeleteLoadByLoadDateCommand, LoadTableCommand} import org.apache.spark.sql.execution.command.partition.{AlterTableDropCarbonPartitionCommand, AlterTableSplitCarbonPartitionCommand} import org.apache.spark.sql.execution.command.schema.{AlterTableAddColumnCommand, AlterTableDataTypeChangeCommand, AlterTableDropColumnCommand} import org.apache.spark.sql.types.StructField +import org.apache.spark.sql.CarbonExpressions.CarbonUnresolvedRelation +import org.apache.spark.sql.catalyst.analysis.{UnresolvedAlias, UnresolvedRelation, UnresolvedStar} --- End diff -- the order of imports is wrong. ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1554 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1832/ ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728991 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/internal/CarbonSqlConf.scala --- @@ -32,76 +32,6 @@ class CarbonSQLConf(sparkSession: SparkSession) { /** * To initialize dynamic param defaults along with usage docs */ - def addDefaultCarbonParams(): Unit = { -val ENABLE_UNSAFE_SORT = - SQLConfigBuilder(CarbonCommonConstants.ENABLE_UNSAFE_SORT) -.doc("To enable/ disable unsafe sort.") -.booleanConf - .createWithDefault(carbonProperties.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT, - CarbonCommonConstants.ENABLE_UNSAFE_SORT_DEFAULT).toBoolean) -val CARBON_CUSTOM_BLOCK_DISTRIBUTION = - SQLConfigBuilder(CarbonCommonConstants.CARBON_CUSTOM_BLOCK_DISTRIBUTION) -.doc("To enable/ disable carbon custom block distribution.") -.booleanConf -.createWithDefault(carbonProperties - .getProperty(CarbonCommonConstants.CARBON_CUSTOM_BLOCK_DISTRIBUTION, - CarbonCommonConstants.CARBON_CUSTOM_BLOCK_DISTRIBUTION_DEFAULT).toBoolean) -val BAD_RECORDS_LOGGER_ENABLE = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE) -.doc("To enable/ disable carbon bad record logger.") -.booleanConf -.createWithDefault(CarbonLoadOptionConstants - .CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE_DEFAULT.toBoolean) -val BAD_RECORDS_ACTION = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_ACTION) -.doc("To configure the bad records action.") -.stringConf -.createWithDefault(carbonProperties - .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, -CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION_DEFAULT)) -val IS_EMPTY_DATA_BAD_RECORD = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD) -.doc("Property to decide weather empty data to be considered bad/ good record.") -.booleanConf - 
.createWithDefault(CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD_DEFAULT - .toBoolean) -val SORT_SCOPE = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE) -.doc("Property to specify sort scope.") -.stringConf - .createWithDefault(carbonProperties.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, - CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)) -val BATCH_SORT_SIZE_INMB = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB) -.doc("Property to specify batch sort size in MB.") -.stringConf -.createWithDefault(carbonProperties - .getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB, -CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT)) -val SINGLE_PASS = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS) -.doc("Property to enable/disable single_pass.") -.booleanConf - .createWithDefault(CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS_DEFAULT.toBoolean) -val BAD_RECORD_PATH = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORD_PATH) -.doc("Property to configure the bad record location.") -.stringConf - .createWithDefault(carbonProperties.getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC, - CarbonCommonConstants.CARBON_BADRECORDS_LOC_DEFAULT_VAL)) -val GLOBAL_SORT_PARTITIONS = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS) -.doc("Property to configure the global sort partitions.") -.stringConf -.createWithDefault(carbonProperties - .getProperty(CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS, -CarbonCommonConstants.LOAD_GLOBAL_SORT_PARTITIONS_DEFAULT)) -val DATEFORMAT = - SQLConfigBuilder(CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT) -.doc("Property to configure data format for date type columns.") -.stringConf - .createWithDefault(CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT) - } --- End diff -- why does it need to remove above lines? ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728865 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonSqlConfFactory.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import org.apache.spark.internal.config.ConfigBuilder +import org.apache.spark.sql.hive.CarbonSqlConfCompileCode.AbstractCarbonSqlConfFactory +import org.apache.spark.util.ScalaCompilerUtil + + +private[sql] class CarbonSqlConfCodeGenerateFactory(version: String) { + + val carbonSqlConfFactory = if (version.equals("2.1")) { --- End diff -- use startsWith instead of equals ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728764 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonOptimizer.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import org.apache.spark.sql.ExperimentalMethods +import org.apache.spark.sql.catalyst.catalog.SessionCatalog +import org.apache.spark.sql.catalyst.optimizer.Optimizer +import org.apache.spark.sql.hive.CarbonOptimizerCompileCode.AbstractCarbonOptimizerFactory +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.ScalaCompilerUtil + + +private[sql] class CarbonOptimizerCodeGenerateFactory(version: String) { + + val carbonoptimizerFactory = if (version.equals("2.1")) { --- End diff -- use startsWith instead of equals. ---
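The equals comments on the two factory classes can be illustrated the same way. This is a minimal sketch, assuming the version argument is the full string that SparkSession.version reports (for example "2.1.0"); how the PR actually derives that argument is not shown in the quoted diff. The object and method names are hypothetical.

```scala
object VersionEqualsCheck {
  // Exact match, as in the quoted factory classes.
  def matchesByEquals(version: String): Boolean = version.equals("2.1")

  // Prefix match, as the review comment suggests.
  def matchesByPrefix(version: String): Boolean = version.startsWith("2.1")

  def main(args: Array[String]): Unit = {
    // A full version string carries a patch component, so an exact
    // comparison against "2.1" does not match it.
    println(matchesByEquals("2.1.0")) // false
    println(matchesByPrefix("2.1.0")) // true
  }
}
```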
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728579 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonHiveMetaStore.scala --- @@ -153,8 +153,11 @@ class CarbonHiveMetaStore extends CarbonFileMetastore { val dbName = oldTableIdentifier.getDatabaseName val tableName = oldTableIdentifier.getTableName val schemaParts = CarbonUtil.convertToMultiGsonStrings(wrapperTableInfo, "=", "'", "") - sparkSession.sessionState.asInstanceOf[CarbonSessionState].metadataHive.runSqlHive( - s"ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES($schemaParts)") +val hiveClient = sparkSession.asInstanceOf[CarbonSession].sharedState.externalCatalog + .asInstanceOf[HiveExternalCatalog].client +hiveClient.runSqlHive(s"ALTER TABLE $dbName.$tableName SET SERDEPROPERTIES($schemaParts)") + +sparkSession.sessionState --- End diff -- unused code, remove it. ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728527 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -115,18 +121,50 @@ class CarbonFileMetastore extends CarbonMetaStore { lookupRelation(TableIdentifier(tableName, dbName))(sparkSession) } + val rm = universe.runtimeMirror(getClass.getClassLoader) + + def getField[T: TypeTag : reflect.ClassTag](name: String, obj: T): CatalogTable = { +val im = rm.reflect(obj) +val sym = im.symbol.typeSignature.member(TermName(name)) +val tableMeta = im.reflectMethod(sym.asMethod).apply() +tableMeta.asInstanceOf[CatalogTable] + } + override def lookupRelation(tableIdentifier: TableIdentifier) (sparkSession: SparkSession): LogicalPlan = { val database = tableIdentifier.database.getOrElse( sparkSession.catalog.currentDatabase) val relation = sparkSession.sessionState.catalog.lookupRelation(tableIdentifier) match { case SubqueryAlias(_, - LogicalRelation(carbonDatasourceHadoopRelation: CarbonDatasourceHadoopRelation, _, _), - _) => + LogicalRelation(carbonDatasourceHadoopRelation: CarbonDatasourceHadoopRelation, _, _)) => carbonDatasourceHadoopRelation.carbonRelation case LogicalRelation( carbonDatasourceHadoopRelation: CarbonDatasourceHadoopRelation, _, _) => carbonDatasourceHadoopRelation.carbonRelation + +// case SubqueryAlias(_, c: CatalogRelation) if sparkSession.version.contains("2.2") && +// getField("tableMeta", c) +// .asInstanceOf[CatalogTable].provider +// .isDefined && +// getField("tableMeta", c) +// .asInstanceOf[String] +// .equals("org.apache.spark.sql.CarbonSource") => +//new CarbonSource() +// .createRelation(sparkSession.sqlContext, +//c.tableMeta.storage.properties) +// .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation + + case SubqueryAlias(_, c: CatalogRelation) if sparkSession.version.contains("2.2") && --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728433 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -143,52 +246,250 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica selectPlan } val finalPlan = if (filter.length > 0) { - val alias = table.alias.getOrElse("") var transformed: Boolean = false // Create a dummy projection to include filter conditions var newPlan: LogicalPlan = null if (table.tableIdentifier.database.isDefined) { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.database.getOrElse("") + "." + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.database.getOrElse("") + "." + + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } else { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } newPlan transform { -case UnresolvedRelation(t, Some(a)) - if !transformed && t == table.tableIdentifier && a == alias => +case CarbonUnresolvedRelation(t) + if !transformed && t == table.tableIdentifier => transformed = true // Add the filter condition of update statement on destination table - SubqueryAlias(alias, updatedSelectPlan, Option(table.tableIdentifier)) + // SubqueryAlias(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + if (sparkSession.version.contains("2.1")) { +// SubqueryAlias(alias1, updatedSelectPlan, Option(table.tableIdentifier)) +val clazz = Utils + .classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor + .newInstance(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + .asInstanceOf[SubqueryAlias] +subqueryAlias + } else if 
(sparkSession.version.contains("2.2")) { +// SubqueryAlias(table.output.map(_.withQualifier(Some(table.tableName))).toString(), +// Project(projList, relation)) +val clazz = Utils + .classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor.newInstance(alias.getOrElse(""), updatedSelectPlan) + .asInstanceOf[SubqueryAlias] +subqueryAlias + } else { +throw new UnsupportedOperationException("Unsupported Spark version") + } } } else { updatedSelectPlan } val tid = CarbonTableIdentifierImplicit.toTableIdentifier(Seq(table.tableIdentifier.toString())) val tidSeq = Seq(GetDB.getDatabaseName(tid.database, sparkSession)) -val destinationTable = UnresolvedRelation(table.tableIdentifier, table.alias) +// TODO use reflection +// val destinationTable = UnresolvedRelation(table.tableIdentifier, Some(alias.getOrElse(""))) +val destinationTable = + if (sparkSession.version.contains("2.1")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728420 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -143,52 +246,250 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica selectPlan } val finalPlan = if (filter.length > 0) { - val alias = table.alias.getOrElse("") var transformed: Boolean = false // Create a dummy projection to include filter conditions var newPlan: LogicalPlan = null if (table.tableIdentifier.database.isDefined) { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.database.getOrElse("") + "." + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.database.getOrElse("") + "." + + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } else { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } newPlan transform { -case UnresolvedRelation(t, Some(a)) - if !transformed && t == table.tableIdentifier && a == alias => +case CarbonUnresolvedRelation(t) + if !transformed && t == table.tableIdentifier => transformed = true // Add the filter condition of update statement on destination table - SubqueryAlias(alias, updatedSelectPlan, Option(table.tableIdentifier)) + // SubqueryAlias(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + if (sparkSession.version.contains("2.1")) { +// SubqueryAlias(alias1, updatedSelectPlan, Option(table.tableIdentifier)) +val clazz = Utils + .classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor + .newInstance(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + .asInstanceOf[SubqueryAlias] +subqueryAlias + } else if 
(sparkSession.version.contains("2.2")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728413 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -143,52 +246,250 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica selectPlan } val finalPlan = if (filter.length > 0) { - val alias = table.alias.getOrElse("") var transformed: Boolean = false // Create a dummy projection to include filter conditions var newPlan: LogicalPlan = null if (table.tableIdentifier.database.isDefined) { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.database.getOrElse("") + "." + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.database.getOrElse("") + "." + + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } else { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } newPlan transform { -case UnresolvedRelation(t, Some(a)) - if !transformed && t == table.tableIdentifier && a == alias => +case CarbonUnresolvedRelation(t) + if !transformed && t == table.tableIdentifier => transformed = true // Add the filter condition of update statement on destination table - SubqueryAlias(alias, updatedSelectPlan, Option(table.tableIdentifier)) + // SubqueryAlias(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + if (sparkSession.version.contains("2.1")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728438 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -143,52 +246,250 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica selectPlan } val finalPlan = if (filter.length > 0) { - val alias = table.alias.getOrElse("") var transformed: Boolean = false // Create a dummy projection to include filter conditions var newPlan: LogicalPlan = null if (table.tableIdentifier.database.isDefined) { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.database.getOrElse("") + "." + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.database.getOrElse("") + "." + + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } else { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) } newPlan transform { -case UnresolvedRelation(t, Some(a)) - if !transformed && t == table.tableIdentifier && a == alias => +case CarbonUnresolvedRelation(t) + if !transformed && t == table.tableIdentifier => transformed = true // Add the filter condition of update statement on destination table - SubqueryAlias(alias, updatedSelectPlan, Option(table.tableIdentifier)) + // SubqueryAlias(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + if (sparkSession.version.contains("2.1")) { +// SubqueryAlias(alias1, updatedSelectPlan, Option(table.tableIdentifier)) +val clazz = Utils + .classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor + .newInstance(alias.getOrElse(""), updatedSelectPlan, Option(table.tableIdentifier)) + .asInstanceOf[SubqueryAlias] +subqueryAlias + } else if 
(sparkSession.version.contains("2.2")) { +// SubqueryAlias(table.output.map(_.withQualifier(Some(table.tableName))).toString(), +// Project(projList, relation)) +val clazz = Utils + .classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor.newInstance(alias.getOrElse(""), updatedSelectPlan) + .asInstanceOf[SubqueryAlias] +subqueryAlias + } else { +throw new UnsupportedOperationException("Unsupported Spark version") + } } } else { updatedSelectPlan } val tid = CarbonTableIdentifierImplicit.toTableIdentifier(Seq(table.tableIdentifier.toString())) val tidSeq = Seq(GetDB.getDatabaseName(tid.database, sparkSession)) -val destinationTable = UnresolvedRelation(table.tableIdentifier, table.alias) +// TODO use reflection +// val destinationTable = UnresolvedRelation(table.tableIdentifier, Some(alias.getOrElse(""))) +val destinationTable = + if (sparkSession.version.contains("2.1")) { + val clazz = Utils.classForName("org.apache.spark.sql.catalyst.analysis.UnresolvedRelation") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val unresolvedrelation = ctor +.newInstance(table.tableIdentifier, + Some(alias.getOrElse(""))).asInstanceOf[UnresolvedRelation] +unresolvedrelation +} else if (sparkSession.version.contains("2.2")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1554 LGTM ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728384 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -143,52 +246,250 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica selectPlan } val finalPlan = if (filter.length > 0) { - val alias = table.alias.getOrElse("") var transformed: Boolean = false // Create a dummy projection to include filter conditions var newPlan: LogicalPlan = null if (table.tableIdentifier.database.isDefined) { newPlan = parser.parsePlan("select * from " + - table.tableIdentifier.database.getOrElse("") + "." + - table.tableIdentifier.table + " " + alias + " " + filter) + table.tableIdentifier.database.getOrElse("") + "." + + table.tableIdentifier.table + " " + alias.getOrElse("") + " " + + filter) --- End diff -- the indent of above 3 lines is wrong ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728263 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -95,11 +165,40 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica def prepareTargetReleation(relation: UnresolvedRelation): SubqueryAlias = { val tupleId = UnresolvedAlias(Alias(UnresolvedFunction("getTupleId", Seq.empty, isDistinct = false), "tupleId")()) + + val localalias = alias match { +case Some(a) => Some(alias.toSeq) +case _ => None + } val projList = Seq( -UnresolvedAlias(UnresolvedStar(Option(table.alias.toSeq))), tupleId) +UnresolvedAlias(UnresolvedStar(localalias)), tupleId) // include tuple id and rest of the required columns in subqury - SubqueryAlias(table.alias.getOrElse(""), -Project(projList, relation), Option(table.tableIdentifier)) +// SubqueryAlias(alias.getOrElse(""), +//Project(projList, relation), Option(table.tableIdentifier)) +// + if (sparkSession.version.contains("2.1")) { +// SubqueryAlias(table.output.map(_.withQualifier(Some(table.tableName))).toString(), +// Project(projList, relation), Option(table.tableIdentifier)) +val clazz = Utils.classForName("org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias") +val ctor = clazz.getConstructors.head +ctor.setAccessible(true) +val subqueryAlias = ctor + .newInstance(alias.getOrElse(""), +Project(projList, relation), Option(table.tableIdentifier)).asInstanceOf[SubqueryAlias] +subqueryAlias + } else if (sparkSession.version.contains("2.2")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728246 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -95,11 +165,40 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica def prepareTargetReleation(relation: UnresolvedRelation): SubqueryAlias = { val tupleId = UnresolvedAlias(Alias(UnresolvedFunction("getTupleId", Seq.empty, isDistinct = false), "tupleId")()) + + val localalias = alias match { +case Some(a) => Some(alias.toSeq) +case _ => None + } val projList = Seq( -UnresolvedAlias(UnresolvedStar(Option(table.alias.toSeq))), tupleId) +UnresolvedAlias(UnresolvedStar(localalias)), tupleId) // include tuple id and rest of the required columns in subqury - SubqueryAlias(table.alias.getOrElse(""), -Project(projList, relation), Option(table.tableIdentifier)) +// SubqueryAlias(alias.getOrElse(""), +//Project(projList, relation), Option(table.tableIdentifier)) +// + if (sparkSession.version.contains("2.1")) { --- End diff -- use startsWith instead of contains ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152728216 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala --- @@ -95,11 +165,40 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica def prepareTargetReleation(relation: UnresolvedRelation): SubqueryAlias = { val tupleId = UnresolvedAlias(Alias(UnresolvedFunction("getTupleId", Seq.empty, isDistinct = false), "tupleId")()) + + val localalias = alias match { --- End diff -- use localAlias instead of localalias. ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1554 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1384/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/1542 retest this please ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152726082 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala --- @@ -26,13 +26,20 @@ import org.apache.spark.sql.execution.command.management.{AlterTableCompactionCo import org.apache.spark.sql.execution.command.partition.ShowCarbonPartitionsCommand import org.apache.spark.sql.execution.command.schema.{AlterTableAddColumnCommand, AlterTableDataTypeChangeCommand, AlterTableDropColumnCommand, AlterTableRenameTableCommand} import org.apache.spark.sql.hive.execution.command.{CarbonDropDatabaseCommand, CarbonResetCommand, CarbonSetCommand} +import org.apache.spark.sql.CarbonExpressions.{CarbonDescribeTable => DescribeTableCommand} +import org.apache.spark.sql.catalyst.catalog.CatalogTypes import org.apache.carbondata.core.util.CarbonUtil import org.apache.carbondata.spark.exception.MalformedCarbonCommandException + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152725990 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala --- @@ -42,6 +43,7 @@ import org.apache.carbondata.spark.CarbonAliasDecoderRelation import org.apache.carbondata.spark.rdd.CarbonScanRDD import org.apache.carbondata.spark.util.CarbonScalaUtil + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152725691 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -42,15 +42,46 @@ class CarbonSession(@transient val sc: SparkContext, this(sc, None) } + + // SessionStateCodeGenerateFactory.init(sc.version) + // CarbonOptimizerCodeGenerateFactory.init(sc.version) + // val carbonDefaultOptimizer = CarbonOptimizerCodeGenerateFactory.getInstance() + // .carbonoptimizerFactory.createCarbonOptimizer() + // @transient + // override lazy val sessionState: SessionState = new CarbonSessionState(this) + + + + def getSessionState(sparkContext: SparkContext): SessionState = { +if (sparkContext.version.contains("2.1")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSessionState") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val sessionState1 = ctor.newInstance(this).asInstanceOf[SessionState] + sessionState1 +} else if (sparkContext.version.contains("2.2")) { --- End diff -- use sparkContext.version.startsWith("2.2") ---
[GitHub] carbondata pull request #1543: [CARBONDATA-1786] [BugFix] Refactored code to...
Github user geetikagupta16 closed the pull request at: https://github.com/apache/carbondata/pull/1543 ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152721088 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -42,15 +42,46 @@ class CarbonSession(@transient val sc: SparkContext, this(sc, None) } + + // SessionStateCodeGenerateFactory.init(sc.version) + // CarbonOptimizerCodeGenerateFactory.init(sc.version) + // val carbonDefaultOptimizer = CarbonOptimizerCodeGenerateFactory.getInstance() + // .carbonoptimizerFactory.createCarbonOptimizer() + // @transient + // override lazy val sessionState: SessionState = new CarbonSessionState(this) + + + + def getSessionState(sparkContext: SparkContext): SessionState = { +if (sparkContext.version.contains("2.1")) { --- End diff -- It would be better to use sparkContext.version.**startsWith**("2.1"): if the version is 2.2.1, contains("2.1") also returns true. ---
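The difference between the two checks can be seen in a small standalone sketch. It is written in Java, where `String.contains` and `String.startsWith` behave the same as in the Scala code under review. The class name `SparkVersionCheck` and the helper `isVersion` are hypothetical and not part of the pull request; the trailing-dot comparison is an extra assumption that also keeps a version such as "2.10.0" from matching "2.1".

```java
public class SparkVersionCheck {

    // Hypothetical helper: true only when the version equals the given
    // major.minor pair or begins with it followed by a dot.
    static boolean isVersion(String version, String majorMinor) {
        return version.equals(majorMinor) || version.startsWith(majorMinor + ".");
    }

    public static void main(String[] args) {
        String version = "2.2.1";

        // contains() matches the substring "2.1" in the middle of "2.2.1".
        System.out.println(version.contains("2.1"));    // true

        // startsWith() only looks at the beginning of the string.
        System.out.println(version.startsWith("2.1"));  // false

        // The helper accepts the real major.minor and rejects the wrong one.
        System.out.println(isVersion(version, "2.2"));  // true
        System.out.println(isVersion(version, "2.1"));  // false
    }
}
```

With `contains`, a 2.2.1 session would take the 2.1 branch and try to load the 2.1 session state class, which is the case the review comment warns about.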
[jira] [Created] (CARBONDATA-1801) Remove unnecessary mdk computation code
jiangmanhua created CARBONDATA-1801: --- Summary: Remove unnecessary mdk computation code Key: CARBONDATA-1801 URL: https://issues.apache.org/jira/browse/CARBONDATA-1801 Project: CarbonData Issue Type: Improvement Reporter: jiangmanhua Priority: Minor In `org.apache.carbondata.core.datastore.page.key.TablePageKey#update`, argument `mdk` can be reused to avoid duplicate computation for mdk by replacing `WriteStepRowUtil.getMdk(row, mdkGenerator)` Original Code: {code:java} /** update all keys based on the input row */ public void update(int rowId, CarbonRow row, byte[] mdk) throws KeyGenException { if (hasNoDictionary) { currentNoDictionaryKey = WriteStepRowUtil.getNoDictAndComplexDimension(row); } if (rowId == 0) { startKey = WriteStepRowUtil.getMdk(row, mdkGenerator); noDictStartKey = currentNoDictionaryKey; } noDictEndKey = currentNoDictionaryKey; if (rowId == pageSize - 1) { endKey = WriteStepRowUtil.getMdk(row, mdkGenerator); finalizeKeys(); } } {code} https://github.com/apache/carbondata/blob/74226907990cdee41a6ccbd69e2a813077792f89/core/src/main/java/org/apache/carbondata/core/datastore/page/key/TablePageKey.java#L66 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
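A minimal sketch of the proposed change, assuming the `mdk` argument already holds the key that `WriteStepRowUtil.getMdk(row, mdkGenerator)` would compute for the same row. The class `PageKeySketch` is a hypothetical, stripped-down stand-in for `TablePageKey`: the no-dictionary handling and `finalizeKeys()` are left out, and only the start/end key bookkeeping is shown.

```java
public class PageKeySketch {

    private final int pageSize;
    private byte[] startKey;
    private byte[] endKey;

    PageKeySketch(int pageSize) {
        this.pageSize = pageSize;
    }

    // Record the first and last key of the page from the mdk the caller
    // has already computed, instead of computing it again from the row.
    void update(int rowId, byte[] mdk) {
        if (rowId == 0) {
            startKey = mdk;
        }
        if (rowId == pageSize - 1) {
            endKey = mdk;
        }
    }

    byte[] getStartKey() {
        return startKey;
    }

    byte[] getEndKey() {
        return endKey;
    }

    public static void main(String[] args) {
        PageKeySketch key = new PageKeySketch(3);
        byte[][] mdks = {{1}, {2}, {3}};
        for (int rowId = 0; rowId < mdks.length; rowId++) {
            key.update(rowId, mdks[rowId]);
        }
        System.out.println(key.getStartKey()[0] + " .. " + key.getEndKey()[0]);
    }
}
```

Storing the reference is only safe if the caller does not reuse or mutate the `mdk` array after the call; if it does, the keys would need to be copied first. Whether that holds in the real write step is not stated in the issue.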
[GitHub] carbondata issue #1547: [CARBONDATA-1792] Add example of data management for...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1547 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1831/ ---
[jira] [Updated] (CARBONDATA-1800) Mistyping in DataMapRow compare
[ https://issues.apache.org/jira/browse/CARBONDATA-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangmanhua updated CARBONDATA-1800: Description: In BlockletDMComparator {code:java} public int compare(DataMapRow first, DataMapRow second) { ... byte[][] firstBytes = splitKey(first.getByteArray(0)); byte[][] secondBytes = splitKey(first.getByteArray(0)); ... } {code} line 65 and 66 use same input argument, the second line should be assign as " splitKey(second.getByteArray(0)) " https://github.com/apache/carbondata/blob/79feac96ae789851c5ad7306a7acaaba25d8e6c9/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDMComparator.java#L66 was: In BlockletDMComparator public int compare(DataMapRow first, DataMapRow second) { ... byte[][] firstBytes = splitKey(first.getByteArray(0)); byte[][] secondBytes = splitKey(first.getByteArray(0)); ... } line 65 and 66 use same input argument, the second line should be assign as " splitKey(second.getByteArray(0)) " https://github.com/apache/carbondata/blob/79feac96ae789851c5ad7306a7acaaba25d8e6c9/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDMComparator.java#L66 > Mistyping in DataMapRow compare > --- > > Key: CARBONDATA-1800 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1800 > Project: CarbonData > Issue Type: Bug > Components: core >Reporter: jiangmanhua > > In BlockletDMComparator > {code:java} > public int compare(DataMapRow first, DataMapRow second) { >... > byte[][] firstBytes = splitKey(first.getByteArray(0)); > byte[][] secondBytes = splitKey(first.getByteArray(0)); >... 
> } > {code} > line 65 and 66 use same input argument, the second line should be assign as " > splitKey(second.getByteArray(0)) " > https://github.com/apache/carbondata/blob/79feac96ae789851c5ad7306a7acaaba25d8e6c9/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDMComparator.java#L66 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
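The effect of the mistyped argument can be reproduced with a simplified, self-contained comparator. This is only an illustration: `KeyComparatorSketch` and `compareUnsigned` are hypothetical names, and the real `BlockletDMComparator` splits the key with `splitKey` before comparing, which is omitted here.

```java
import java.util.Comparator;

public class KeyComparatorSketch {

    // Buggy variant: the first argument is read twice, so every key is
    // compared with itself and the result is always 0.
    static final Comparator<byte[]> BUGGY =
        (first, second) -> compareUnsigned(first, first);

    // Fixed variant: the second operand is taken from the second argument.
    static final Comparator<byte[]> FIXED =
        (first, second) -> compareUnsigned(first, second);

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] low = {1, 2};
        byte[] high = {1, 3};
        System.out.println(BUGGY.compare(low, high));      // 0, keys look equal
        System.out.println(FIXED.compare(low, high) < 0);  // true
        System.out.println(FIXED.compare(high, low) > 0);  // true
    }
}
```

A comparator that always reports equality cannot order keys, so any search or sort that relies on it sees all entries as the same key.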
[GitHub] carbondata issue #1547: [CARBONDATA-1792] Add example of data management for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1547 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1383/ ---
[jira] [Created] (CARBONDATA-1800) Mistyping in DataMapRow compare
jiangmanhua created CARBONDATA-1800: --- Summary: Mistyping in DataMapRow compare Key: CARBONDATA-1800 URL: https://issues.apache.org/jira/browse/CARBONDATA-1800 Project: CarbonData Issue Type: Bug Components: core Reporter: jiangmanhua In BlockletDMComparator public int compare(DataMapRow first, DataMapRow second) { ... byte[][] firstBytes = splitKey(first.getByteArray(0)); byte[][] secondBytes = splitKey(first.getByteArray(0)); ... } Lines 65 and 66 use the same input argument; the second line should assign " splitKey(second.getByteArray(0)) " instead. https://github.com/apache/carbondata/blob/79feac96ae789851c5ad7306a7acaaba25d8e6c9/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDMComparator.java#L66 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1460 Please rebase onto the master branch. ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1554 @chenerlu can you unify the variable name to "hadoopConf"? ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1382/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[jira] [Commented] (CARBONDATA-1789) Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
[ https://issues.apache.org/jira/browse/CARBONDATA-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263644#comment-16263644 ] xuchuanyin commented on CARBONDATA-1789: I read somewhere that `drop` has a higher priority than `load`, so this may not be an issue. If you insist, I think it is better to discuss it on the mailing list. > Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if > insert/load job is running > --- > > Key: CARBONDATA-1789 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1789 > Project: CarbonData > Issue Type: Bug > Components: data-load > Environment: 3 Node ant cluster >Reporter: Ajeet Rai > Labels: dfx > Fix For: 1.3.0 > > > Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if > insert/load job is running > Steps: > 1: Create a table > 2: Start an insert job > 3: Concurrently drop the table > 4: Observe that the drop succeeds > 5: Observe that the insert job keeps running and fails after some time > Expected behaviour: the drop job should wait for the insert job to complete -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1555: [CARBONDATA-1799] conf added in testcase
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1555 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1381/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1380/ ---
[GitHub] carbondata issue #1555: [CARBONDATA-1799] conf added in testcase
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1555 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1830/ ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1541 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1379/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1537 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1829/ ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1554 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1378/ ---
[jira] [Resolved] (CARBONDATA-1795) Fix code issue of all examples
[ https://issues.apache.org/jira/browse/CARBONDATA-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1795. - Resolution: Fixed Fix Version/s: 1.3.0 > Fix code issue of all examples > -- > > Key: CARBONDATA-1795 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1795 > Project: CarbonData > Issue Type: Bug > Components: examples >Reporter: Liang Chen >Assignee: Liang Chen > Fix For: 1.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Fix code issue of all examples: > CarbonDataFrameExample,CarbonSortColumnsExample,HadoopFileExample -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1551: [CARBONDATA-1795] Fix code issues of examples
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1551 ---
[GitHub] carbondata issue #1551: [CARBONDATA-1795] Fix code issues of examples
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1551 LGTM ---
[GitHub] carbondata pull request #1555: [CARBONDATA-1799] conf added in testcase
GitHub user rahulforallp opened a pull request: https://github.com/apache/carbondata/pull/1555 [CARBONDATA-1799] conf added in testcase Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rahulforallp/incubator-carbondata CARBONDATA-1799 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1555.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1555 commit c887fd7da43be9f57acc2adcb5e85a223f7aa695 Author: rahulforallp Date: 2017-11-22T17:04:28Z conf added in testcase ---
[jira] [Created] (CARBONDATA-1799) CarbonInputMapperTest is failing
Rahul Kumar created CARBONDATA-1799: --- Summary: CarbonInputMapperTest is failing Key: CARBONDATA-1799 URL: https://issues.apache.org/jira/browse/CARBONDATA-1799 Project: CarbonData Issue Type: Bug Reporter: Rahul Kumar Assignee: Rahul Kumar -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1551: [CARBONDATA-1795] Fix code issues of examples
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1551 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1828/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1377/ ---
[GitHub] carbondata issue #1551: [CARBONDATA-1795] Fix code issues of examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1551 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1376/ ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1554 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1827/ ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1534 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1375/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1554 retest this please ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1554 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1374/ ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1534 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1826/ ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1554 @chenerlu please follow the below template to provide description. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1541 Is it possible to embed the coverage ratio in each PR? ---
[GitHub] carbondata pull request #1508: [CARBONDATA-1738] [PreAgg] Block direct inser...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1508#discussion_r152597828 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/execution/command/CarbonHiveCommands.scala --- @@ -95,8 +95,14 @@ object CarbonSetCommand { } } else if (key.startsWith(CarbonCommonConstants.VALIDATE_CARBON_INPUT_SEGMENTS)) { sessionParams.addProperty(key.toLowerCase(), value) +} else if (key.startsWith(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL)) { --- End diff -- I don't think it is necessary to use the `set` command for this internal call. We are not going to give the option to load the aggregate table, as it may corrupt the table. ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1541 retest this please ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1543 @geetikagupta16 Thanks for your contribution. PR #1550 has already fixed this issue and has been merged, so please close this PR. ---
[jira] [Resolved] (CARBONDATA-1770) Update documents and consolidate DDL,DML,Partition docs
[ https://issues.apache.org/jira/browse/CARBONDATA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1770. - Resolution: Fixed Fix Version/s: 1.3.0 > Update documents and consolidate DDL,DML,Partition docs > --- > > Key: CARBONDATA-1770 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1770 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Liang Chen >Assignee: Liang Chen > Fix For: 1.3.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > 1. Update documents : there are some error description. > 2. Consolidate Data management, DDL,DML,Partition docs, to ensure one feature > which only be described in one place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1547: [CARBONDATA-1792]add example of data manageme...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1547#discussion_r152595009 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/DataManagementExample.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.examples + +import java.io.File + +object DataManagementExample { + + def main(args: Array[String]) { +val spark = ExampleUtils.createCarbonSession("DataManagementExample") +spark.sparkContext.setLogLevel("WARN") + +spark.sql("DROP TABLE IF EXISTS carbon_table") + +// Create table +spark.sql( + s""" + | CREATE TABLE IF NOT EXISTS carbon_table( + | ID Int, + | date Date, + | country String, + | name String, + | phonetype String, + | serialname String, + | salary Int, + | floatField float + | ) STORED BY 'carbondata' + """.stripMargin) + +// data.csv has 10 lines --- End diff -- this row comment is not required : // data.csv has 10 lines ---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1534 ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1373/ ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1534 LGTM ---
[GitHub] carbondata pull request #1554: [CARBONDATA-1717] Fix issue of no sort when d...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1554 [CARBONDATA-1717] Fix issue of no sort when data in carbon table is all numeric Modification reason: Fix issue of no sort when data in carbon table is all numeric. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata 1122 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1554.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1554 commit ac84728dbc7ca064149c5b61a10ae4686c729d2d Author: chenerlu Date: 2017-11-22T15:04:13Z [CARBONDATA-1717] Fix issue of no sort when data in carbon table is all numeric ---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1534#discussion_r152589826 --- Diff: docs/data-management-on-carbondata.md --- @@ -0,0 +1,713 @@ + + +# Data Management on CarbonData + +This tutorial is going to introduce all commands and data operations on CarbonData. + +* [CREATE TABLE](#create-table) +* [TABLE MANAGEMENT](#table-management) +* [LOAD DATA](#load-data) +* [UPDATE AND DELETE](#update-and-delete) +* [COMPACTION](#compaction) +* [PARTITION](#partition) +* [BUCKETING](#bucketing) +* [SEGMENT MANAGEMENT](#segment-management) + +## CREATE TABLE + + This command can be used to create a CarbonData table by specifying the list of fields along with the table properties. + + ``` + CREATE TABLE [IF NOT EXISTS] [db_name.]table_name[(col_name data_type , ...)] + STORED BY 'carbondata' + [TBLPROPERTIES (property_name=property_value, ...)] + ``` + +### Usage Guidelines + + Following are the guidelines for TBLPROPERTIES, CarbonData's additional table options can be set via carbon.properties. + + - **Dictionary Encoding Configuration** + + Dictionary encoding is turned off for all columns by default from 1.3 onwards, you can use this command for including columns to do dictionary encoding. + Suggested use cases : do dictionary encoding for low cardinality columns, it might help to improve data compression ratio and performance. + + ``` + TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2') + ``` + + - **Inverted Index Configuration** + + By default inverted index is enabled, it might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position. + Suggested use cases : For high cardinality columns, you can disable the inverted index for improving the data loading performance. 
+ + ``` + TBLPROPERTIES ('NO_INVERTED_INDEX'='column1, column3') + ``` + + - **Sort Columns Configuration** + + This property is for users to specify which columns belong to the MDK(Multi-Dimensions-Key) index. + * If users don't specify "SORT_COLUMN" property, by default MDK index be built by using all dimension columns except complex datatype column. + * If this property is specified but with empty argument, then the table will be loaded without sort.. + Suggested use cases : Only build MDK index for required columns,it might help to improve the data loading performance. + + ``` + TBLPROPERTIES ('SORT_COLUMNS'='column1, column3') + OR + TBLPROPERTIES ('SORT_COLUMNS'='') + ``` + + - **Sort Scope Configuration** + + This property is for users to specify the scope of the sort during data load, following are the types of sort scope. + + * LOCAL_SORT: It is the default sort scope. + * NO_SORT: It will load the data in unsorted manner, it will significantly increase load performance. + * BATCH_SORT: It increases the load performance but decreases the query performance if identified blocks > parallelism. + * GLOBAL_SORT: It increases the query performance, especially high concurrent point query. + And if you care about loading resources isolation strictly, because the system uses the spark GroupBy to sort data, the resource can be controlled by spark. + + - **Table Block Size Configuration** + + This command is for setting block size of this table, the default value is 1024 MB and supports a range of 1 MB to 2048 MB. + + ``` + TBLPROPERTIES ('TABLE_BLOCKSIZE'='512') + //512 or 512M both are accepted. --- End diff -- accept, fixed. ---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1534#discussion_r152589597 --- Diff: docs/data-management-on-carbondata.md --- @@ -0,0 +1,713 @@ + + +# Data Management on CarbonData + +This tutorial is going to introduce all commands and data operations on CarbonData. + +* [CREATE TABLE](#create-table) +* [TABLE MANAGEMENT](#table-management) +* [LOAD DATA](#load-data) +* [UPDATE AND DELETE](#update-and-delete) +* [COMPACTION](#compaction) +* [PARTITION](#partition) +* [BUCKETING](#bucketing) +* [SEGMENT MANAGEMENT](#segment-management) + +## CREATE TABLE + + This command can be used to create a CarbonData table by specifying the list of fields along with the table properties. + + ``` + CREATE TABLE [IF NOT EXISTS] [db_name.]table_name[(col_name data_type , ...)] + STORED BY 'carbondata' + [TBLPROPERTIES (property_name=property_value, ...)] + ``` + +### Usage Guidelines + + Following are the guidelines for TBLPROPERTIES, CarbonData's additional table options can be set via carbon.properties. + + - **Dictionary Encoding Configuration** + + Dictionary encoding is turned off for all columns by default from 1.3 onwards, you can use this command for including columns to do dictionary encoding. + Suggested use cases : do dictionary encoding for low cardinality columns, it might help to improve data compression ratio and performance. + + ``` + TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2') + ``` + + - **Inverted Index Configuration** + + By default inverted index is enabled, it might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position. + Suggested use cases : For high cardinality columns, you can disable the inverted index for improving the data loading performance. 
+ + ``` + TBLPROPERTIES ('NO_INVERTED_INDEX'='column1, column3') + ``` + + - **Sort Columns Configuration** + + This property is for users to specify which columns belong to the MDK(Multi-Dimensions-Key) index. + * If users don't specify "SORT_COLUMN" property, by default MDK index be built by using all dimension columns except complex datatype column. + * If this property is specified but with empty argument, then the table will be loaded without sort.. + Suggested use cases : Only build MDK index for required columns,it might help to improve the data loading performance. + + ``` + TBLPROPERTIES ('SORT_COLUMNS'='column1, column3') + OR + TBLPROPERTIES ('SORT_COLUMNS'='') + ``` + + - **Sort Scope Configuration** + + This property is for users to specify the scope of the sort during data load, following are the types of sort scope. + + * LOCAL_SORT: It is the default sort scope. + * NO_SORT: It will load the data in unsorted manner, it will significantly increase load performance. + * BATCH_SORT: It increases the load performance but decreases the query performance if identified blocks > parallelism. + * GLOBAL_SORT: It increases the query performance, especially high concurrent point query. + And if you care about loading resources isolation strictly, because the system uses the spark GroupBy to sort data, the resource can be controlled by spark. + + - **Table Block Size Configuration** + + This command is for setting block size of this table, the default value is 1024 MB and supports a range of 1 MB to 2048 MB. + + ``` + TBLPROPERTIES ('TABLE_BLOCKSIZE'='512') + //512 or 512M both are accepted. 
+ ``` + +### Example: +``` +CREATE TABLE IF NOT EXISTS productSchema.productSalesTable ( + productNumber Int, + productName String, + storeCity String, + storeProvince String, + productCategory String, + productBatch String, + saleQuantity Int, + revenue Int) +STORED BY 'carbondata' +TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber', + 'NO_INVERTED_INDEX'='productBatch', + 'SORT_COLUMNS'='productName,storeCity', + 'SORT_SCOPE'='NO_SORT', + 'TABLE_BLOCKSIZE'='512') +``` + +## TABLE MANAGEMENT + +### SHOW TABLE + + This command can be used
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1534 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1372/ ---
[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1469 @sounakr Please check the build for 2.2 http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/375/ ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152584958 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/CarbonCreateTableCommand.scala --- @@ -80,7 +80,13 @@ case class CarbonCreateTableCommand( val fields = new Array[Field](cm.dimCols.size + cm.msrCols.size) cm.dimCols.foreach(f => fields(f.schemaOrdinal) = f) cm.msrCols.foreach(f => fields(f.schemaOrdinal) = f) - +// +// sparkSession.sql( +//s"""CREATE TABLE $dbName.$tbName +// |(${ fields.map(f => f.rawSchema.replace("`", "")).mkString(",") }) +// |USING org.apache.spark.sql.CarbonSource""".stripMargin + +//s""" OPTIONS (tableName "$tbName", dbName "$dbName", tablePath """.stripMargin + +//s$tablePath"$carbonSchemaString) """) --- End diff -- remove commented code ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152583029 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CastExpressionOptimization.scala --- @@ -21,16 +21,19 @@ import java.text.{ParseException, SimpleDateFormat} import java.util import java.util.{Locale, TimeZone} +import scala.Option import scala.collection.JavaConverters._ -import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, EmptyRow, EqualTo, Expression, GreaterThan, GreaterThanOrEqual, In, LessThan, LessThanOrEqual, Literal, Not} +import org.apache.spark.sql.catalyst.expressions.{Attribute, EmptyRow, EqualTo, Expression, GreaterThan, GreaterThanOrEqual, In, LessThan, LessThanOrEqual, Literal, Not} import org.apache.spark.sql.CastExpr import org.apache.spark.sql.sources -import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, TimestampType} +import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, StringType, TimestampType} +import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties + --- End diff -- remove the empty line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152582968 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/CastExpressionOptimization.scala --- @@ -96,6 +99,8 @@ object CastExpressionOptimization { } } + + --- End diff -- remove the empty lines ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152582582 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -42,15 +42,46 @@ class CarbonSession(@transient val sc: SparkContext, this(sc, None) } + + // SessionStateCodeGenerateFactory.init(sc.version) + // CarbonOptimizerCodeGenerateFactory.init(sc.version) + // val carbonDefaultOptimizer = CarbonOptimizerCodeGenerateFactory.getInstance() + // .carbonoptimizerFactory.createCarbonOptimizer() + // @transient + // override lazy val sessionState: SessionState = new CarbonSessionState(this) + + + + def getSessionState(sparkContext: SparkContext): SessionState = { --- End diff -- move this reflection code to one common util ---
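A minimal sketch of what such a common util could look like, offered only as an illustration of the review suggestion. The names `ReflectionUtil`, `createObject` and `invokeNoArg` are hypothetical and not taken from the PR. It uses plain `Class.forName` instead of Spark's `Utils.classForName` so it runs without Spark on the classpath, and it selects the constructor by argument count rather than `getConstructors.head`, since the order of that array is not guaranteed.

```scala
// Hypothetical helper: gathers the reflection calls from getSessionState in one place.
object ReflectionUtil {

  // Instantiate `className` through the public constructor whose parameter
  // count matches the number of arguments supplied.
  def createObject(className: String, args: AnyRef*): AnyRef = {
    val clazz = Class.forName(className)
    val ctor = clazz.getConstructors
      .find(_.getParameterCount == args.length)
      .getOrElse(throw new NoSuchMethodException(
        s"$className has no public constructor taking ${args.length} argument(s)"))
    ctor.newInstance(args: _*).asInstanceOf[AnyRef]
  }

  // Invoke a public no-argument method (for example a builder's `build`) by name.
  def invokeNoArg(target: AnyRef, methodName: String): AnyRef =
    target.getClass.getMethod(methodName).invoke(target)
}
```

With a helper of this shape, each version branch in `getSessionState` could shrink to one `createObject` call on the class name already used in the diff, followed by `invokeNoArg(builder, "build")` in the 2.2 case.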
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152582387 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -42,15 +42,46 @@ class CarbonSession(@transient val sc: SparkContext, this(sc, None) } + + // SessionStateCodeGenerateFactory.init(sc.version) + // CarbonOptimizerCodeGenerateFactory.init(sc.version) + // val carbonDefaultOptimizer = CarbonOptimizerCodeGenerateFactory.getInstance() + // .carbonoptimizerFactory.createCarbonOptimizer() + // @transient + // override lazy val sessionState: SessionState = new CarbonSessionState(this) + + + + def getSessionState(sparkContext: SparkContext): SessionState = { +if (sparkContext.version.contains("2.1")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSessionState") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val sessionState1 = ctor.newInstance(this).asInstanceOf[SessionState] + sessionState1 +} else if (sparkContext.version.contains("2.2")) { + val clazz = Utils.classForName("org.apache.spark.sql.hive.CarbonSessionStateBuilder") + val ctor = clazz.getConstructors.head + ctor.setAccessible(true) + val sessionStateBuilder = ctor.newInstance(this, None) + val method = clazz.getMethod("build") + val sessionState1: SessionState = method.invoke(sessionStateBuilder) +.asInstanceOf[SessionState] --- End diff -- move to above line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152582199 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -42,15 +42,46 @@ class CarbonSession(@transient val sc: SparkContext, this(sc, None) } + + // SessionStateCodeGenerateFactory.init(sc.version) + // CarbonOptimizerCodeGenerateFactory.init(sc.version) + // val carbonDefaultOptimizer = CarbonOptimizerCodeGenerateFactory.getInstance() + // .carbonoptimizerFactory.createCarbonOptimizer() + // @transient + // override lazy val sessionState: SessionState = new CarbonSessionState(this) + --- End diff -- remove the commented code ---
[GitHub] carbondata issue #1553: [CARBONDATA-1797] Segment_Index compaction should ta...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1553 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1825/ ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152582049 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -19,10 +19,10 @@ package org.apache.spark.sql import java.io.File import org.apache.hadoop.conf.Configuration -import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd} +import org.apache.spark.SparkConf +import org.apache.spark.SparkContext --- End diff -- Don't change the import order; keep the imports combined as before. ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152579581 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonExpressions.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec +import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.execution.command.DescribeTableCommand +import org.apache.spark.sql.types.DataType + + +object CarbonExpressions { + + object MatchCast { +def unapply(expr: Expression): Option[(Attribute, DataType)] = { + if (expr.isInstanceOf[Cast]) { +val castExpr = expr.asInstanceOf[Cast] +if (castExpr.child.isInstanceOf[Attribute]) { + Some((castExpr.child.asInstanceOf[Attribute], castExpr.dataType)) +} else { + None +} + } else { +None + } + +} + } + + object CarbonDescribeTable { +def unapply(plan: LogicalPlan): Option[(TableIdentifier, TablePartitionSpec, Boolean)] = { + if (plan.isInstanceOf[DescribeTableCommand]) { +val describeTableCommand = plan.asInstanceOf[DescribeTableCommand] +if (describeTableCommand.table.isInstanceOf[TableIdentifier]) { + if (describeTableCommand.partitionSpec.isInstanceOf[TablePartitionSpec]) { +if (describeTableCommand.isExtended.isInstanceOf[Boolean]) { + Some(describeTableCommand.table, +describeTableCommand.partitionSpec, +describeTableCommand.isExtended) +} else { + None +} + } else { +None + } +} else { + None +} + } else { +None + } +} + } + + object CarbonSubqueryAlias { +def unapply(plan: LogicalPlan): Option[(String, LogicalPlan)] = { + if (plan.isInstanceOf[SubqueryAlias]) { +val subqueryAlias = plan.asInstanceOf[SubqueryAlias] +if (subqueryAlias.alias.isInstanceOf[String]) { + if (subqueryAlias.child.isInstanceOf[LogicalPlan]) { +Some(subqueryAlias.alias, + subqueryAlias.child) --- End diff -- Move to above line ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152578680 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonExpressions.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec +import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.execution.command.DescribeTableCommand +import org.apache.spark.sql.types.DataType + + +object CarbonExpressions { + + object MatchCast { +def unapply(expr: Expression): Option[(Attribute, DataType)] = { + if (expr.isInstanceOf[Cast]) { +val castExpr = expr.asInstanceOf[Cast] +if (castExpr.child.isInstanceOf[Attribute]) { + Some((castExpr.child.asInstanceOf[Attribute], castExpr.dataType)) +} else { + None +} + } else { +None + } + +} + } + + object CarbonDescribeTable { +def unapply(plan: LogicalPlan): Option[(TableIdentifier, TablePartitionSpec, Boolean)] = { + if (plan.isInstanceOf[DescribeTableCommand]) { +val describeTableCommand = plan.asInstanceOf[DescribeTableCommand] +if (describeTableCommand.table.isInstanceOf[TableIdentifier]) { + if (describeTableCommand.partitionSpec.isInstanceOf[TablePartitionSpec]) { +if (describeTableCommand.isExtended.isInstanceOf[Boolean]) { + Some(describeTableCommand.table, +describeTableCommand.partitionSpec, +describeTableCommand.isExtended) +} else { + None +} + } else { +None + } +} else { + None +} + } else { +None + } +} + } + + object CarbonSubqueryAlias { +def unapply(plan: LogicalPlan): Option[(String, LogicalPlan)] = { + if (plan.isInstanceOf[SubqueryAlias]) { +val subqueryAlias = plan.asInstanceOf[SubqueryAlias] +if (subqueryAlias.alias.isInstanceOf[String]) { + if (subqueryAlias.child.isInstanceOf[LogicalPlan]) { --- End diff -- combine both `if` conditions ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152578348 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonExpressions.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec +import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.execution.command.DescribeTableCommand +import org.apache.spark.sql.types.DataType + + +object CarbonExpressions { + + object MatchCast { +def unapply(expr: Expression): Option[(Attribute, DataType)] = { + if (expr.isInstanceOf[Cast]) { +val castExpr = expr.asInstanceOf[Cast] +if (castExpr.child.isInstanceOf[Attribute]) { + Some((castExpr.child.asInstanceOf[Attribute], castExpr.dataType)) +} else { + None +} + } else { +None + } + +} + } + + object CarbonDescribeTable { +def unapply(plan: LogicalPlan): Option[(TableIdentifier, TablePartitionSpec, Boolean)] = { + if (plan.isInstanceOf[DescribeTableCommand]) { +val describeTableCommand = plan.asInstanceOf[DescribeTableCommand] +if (describeTableCommand.table.isInstanceOf[TableIdentifier]) { + if (describeTableCommand.partitionSpec.isInstanceOf[TablePartitionSpec]) { +if (describeTableCommand.isExtended.isInstanceOf[Boolean]) { --- End diff -- Combine all 3 `if` conditions to one `if` ---
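A sketch of how the nested checks could collapse, under stated assumptions: `Plan`, `TableId`, `DescribeTable` and `OtherPlan` are hypothetical stand-ins that only mimic the shape of Spark's `LogicalPlan` and `DescribeTableCommand`, so the snippet compiles without Spark. The point it illustrates is that the fields are already statically typed, so for non-null values the inner `isInstanceOf` checks always succeed, and a single type pattern is enough.

```scala
// Hypothetical stand-in types; not Spark classes.
sealed trait Plan
final case class TableId(table: String, database: Option[String])
final case class DescribeTable(
    table: TableId,
    partitionSpec: Map[String, String],
    isExtended: Boolean) extends Plan
final case class OtherPlan(name: String) extends Plan

object DescribeTableExtractor {
  // One type pattern replaces the outer isInstanceOf/asInstanceOf pair
  // and the three nested field checks.
  def unapply(plan: Plan): Option[(TableId, Map[String, String], Boolean)] =
    plan match {
      case d: DescribeTable => Some((d.table, d.partitionSpec, d.isExtended))
      case _ => None
    }
}
```

The same shape would apply to the `CarbonSubqueryAlias` extractor in this file, where two nested checks guard fields that are already typed.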
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152577825 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonExpressions.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec +import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.execution.command.DescribeTableCommand +import org.apache.spark.sql.types.DataType + + +object CarbonExpressions { + + object MatchCast { --- End diff -- Add comments ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152577180 --- Diff: integration/spark-common-cluster-test/pom.xml --- @@ -177,6 +177,17 @@ spark-2.1 + --- End diff -- Why is this dependency added separately for 2.1? ---
[GitHub] carbondata issue #1551: [CARBONDATA-1795] Fix code issues of examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1551 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1371/ ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152575948 --- Diff: assembly/pom.xml --- @@ -126,7 +126,7 @@ - spark-2.1 --- End diff -- This should not be removed; add another profile for 2.2 instead. ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/1542 retest this please ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1534 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1824/ ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152571649 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -85,7 +86,7 @@ object CarbonEnv { def getInstance(sparkSession: SparkSession): CarbonEnv = { if (sparkSession.isInstanceOf[CarbonSession]) { - sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].carbonEnv + sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].carbonEnv --- End diff -- don't change the format ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152571409 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -58,7 +59,7 @@ class CarbonEnv { ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo) val config = new CarbonSQLConf(sparkSession) if(sparkSession.conf.getOption(CarbonCommonConstants.ENABLE_UNSAFE_SORT) == None) { -config.addDefaultCarbonParams() +//config.addDefaultCarbonParams() --- End diff -- remove if not required ---
[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1469#discussion_r152571235 --- Diff: integration/spark2/pom.xml --- @@ -36,7 +36,7 @@ org.apache.carbondata - carbondata-spark-common + carbondata-streaming --- End diff -- Why is this change required? ---