[jira] [Resolved] (CARBONDATA-2246) Fix not-enough-memory bugs in unsafe data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2246.
Resolution: Fixed

> Fix not-enough-memory bugs in unsafe data loading
>
> Key: CARBONDATA-2246
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2246
> Project: CarbonData
> Issue Type: Bug
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Currently in carbon data loading, if we enable unsafe loading and specify the
> corresponding properties, data loading fails with an out-of-memory (OOM) error.
> The key properties to reproduce the bug are as follows:
> ```
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_INMEMORY_MERGE_SORT, "true")
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT, "true")
>
> // unsafe sort memory manager
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.IN_MEMORY_STORAGE_FOR_SORTED_DATA_IN_MB, "1024")
>
> // unsafe working memory manager
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.UNSAFE_WORKING_MEMORY_IN_MB, "512")
>
> // one unsafe page, better if loading_cores * this < memory
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.OFFHEAP_SORT_CHUNK_SIZE_IN_MB, "512")
> ```
>
> Notice that `OFFHEAP_SORT_CHUNK_SIZE_IN_MB` is set to exactly the same value as
> `UNSAFE_WORKING_MEMORY_IN_MB`, which causes the problem.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
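The comment in the reproduction recommends keeping `loading_cores * OFFHEAP_SORT_CHUNK_SIZE_IN_MB` below the available memory. The Java sketch below turns that rule of thumb into a check. It assumes, as the issue implies, that sort pages are taken from the `UNSAFE_WORKING_MEMORY_IN_MB` pool and that each loading core holds one page at a time; `UnsafeMemoryBudget` and its methods are hypothetical, not CarbonData APIs.

```java
/**
 * Hypothetical sanity check for unsafe-loading settings (not a CarbonData API).
 * Assumption: each loading core holds one sort page of OFFHEAP_SORT_CHUNK_SIZE_IN_MB,
 * allocated from the UNSAFE_WORKING_MEMORY_IN_MB pool.
 */
final class UnsafeMemoryBudget {

  private UnsafeMemoryBudget() {}

  /** Number of sort pages that fit in the working-memory pool at the same time. */
  static long pagesThatFit(long workingMemoryMb, long chunkSizeMb) {
    if (chunkSizeMb <= 0) {
      throw new IllegalArgumentException("chunk size must be positive");
    }
    return workingMemoryMb / chunkSizeMb;
  }

  /** Mirrors the rule of thumb from the issue: loading_cores * chunk < memory. */
  static boolean isSafe(int loadingCores, long chunkSizeMb, long workingMemoryMb) {
    return (long) loadingCores * chunkSizeMb < workingMemoryMb;
  }

  public static void main(String[] args) {
    long workingMb = 512; // UNSAFE_WORKING_MEMORY_IN_MB from the reproduction
    long chunkMb = 512;   // OFFHEAP_SORT_CHUNK_SIZE_IN_MB from the reproduction
    int cores = 4;        // assumed number of loading cores
    System.out.println("pages that fit: " + pagesThatFit(workingMb, chunkMb));
    System.out.println("safe with " + cores + " cores: " + isSafe(cores, chunkMb, workingMb));
  }
}
```

With the values from the reproduction, only one 512 MB page fits in the 512 MB pool, so the check fails even for a single loading core.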
[jira] [Resolved] (CARBONDATA-2238) Optimization in unsafe sort during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2238.
Resolution: Fixed

> Optimization in unsafe sort during data loading
>
> Key: CARBONDATA-2238
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> Inspired by batch_sort: if enough memory is available, local_sort with the unsafe
> property enabled can hold all the row pages in memory and spill pages to disk as
> sort temp files only when memory is unavailable.
> Before spilling the pages, we can do an in-memory merge sort of them.
> Each time we request an unsafe row page and memory is unavailable, we can
> trigger a merge sort of the in-memory pages and spill the result to disk as
> a sort temp file. Incoming pages are then held in memory instead of being
> spilled to disk directly.
> With this change, each spill writes more data than before, which benefits disk I/O.
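The spill policy described in the issue can be modelled in a few lines. This is a minimal sketch under simplifying assumptions: rows are `int`s, memory is counted in rows rather than bytes, and a "sort temp file" is just a sorted array kept in a list. `InMemoryMergeSorter` and its methods are hypothetical and do not reflect the actual CarbonData implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical model of "hold sorted pages in memory; when a new page does not fit,
 * merge-sort the held pages and spill them as one temp file". Not CarbonData code.
 */
final class InMemoryMergeSorter {
  private final int memoryBudgetRows;
  private final List<int[]> inMemoryPages = new ArrayList<>();
  private final List<int[]> spilledFiles = new ArrayList<>(); // stands in for sort temp files
  private int rowsInMemory = 0;

  InMemoryMergeSorter(int memoryBudgetRows) {
    this.memoryBudgetRows = memoryBudgetRows;
  }

  /** Requests room for a new page; spills the held pages only if the budget is exceeded. */
  void addPage(int[] page) {
    if (page.length > memoryBudgetRows) {
      throw new IllegalArgumentException("page larger than the memory budget");
    }
    if (rowsInMemory + page.length > memoryBudgetRows) {
      // Memory unavailable: merge all in-memory pages and spill them as ONE temp file.
      spilledFiles.add(mergeSorted(inMemoryPages));
      inMemoryPages.clear();
      rowsInMemory = 0;
    }
    int[] sorted = page.clone();
    Arrays.sort(sorted); // each row page is sorted on its own before being held
    inMemoryPages.add(sorted);
    rowsInMemory += sorted.length;
  }

  int spillCount() {
    return spilledFiles.size();
  }

  /** Final merge of the spilled files and the pages still held in memory. */
  int[] finish() {
    List<int[]> runs = new ArrayList<>(spilledFiles);
    runs.addAll(inMemoryPages);
    return mergeSorted(runs);
  }

  /** k-way merge of sorted runs using a min-heap of {value, runIndex, position}. */
  static int[] mergeSorted(List<int[]> runs) {
    int total = 0;
    for (int[] run : runs) total += run.length;
    int[] out = new int[total];
    PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int i = 0; i < runs.size(); i++) {
      if (runs.get(i).length > 0) heap.add(new int[] {runs.get(i)[0], i, 0});
    }
    int k = 0;
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out[k++] = top[0];
      int[] run = runs.get(top[1]);
      int next = top[2] + 1;
      if (next < run.length) heap.add(new int[] {run[next], top[1], next});
    }
    return out;
  }
}
```

With a budget of six rows and four three-row pages, the first two pages are merged into a single spill when the third page arrives, instead of producing one temp file per page.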
[GitHub] carbondata pull request #2052: [CARBONDATA-2246][DataLoad] Fix exhausted mem...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/2052 ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Added comment for SDK writer API m...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4337/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2127 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4336/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Added comment for SDK writer API m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3646/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Added comment for SDK writer API m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4868/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2127 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4335/ ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Added comment for SDK write...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r179951717 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -58,44 +58,85 @@ private boolean isUnManagedTable; private long UUID; + /** + * prepares the builder with the schema provided + * @param schema --- End diff -- done. Added. ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2139 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4334/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4867/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3645/ ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Added comment for SDK write...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r179950388 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -58,44 +58,85 @@ private boolean isUnManagedTable; private long UUID; + /** + * prepares the builder with the schema provided + * @param schema --- End diff -- provide description for @param and @return ---
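As an illustration of what the reviewer asks for, here is a Javadoc comment on a fluent builder method that describes both `@param` and `@return`. `WriterBuilderSketch` and its `List<String>` schema are hypothetical stand-ins, not the real `CarbonWriterBuilder` signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical builder, used only to show @param/@return descriptions. */
final class WriterBuilderSketch {
  private List<String> columns = Collections.emptyList();

  /**
   * Prepares the builder with the schema provided.
   *
   * @param columns column names of the schema to write; must not be null
   * @return this builder, so that further settings can be chained
   */
  WriterBuilderSketch withSchema(List<String> columns) {
    if (columns == null) {
      throw new IllegalArgumentException("schema must not be null");
    }
    this.columns = new ArrayList<>(columns);
    return this;
  }

  /** @return an unmodifiable view of the configured column names */
  List<String> columns() {
    return Collections.unmodifiableList(columns);
  }
}
```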
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950362 --- Diff: integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonSqlAstBuilder.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.parser.ParserUtils.{string, withOrigin} +import org.apache.spark.sql.catalyst.parser.SqlBaseParser.{AddTableColumnsContext, ChangeColumnContext, CreateHiveTableContext, CreateTableContext, ShowTablesContext} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.SparkSqlAstBuilder +import org.apache.spark.sql.execution.command.{AlterTableAddColumnsModel, AlterTableDataTypeChangeModel} +import org.apache.spark.sql.execution.command.schema.{CarbonAlterTableAddColumnCommand, CarbonAlterTableDataTypeChangeCommand} +import org.apache.spark.sql.execution.command.table.CarbonShowTablesCommand +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.parser.{CarbonHelperSqlAstBuilder, CarbonSpark2SqlParser} +import org.apache.spark.sql.types.DecimalType + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession) + extends SparkSqlAstBuilder(conf) { + + val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession) + + override def visitCreateHiveTable(ctx: CreateHiveTableContext): LogicalPlan = { +val fileStorage = helper.getFileStorage(ctx.createFileFormat) + +if (fileStorage.equalsIgnoreCase("'carbondata'") || +fileStorage.equalsIgnoreCase("carbondata") || +fileStorage.equalsIgnoreCase("'carbonfile'") || +fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) { + val createTableTuple = (ctx.createTableHeader, ctx.skewSpec, +ctx.bucketSpec, ctx.partitionColumns, ctx.columns, ctx.tablePropertyList,ctx.locationSpec(), +Option(ctx.STRING()).map(string), ctx.AS, ctx.query, fileStorage) + 
helper.createCarbonTable(createTableTuple) +} else { + super.visitCreateHiveTable(ctx) +} + } + + override def visitChangeColumn(ctx: ChangeColumnContext): LogicalPlan = { + +val newColumn = visitColType(ctx.colType) +if (!ctx.identifier.getText.equalsIgnoreCase(newColumn.name)) { + throw new MalformedCarbonCommandException( +"Column names provided are different. Both the column names should be same") +} + +val (typeString, values) : (String, Option[List[(Int, Int)]]) = newColumn.dataType match { + case d:DecimalType => ("decimal", Some(List((d.precision, d.scale + case _ => (newColumn.dataType.typeName.toLowerCase, None) +} + +val alterTableChangeDataTypeModel = + AlterTableDataTypeChangeModel(new CarbonSpark2SqlParser().parseDataType(typeString, values), +new CarbonSpark2SqlParser() + .convertDbNameToLowerCase(Option(ctx.tableIdentifier().db).map(_.getText)), +ctx.tableIdentifier().table.getText.toLowerCase, +ctx.identifier.getText.toLowerCase, +newColumn.name.toLowerCase) + +CarbonAlterTableDataTypeChangeCommand(alterTableChangeDataTypeModel) + } + + + override def visitAddTableColumns(ctx: AddTableColumnsContext): LogicalPlan = { + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950352 --- Diff: integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonSessionStateWithoutHive.scala --- @@ -0,0 +1,253 @@ +package org.apache.spark.sql.hive + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.{Analyzer, FunctionRegistry} +import org.apache.spark.sql.catalyst.catalog._ +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.optimizer.Optimizer +import org.apache.spark.sql.catalyst.parser.ParserInterface +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.{AlterTableVo, TableRenameVo} +import org.apache.spark.sql.execution.datasources._ +import org.apache.spark.sql.execution.strategy.{CarbonLateDecodeStrategy, DDLStrategy, StreamingTableStrategy} +import org.apache.spark.sql.internal.{SQLConf, SessionResourceLoader, SessionState, SessionStateBuilder} +import org.apache.spark.sql.optimizer.{CarbonIUDRule, CarbonLateDecodeRule, CarbonUDFTransformRule} +import org.apache.spark.sql.parser.CarbonSparkSqlParser +import org.apache.spark.sql.types.{StructField, StructType} +import org.apache.spark.sql.{CarbonDatasourceHadoopRelation, CarbonEnv, SparkSession} +import org.apache.spark.util.CarbonReflectionUtils + +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.format.TableInfo +import org.apache.carbondata.spark.util.CarbonScalaUtil + +/** + * This class will have carbon catalog and refresh the relation from cache if the carbontable in + * carbon catalog is not same as cached carbon relation's carbon table + * + * @param externalCatalog + 
* @param globalTempViewManager + * @param sparkSession + * @param functionResourceLoader + * @param functionRegistry + * @param conf + * @param hadoopConf + */ +class CarbonInMemorySessionCatalog( +externalCatalog: ExternalCatalog, +globalTempViewManager: GlobalTempViewManager, +functionRegistry: FunctionRegistry, +sparkSession: SparkSession, +conf: SQLConf, +hadoopConf: Configuration, +parser: ParserInterface, +functionResourceLoader: FunctionResourceLoader) + extends SessionCatalog( +externalCatalog, +globalTempViewManager, +functionRegistry, +conf, +hadoopConf, +parser, +functionResourceLoader + ) with CarbonSessionCatalog { + + override def alterTableRename(tableRenameVo: TableRenameVo): Unit = { +sparkSession.sessionState.catalog.renameTable( + tableRenameVo.oldTableIdentifier, + tableRenameVo.newTableIdentifier) + } + + override def alterTable(alterTableVo: AlterTableVo) : Unit = { +// NOt Required in case of In-memory catalog + } + + override def alterAddColumns(alterTableVo: AlterTableVo): Unit = { +val catalogTable = sparkSession.sessionState.catalog.getTableMetadata( + alterTableVo.tableIdentifier) +val structType = catalogTable.schema +var newStructType = structType +alterTableVo.newColumns.get.foreach {cols => + newStructType = structType +.add(cols.getColumnName, CarbonScalaUtil.convertCarbonToSparkDataType(cols.getDataType)) +} +alterSchema(newStructType, catalogTable, alterTableVo.tableIdentifier) + } + + override def alterDropColumns(alterTableVo: AlterTableVo): Unit = { +val catalogTable = sparkSession.sessionState.catalog.getTableMetadata( + alterTableVo.tableIdentifier) +val fields = catalogTable.schema.fields.filterNot { field => + alterTableVo.newColumns.get.exists { col => +col.getColumnName.equalsIgnoreCase(field.name) + } +} +alterSchema(new StructType(fields), catalogTable, alterTableVo.tableIdentifier) + } + + override def alterColumnChangeDataType(alterTableVo: AlterTableVo): Unit = { +val catalogTable = 
sparkSession.sessionState.catalog.getTableMetadata( + alterTableVo.tableIdentifier) +val a = catalogTable.schema.fields.flatMap { field => + alterTableVo.newColumns.get.map { col => +if (col.getColumnName.equalsIgnoreCase(field.name)) { + StructField(col.getColumnName, +CarbonScalaUtil.convertCarbonToSparkDataType(col.getDataType)) +} else { + field +}
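One detail in the `alterAddColumns` hunk quoted above: each loop iteration assigns `newStructType = structType.add(...)`, starting again from the original `structType`, so when several columns are added only the last one appears to survive. Spark's `StructType.add` returns a new `StructType` rather than mutating the receiver. The Java sketch below models that with a hypothetical immutable `SchemaSketch` and contrasts the two loop shapes; it is an illustration, not the code that was eventually merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical immutable schema whose add() returns a new instance, standing in
 * for Spark's StructType.add. Not Spark or CarbonData code.
 */
final class SchemaSketch {
  private final List<String> fields;

  SchemaSketch(List<String> fields) {
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  SchemaSketch add(String name) {
    List<String> copy = new ArrayList<>(fields);
    copy.add(name);
    return new SchemaSketch(copy);
  }

  List<String> fields() {
    return fields;
  }

  /** Loop shape from the quoted hunk: every iteration restarts from the original schema. */
  static SchemaSketch addAllRestarting(SchemaSketch schema, List<String> newCols) {
    SchemaSketch result = schema;
    for (String col : newCols) {
      result = schema.add(col); // discards columns added in earlier iterations
    }
    return result;
  }

  /** Accumulating loop: each iteration builds on the previous result. */
  static SchemaSketch addAllAccumulating(SchemaSketch schema, List<String> newCols) {
    SchemaSketch result = schema;
    for (String col : newCols) {
      result = result.add(col);
    }
    return result;
  }
}
```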
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950322 --- Diff: integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonSessionStateWithoutHive.scala --- @@ -0,0 +1,253 @@ +package org.apache.spark.sql.hive + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.{Analyzer, FunctionRegistry} +import org.apache.spark.sql.catalyst.catalog._ +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.optimizer.Optimizer +import org.apache.spark.sql.catalyst.parser.ParserInterface +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.{AlterTableVo, TableRenameVo} +import org.apache.spark.sql.execution.datasources._ +import org.apache.spark.sql.execution.strategy.{CarbonLateDecodeStrategy, DDLStrategy, StreamingTableStrategy} +import org.apache.spark.sql.internal.{SQLConf, SessionResourceLoader, SessionState, SessionStateBuilder} +import org.apache.spark.sql.optimizer.{CarbonIUDRule, CarbonLateDecodeRule, CarbonUDFTransformRule} +import org.apache.spark.sql.parser.CarbonSparkSqlParser +import org.apache.spark.sql.types.{StructField, StructType} +import org.apache.spark.sql.{CarbonDatasourceHadoopRelation, CarbonEnv, SparkSession} +import org.apache.spark.util.CarbonReflectionUtils + +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.format.TableInfo +import org.apache.carbondata.spark.util.CarbonScalaUtil + +/** + * This class will have carbon catalog and refresh the relation from cache if the carbontable in + * carbon catalog is not same as cached carbon relation's carbon table + * + * @param externalCatalog + 
* @param globalTempViewManager + * @param sparkSession + * @param functionResourceLoader + * @param functionRegistry + * @param conf + * @param hadoopConf + */ +class CarbonInMemorySessionCatalog( --- End diff -- Can you also restrict the class scope, for example by adding `private[spark]`? ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950328 --- Diff: integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonSessionStateWithoutHive.scala --- @@ -0,0 +1,253 @@ +package org.apache.spark.sql.hive --- End diff -- license is missing ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950297 --- Diff: integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonSessionStateWithoutHive.scala --- @@ -0,0 +1,253 @@ +package org.apache.spark.sql.hive + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.{Analyzer, FunctionRegistry} +import org.apache.spark.sql.catalyst.catalog._ +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.optimizer.Optimizer +import org.apache.spark.sql.catalyst.parser.ParserInterface +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.{AlterTableVo, TableRenameVo} +import org.apache.spark.sql.execution.datasources._ +import org.apache.spark.sql.execution.strategy.{CarbonLateDecodeStrategy, DDLStrategy, StreamingTableStrategy} +import org.apache.spark.sql.internal.{SQLConf, SessionResourceLoader, SessionState, SessionStateBuilder} +import org.apache.spark.sql.optimizer.{CarbonIUDRule, CarbonLateDecodeRule, CarbonUDFTransformRule} +import org.apache.spark.sql.parser.CarbonSparkSqlParser +import org.apache.spark.sql.types.{StructField, StructType} +import org.apache.spark.sql.{CarbonDatasourceHadoopRelation, CarbonEnv, SparkSession} +import org.apache.spark.util.CarbonReflectionUtils + +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.format.TableInfo +import org.apache.carbondata.spark.util.CarbonScalaUtil + +/** + * This class will have carbon catalog and refresh the relation from cache if the carbontable in + * carbon catalog is not same as cached carbon relation's carbon table + * + * @param externalCatalog + 
* @param globalTempViewManager + * @param sparkSession + * @param functionResourceLoader + * @param functionRegistry + * @param conf + * @param hadoopConf + */ +class CarbonInMemorySessionCatalog( --- End diff -- I think you can name it `InMemorySessionCatalog` and change the file name also ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950250 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonSessionCatalog.scala --- @@ -63,5 +67,39 @@ trait CarbonSessionCatalog { /** * Update the storageformat with new location information */ - def updateStorageLocation(path: Path, storage: CatalogStorageFormat): CatalogStorageFormat + def updateStorageLocation( + path: Path, + storage: CatalogStorageFormat, + newTableName: String, + dbName: String): CatalogStorageFormat + + /** + * Method used to update the table name + * @param tableRenameVo + */ + def alterTableRename(tableRenameVo: TableRenameVo): Unit --- End diff -- please do not use VO object but put 3 parameters ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950228 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala --- @@ -85,11 +85,14 @@ private[sql] case class CarbonAlterTableAddColumnCommand( schemaEvolutionEntry.setAdded(newCols.toList.asJava) val thriftTable = schemaConverter .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, tableName) - AlterTableUtil + val alterTableVo = AlterTableUtil --- End diff -- Better to return a tuple instead of returning alterTableVo ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950203 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala --- @@ -85,11 +85,14 @@ private[sql] case class CarbonAlterTableAddColumnCommand( schemaEvolutionEntry.setAdded(newCols.toList.asJava) val thriftTable = schemaConverter .fromWrapperToExternalTableInfo(wrapperTableInfo, dbName, tableName) - AlterTableUtil + val alterTableVo = AlterTableUtil .updateSchemaInfo(carbonTable, --- End diff -- change to ```AlterTableUtil.updateSchemaInfo(``` ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179950086 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -124,15 +127,30 @@ object CarbonSession { getOrCreateCarbonSession(null, null) } +def getOrCreateCarbonSessionWithOutHive(): SparkSession = { --- End diff -- instead of providing this, can you add `def enableInMemCatalog(): CarbonBuilder`, so that user can do:
```
SparkSession.builder
  .enableInMemCatalog
  .getOrCreateCarbonSession(x,x)
```
---
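The reviewer's suggestion is a fluent builder step rather than a separate factory method. A rough Java model of that design follows. `CarbonBuilderSketch`, `SessionSettings`, and the meaning of the two string arguments (assumed here to be a store path and a metastore path) are hypothetical; the flag name is borrowed from the `useHiveMetaStore` parameter in the PR's `CarbonSession` diff.

```java
/**
 * Hypothetical model of a builder with an enableInMemCatalog() step.
 * Names and signatures are assumptions, not CarbonData's API.
 */
final class CarbonBuilderSketch {
  // Default mirrors `useHiveMetaStore: Boolean = true` in the PR's CarbonSession diff.
  private boolean useHiveMetaStore = true;

  /** Switches the session to an in-memory catalog; returns this for chaining. */
  CarbonBuilderSketch enableInMemCatalog() {
    this.useHiveMetaStore = false;
    return this;
  }

  /** Stands in for session creation; only records the settings that would be used. */
  SessionSettings getOrCreateCarbonSession(String storePath, String metaStorePath) {
    return new SessionSettings(storePath, metaStorePath, useHiveMetaStore);
  }

  static final class SessionSettings {
    final String storePath;
    final String metaStorePath;
    final boolean useHiveMetaStore;

    SessionSettings(String storePath, String metaStorePath, boolean useHiveMetaStore) {
      this.storePath = storePath;
      this.metaStorePath = metaStorePath;
      this.useHiveMetaStore = useHiveMetaStore;
    }
  }
}
```

Keeping the flag on the builder lets every existing `getOrCreateCarbonSession` overload honour it, instead of adding a parallel "without Hive" variant for each one.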
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3644/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4866/ ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2139 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4333/ ---
[GitHub] carbondata pull request #2103: [CARBONDATA-2312]Support In Memory Catalog
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2103#discussion_r179949877 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala --- @@ -43,16 +43,19 @@ import org.apache.carbondata.streaming.CarbonStreamingQueryListener * User needs to use {CarbonSession.getOrCreateCarbon} to create Carbon session. */ class CarbonSession(@transient val sc: SparkContext, -@transient private val existingSharedState: Option[SharedState] +@transient private val existingSharedState: Option[SharedState], +useHiveMetaStore: Boolean = true --- End diff -- I think it should be @transient also ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2139 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4865/ ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2139 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3643/ ---
[jira] [Resolved] (CARBONDATA-2308) Compaction should be allow when loading is in progress
[ https://issues.apache.org/jira/browse/CARBONDATA-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-2308.
Resolution: Fixed

> Compaction should be allow when loading is in progress
>
> Key: CARBONDATA-2308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2308
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jacky Li
> Assignee: Jacky Li
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When data loading (or insert into) is in progress, users should be able to run
> compaction on the same table.
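One plausible way to let compaction run while a load is in progress is for compaction to select only segments whose load has finished and to leave segments that are still being written untouched. The sketch below models that rule. The `Status` values, `Segment`, and `CompactionSelector` are assumptions for illustration; the issue does not describe how the actual fix (PR #2132) works.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segment model; not CarbonData's actual classes. */
final class CompactionSelector {
  enum Status { SUCCESS, INSERT_IN_PROGRESS, COMPACTED, MARKED_FOR_DELETE }

  static final class Segment {
    final String id;
    final Status status;

    Segment(String id, Status status) {
      this.id = id;
      this.status = status;
    }
  }

  /** Picks only finished segments, so a load that is still in progress is not touched. */
  static List<String> selectForCompaction(List<Segment> segments) {
    List<String> picked = new ArrayList<>();
    for (Segment s : segments) {
      if (s.status == Status.SUCCESS) {
        picked.add(s.id);
      }
    }
    return picked;
  }
}
```

Under this rule, the in-progress segment simply becomes eligible for a later compaction once its load succeeds.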
[GitHub] carbondata pull request #2132: [CARBONDATA-2308] Support concurrent loading ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2132 ---
[jira] [Resolved] (CARBONDATA-2320) Fix error in lucene coarse grain datamap suite
[ https://issues.apache.org/jira/browse/CARBONDATA-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-2320.
Resolution: Fixed

> Fix error in lucene coarse grain datamap suite
>
> Key: CARBONDATA-2320
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2320
> Project: CarbonData
> Issue Type: Bug
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
[GitHub] carbondata pull request #2145: [CARBONDATA-2320][Datamap] Fix error in lunce...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2145 ---
[GitHub] carbondata issue #2145: [CARBONDATA-2320][Datamap] Fix error in luncene coar...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2145 LGTM ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/2139 retest sdv please ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2139 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4332/ ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/2139 retest this please ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2139 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4864/ ---
[jira] [Resolved] (CARBONDATA-2140) Presto Integration - Code Refactoring
[ https://issues.apache.org/jira/browse/CARBONDATA-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-2140.
Resolution: Fixed
Fix Version/s: 1.4.0

> Presto Integration - Code Refactoring
>
> Key: CARBONDATA-2140
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2140
> Project: CarbonData
> Issue Type: Improvement
> Components: presto-integration
> Reporter: Bhavya Aggarwal
> Assignee: Bhavya Aggarwal
> Priority: Minor
> Fix For: 1.4.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Presto Integration - Code Refactoring to remove unnecessary classes and improve
> performance.
[GitHub] carbondata pull request #1940: [CARBONDATA-2140 ] Refactoring code to improv...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1940 ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2139 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4331/ ---
[GitHub] carbondata issue #1940: [CARBONDATA-2140 ] Refactoring code to improve perfo...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1940 @bhavya411 @anubhav100 Thanks for your explanation. LGTM ---
[GitHub] carbondata issue #2146: [CARBONDATA-2321] Fix for selection of partion colum...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2146 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4330/ ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2139 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3640/ ---
[GitHub] carbondata issue #2139: [CARBONDATA-2267] [Presto] Support Reading CarbonDat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2139 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4862/ ---
[GitHub] carbondata issue #2146: [CARBONDATA-2321] Fix for selection of partion colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2146 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3639/ ---
[GitHub] carbondata issue #2146: [CARBONDATA-2321] Fix for selection of partion colum...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2146 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4329/ ---
[GitHub] carbondata issue #2146: [CARBONDATA-2321] Fix for selection of partion colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2146 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4861/ ---
[GitHub] carbondata issue #2146: [CARBONDATA-2321] Fix for selection of partion colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2146 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3638/ ---
[GitHub] carbondata pull request #2146: [CARBONDATA-2321] Fix for selection of partio...
GitHub user jatin9896 opened a pull request: https://github.com/apache/carbondata/pull/2146

[CARBONDATA-2321] Fix for selection of partition column after concurrent load fails randomly

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? No, tested manually
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jatin9896/incubator-carbondata CARBONDATA-2321

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #2146

commit 6db9259dcb9dd28bc922572b131edd345f91fb13
Author: Jatin
Date: 2018-04-08T10:05:19Z
fix selection of partition column after concurrent load fails randomly
---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2127 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4328/ ---
[jira] [Created] (CARBONDATA-2321) Selection after a Concurrent Load Failing for Partition columns
Jatin created CARBONDATA-2321: Summary: Selection after a Concurrent Load Failing for Partition columns Key: CARBONDATA-2321 URL: https://issues.apache.org/jira/browse/CARBONDATA-2321 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 1.4.0 Environment: Spark-2.1 Reporter: Jatin Assignee: Jatin Fix For: 1.4.0 Selection after a concurrent load fails randomly for partition columns.
[GitHub] carbondata pull request #2136: [CARBONDATA-2307] Fix OOM issue when using Da...
Github user Xaprice commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2136#discussion_r179941484

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -402,7 +402,7 @@ class CarbonScanRDD(
       // one query id per table
       model.setQueryId(queryId)
       // get RecordReader by FileFormat
-      val reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
+      var reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
--- End diff --

`reader` will be set to null in the `closeReader()` method to reduce memory usage when using coalesce; otherwise, many reader instances would remain in memory.
---
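The review comment explains why `reader` becomes a `var`: once a split is finished, dropping the reference lets the garbage collector reclaim it even though the enclosing object is still alive. A minimal sketch of that pattern follows; `FakeReader` and `ReaderHolder` are hypothetical stand-ins, not CarbonData's `RecordReader` or `CarbonScanRDD` code.

```scala
import java.io.Closeable

// Hypothetical stand-in for a RecordReader: it only records whether close() ran.
final class FakeReader extends Closeable {
  var closed: Boolean = false
  override def close(): Unit = { closed = true }
}

// Hypothetical holder, loosely modelled on the review comment: `reader` is a
// `var` (not a `val`) so that closeReader() can drop the reference.
// `initial` is only used in the initializer, so Scala does not keep it as a field.
final class ReaderHolder(initial: FakeReader) {
  private var reader: FakeReader = initial

  def isReleased: Boolean = reader == null

  def closeReader(): Unit = {
    if (reader != null) {
      reader.close()
      // Without this assignment the closed reader stays reachable for as long
      // as the holder does (e.g. while a coalesced task works through many
      // splits), so its buffers cannot be garbage-collected.
      reader = null
    }
  }
}
```

Calling `closeReader()` twice is harmless because of the null check, which matters when close logic can be triggered both by task completion and by iterator exhaustion.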
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2127 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4327/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2127 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4326/ ---
[GitHub] carbondata issue #2145: [CARBONDATA-2320][Datamap] Fix error in lucene coar...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2145 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3634/ ---
[GitHub] carbondata issue #2145: [CARBONDATA-2320][Datamap] Fix error in lucene coar...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2145 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4856/ ---
[GitHub] carbondata issue #2145: [CARBONDATA-2320][Datamap] Fix error in lucene coar...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2145 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4325/ ---
[jira] [Resolved] (CARBONDATA-2056) Hadoop Configuration with access key and secret key should be passed while creating InputStream of distributed carbon file.
[ https://issues.apache.org/jira/browse/CARBONDATA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2056. -- Resolution: Fixed Fix Version/s: 1.4.0 > Hadoop Configuration with access key and secret key should be passed while > creating InputStream of distributed carbon file. > --- > > Key: CARBONDATA-2056 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2056 > Project: CarbonData > Issue Type: Bug >Reporter: Mohammad Shahid Khan >Priority: Major > Fix For: 1.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h >
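CARBONDATA-2056 carries no code, but its title describes a common bug class: if the code that opens a remote file builds a fresh configuration instead of using the caller's, credentials such as an access key and secret key never reach the file system client. The dependency-free sketch below models that; `Conf`, `StreamOpener` and its methods are hypothetical, and only the two key names are real Hadoop S3A properties, used here as an example. In real code the configuration would be a Hadoop `Configuration`.

```scala
// Hypothetical stand-in for a Hadoop Configuration.
final case class Conf(props: Map[String, String]) {
  def get(key: String): Option[String] = props.get(key)
}

object StreamOpener {
  // Real Hadoop S3A property names, used only as an example of credential keys.
  val AccessKey = "fs.s3a.access.key"
  val SecretKey = "fs.s3a.secret.key"

  // Opening a remote stream only succeeds if the Conf it receives carries credentials.
  def open(path: String, conf: Conf): Either[String, String] =
    (conf.get(AccessKey), conf.get(SecretKey)) match {
      case (Some(_), Some(_)) => Right(s"opened $path")
      case _                  => Left(s"no credentials in configuration for $path")
    }

  // Buggy pattern: a new, empty configuration is created at the call site,
  // so the caller's credentials are silently lost.
  def openWithFreshConf(path: String): Either[String, String] =
    open(path, Conf(Map.empty))

  // Fixed pattern: the caller's configuration is passed through unchanged.
  def openWithCallerConf(path: String, callerConf: Conf): Either[String, String] =
    open(path, callerConf)
}
```

The design point is simply to thread one configuration object from the session down to every stream-opening call rather than constructing a default one deep inside the reader.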
[GitHub] carbondata pull request #2056: [CARBONDATA-2238][DataLoad] Merge and spill i...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2056 ---
[GitHub] carbondata issue #2056: [CARBONDATA-2238][DataLoad] Merge and spill in-memor...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2056 LGTM ---
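The CARBONDATA-2238 issue description explains the idea behind PR #2056: keep sorted row pages in memory, and only when a new page cannot be allocated, merge-sort the in-memory pages and spill the result as a single sort temp file. The sketch below illustrates that policy under simplifying assumptions: rows are plain `Int`s, a row-count budget stands in for unsafe memory, and an in-memory list stands in for sort temp files. `MergeAndSpillSorter` and its methods are hypothetical, not CarbonData's classes.

```scala
import scala.collection.mutable

final class MergeAndSpillSorter(budgetRows: Int) {
  private val inMemoryPages = mutable.ArrayBuffer.empty[Vector[Int]]
  private var heldRows = 0
  // Each entry stands in for one sort temp file written to disk.
  val spilledRuns = mutable.ArrayBuffer.empty[Vector[Int]]

  // k-way merge of sorted runs using a min-heap of (value, runIndex, position).
  private def kWayMerge(runs: Vector[Vector[Int]]): Vector[Int] = {
    // mutable.PriorityQueue dequeues the largest element first,
    // so the ordering on the value is reversed to get a min-heap.
    val byValue: Ordering[(Int, Int, Int)] = Ordering.by[(Int, Int, Int), Int](_._1).reverse
    val heap = mutable.PriorityQueue.empty[(Int, Int, Int)](byValue)
    for ((run, i) <- runs.zipWithIndex if run.nonEmpty) heap.enqueue((run(0), i, 0))
    val out = Vector.newBuilder[Int]
    while (heap.nonEmpty) {
      val (value, runIdx, pos) = heap.dequeue()
      out += value
      val next = pos + 1
      if (next < runs(runIdx).length) heap.enqueue((runs(runIdx)(next), runIdx, next))
    }
    out.result()
  }

  // Merge every page currently held in memory into one run and "spill" it.
  private def spill(): Unit = {
    spilledRuns += kWayMerge(inMemoryPages.toVector)
    inMemoryPages.clear()
    heldRows = 0
  }

  // Request room for a new page; if the budget would be exceeded, spill first.
  def addPage(rows: Seq[Int]): Unit = {
    if (heldRows + rows.length > budgetRows && inMemoryPages.nonEmpty) spill()
    inMemoryPages += rows.sorted.toVector // each page is sorted on arrival
    heldRows += rows.length
  }

  // Final merge over the spilled runs plus whatever is still in memory.
  def finish(): Vector[Int] = kWayMerge(spilledRuns.toVector ++ inMemoryPages.toVector)
}
```

For example, with `budgetRows = 4`, adding pages `Seq(5, 1, 3)`, `Seq(4, 2)` and `Seq(9, 0)` spills once (when the second page arrives), and `finish()` returns all seven rows in ascending order. Because each spill writes one merged run instead of one file per page, spilled files are fewer and larger, which is the disk-IO benefit the issue describes.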
[GitHub] carbondata issue #2132: [CARBONDATA-2308] Support concurrent loading and com...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/2132 LGTM ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3637/ ---
[jira] [Resolved] (CARBONDATA-2299) Support showing all segment information (including visible and invisible segments)
[ https://issues.apache.org/jira/browse/CARBONDATA-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2299. -- Resolution: Fixed > Support showing all segment information (including visible and invisible segments) > --- > > Key: CARBONDATA-2299 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2299 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 1.4.0 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: 1.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Use the command 'SHOW HISTORY SEGMENTS' to show all segment information (including visible and invisible segments).
[GitHub] carbondata issue #2125: [CARBONDATA-2299]Support showing all segment informa...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2125 LGTM ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4859/ ---
[GitHub] carbondata pull request #2125: [CARBONDATA-2299]Support showing all segment ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2125 ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3636/ ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4858/ ---
[GitHub] carbondata pull request #1988: [CARBONDATA-2193] Support register analyzer a...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1988 ---
[jira] [Resolved] (CARBONDATA-2319) carbon_scan_time and carbon_IO_time are incorrect in task statistics
[ https://issues.apache.org/jira/browse/CARBONDATA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-2319.
Resolution: Fixed
Fix Version/s: 1.4.0

> carbon_scan_time and carbon_IO_time are incorrect in task statistics
>
> Key: CARBONDATA-2319
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2319
> Project: CarbonData
> Issue Type: Bug
> Reporter: QiangCai
> Priority: Minor
> Fix For: 1.4.0
> Time Spent: 1h
> Remaining Estimate: 0h
>
> carbon_scan_time and carbon_IO_time are incorrect:
>
> | query_id | task_id | start_time | total_time | load_blocks_time | load_dictionary_time | carbon_scan_time | carbon_IO_time | scan_blocks_num | total_blocklets | valid_blocklets | total_pages | scanned_pages | valid_pages | result_size |
> |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
> | 5385749464281 | 0 | 2018-04-08 10:52:09.013 | 47ms | 0ms | 0ms | -1ms | -1ms | 1 | 1 | 1 | 1 | 0 | 1 | 10 |
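The report does not state the root cause. One plausible way a `-1ms` value reaches the output is an "unset" sentinel of -1 being reported or summed as if it were a real duration. Assuming that is the cause (an assumption, not something the issue confirms), the sketch below contrasts naive aggregation with one that skips unrecorded samples; `TaskTimings` and its methods are hypothetical.

```scala
// Hypothetical sketch: timings default to -1L ("not recorded").
object TaskTimings {
  val NotRecorded: Long = -1L

  // Naive aggregation: the sentinel is added like a real duration,
  // so a task that never recorded a timing reports -1.
  def naiveTotal(samples: Seq[Long]): Long = samples.sum

  // Safer aggregation: ignore unrecorded samples; None if nothing was measured.
  def total(samples: Seq[Long]): Option[Long] = {
    val recorded = samples.filter(_ != NotRecorded)
    if (recorded.isEmpty) None else Some(recorded.sum)
  }

  // Render for a statistics table; "n/a" makes a missing measurement explicit.
  def format(total: Option[Long]): String = total.fold("n/a")(ms => s"${ms}ms")
}
```

Returning `Option[Long]` keeps "not measured" distinct from "measured as 0ms", which a raw `Long` with a sentinel cannot do safely.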
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3635/ ---
[GitHub] carbondata pull request #2144: [CARBONDATA-2319][Profiler] Fix carbon_scan_t...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2144 ---
[GitHub] carbondata issue #2127: [CARBONDATA-2301][SDK] CarbonStore interface and two...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2127 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4857/ ---
[GitHub] carbondata issue #2144: [CARBONDATA-2319][Profiler] Fix carbon_scan_time and...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2144 LGTM ---
[GitHub] carbondata pull request #2145: [CARBONDATA-2320][Datamap] Fix error in lunce...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2145

[CARBONDATA-2320][Datamap] Fix error in lucene coarse grain datamap suite

Add DM properties while creating the datamap; otherwise, the test will fail.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [x] Any interfaces changed? `NO`
- [x] Any backward compatibility impacted? `NO`
- [x] Document update required? `NO`
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? `NO, only fixed test error`
  - How it is tested? Please attach test report. `Tested locally`
  - Is it a performance related change? Please attach the performance test report. `NO`
  - Any additional information to help reviewers in testing this change. `NO`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `Not related`

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuchuanyin/carbondata 0408_error_lucene_dm

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2145.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #2145

commit c78c201ebc2da0004a749f6fc6cb613fb41a7bc9
Author: xuchuanyin
Date: 2018-04-08T06:12:33Z
Fix error in lucene coarse grain datamap suite

add DM-properties while creating datamap
---
[jira] [Created] (CARBONDATA-2320) Fix error in lucene coarse grain datamap suite
xuchuanyin created CARBONDATA-2320: Summary: Fix error in lucene coarse grain datamap suite Key: CARBONDATA-2320 URL: https://issues.apache.org/jira/browse/CARBONDATA-2320 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin Fix For: 1.4.0