[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1575#discussion_r157361376 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -873,6 +873,16 @@ public static final String TABLE_BLOCKSIZE = "table_blocksize"; // set in column level to disable inverted index public static final String NO_INVERTED_INDEX = "no_inverted_index"; + // table property name of major compaction size + public static final String MAJOR_COMPACTION_SIZE = "major_compaction_size"; --- End diff -- For these compaction properties of Table level , suggest adding "TABLE", such as : TABLE_MAJOR_COMPACTION_SIZE,TABLE_AUTO_LOAD_MERGE... ---
[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1559 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/842/ ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1654 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2372/ ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2067/ ---
[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1559 retest this please ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/841/ ---
[jira] [Resolved] (CARBONDATA-1855) Add outputformat in carbon.
[ https://issues.apache.org/jira/browse/CARBONDATA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-1855. -- Resolution: Fixed Fix Version/s: 1.3.0 > Add outputformat in carbon. > --- > > Key: CARBONDATA-1855 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1855 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala > Fix For: 1.3.0 > > Time Spent: 12h 40m > Remaining Estimate: 0h > > Support standard Hadoop outputformat interface for carbon. It will be helpful > for integrations to execution engines like the spark, hive, and presto. > It should maintain segment management as well while writing the data to > support incremental loading feature. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861][PARTITION] Suppor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1674 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2371/ ---
[GitHub] carbondata pull request #1642: [CARBONDATA-1855][PARTITION] Added outputform...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1642 ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861][PARTITION] Suppor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2066/ ---
[GitHub] carbondata issue #1642: [CARBONDATA-1855][PARTITION] Added outputformat to c...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1642 LGTM ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861][PARTITION] Suppor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/840/ ---
[GitHub] carbondata issue #1642: [CARBONDATA-1855][PARTITION] Added outputformat to c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1642 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2370/ ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861][PARTITION] Suppor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1674 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2369/ ---
[GitHub] carbondata pull request #1670: [CARBONDATA-1899] Add CarbonData concurrency ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1670#discussion_r157356774 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataretention/ConcurrencyTest.scala --- @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.dataretention --- End diff -- You do not need to add in testcase, I think you can add it in a separated benchmark module ---
[GitHub] carbondata issue #1642: [CARBONDATA-1855][PARTITION] Added outputformat to c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1642 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2065/ ---
[GitHub] carbondata issue #1642: [CARBONDATA-1855][PARTITION] Added outputformat to c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1642 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/839/ ---
[jira] [Resolved] (CARBONDATA-1895) Fix issue of create table if not exits
[ https://issues.apache.org/jira/browse/CARBONDATA-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-1895. -- Resolution: Fixed Fix Version/s: 1.3.0 > Fix issue of create table if not exits > --- > > Key: CARBONDATA-1895 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1895 > Project: CarbonData > Issue Type: Bug >Reporter: chenerlu >Assignee: chenerlu > Fix For: 1.3.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1657 LGTM ---
[GitHub] carbondata pull request #1657: [CARBONDATA-1895] Fix issue of create table i...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1657 ---
[GitHub] carbondata pull request #1666: [CARBONDATA-1900][Core,processing] Modify loa...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1666#discussion_r157356451 --- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/LoadMetadataDetails.java --- @@ -241,26 +248,28 @@ private long convertTimeStampToLong(String factTimeStamp) { * @return */ public Long getTimeStamp(String loadStartTime) { -Date dateToStr = null; try { - dateToStr = parser.parse(loadStartTime); - return dateToStr.getTime() * 1000; -} catch (ParseException e) { - LOGGER.error("Cannot convert" + loadStartTime + " to Time/Long type value" + e.getMessage()); - return null; + // for new loads the factTimeStamp will be long string --- End diff -- In future for maintenance it is not easy to understand what new load is, can you add more background information for better maintenance. ---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1669#discussion_r157356409 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/package.scala --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.internal.SessionState + +package object command { + def sessionState(sparkSession: SparkSession): SessionState = sparkSession.sessionState --- End diff -- Is this required? ---
[GitHub] carbondata pull request #1642: [CARBONDATA-1855][PARTITION] Added outputform...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1642#discussion_r157356357 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -18,22 +18,342 @@ package org.apache.carbondata.hadoop.api; import java.io.IOException; +import java.util.List; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.mapred.FileOutputFormat; -import org.apache.hadoop.mapred.JobConf; -import org.apache.hadoop.mapred.RecordWriter; -import org.apache.hadoop.util.Progressable; +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonLoadOptionConstants; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.metadata.datatype.StructType; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.processing.loading.DataLoadExecutor; +import org.apache.carbondata.processing.loading.csvinput.StringArrayWritable; +import org.apache.carbondata.processing.loading.iterator.CarbonOutputIteratorWrapper; +import org.apache.carbondata.processing.loading.model.CarbonDataLoadSchema; +import org.apache.carbondata.processing.loading.model.CarbonLoadModel; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapreduce.OutputCommitter; +import org.apache.hadoop.mapreduce.RecordWriter; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; /** - * Base class for all output format for CarbonData file. - * @param + * This is table level output format which writes the data to store in new segment. Each load + * creates new segment folder and manages the folder through tablestatus file. + * It also generate and writes dictionary data during load only if dictionary server is configured. */ -public abstract class CarbonTableOutputFormat extends FileOutputFormat { +// TODO Move dictionary generater which is coded in spark to MR framework. +public class CarbonTableOutputFormat extends FileOutputFormat { - @Override - public RecordWriter getRecordWriter(FileSystem ignored, JobConf job, String name, - Progressable progress) throws IOException { + private static final String LOAD_MODEL = "mapreduce.carbontable.load.model"; + private static final String DATABASE_NAME = "mapreduce.carbontable.databaseName"; + private static final String TABLE_NAME = "mapreduce.carbontable.tableName"; + private static final String TABLE = "mapreduce.carbontable.table"; + private static final String TABLE_PATH = "mapreduce.carbontable.tablepath"; + private static final String INPUT_SCHEMA = "mapreduce.carbontable.inputschema"; + private static final String TEMP_STORE_LOCATIONS = "mapreduce.carbontable.tempstore.locations"; + private static final String OVERWRITE_SET = "mapreduce.carbontable.set.overwrite"; + public static final String COMPLEX_DELIMITERS = "mapreduce.carbontable.complex_delimiters"; + public static final String SERIALIZATION_NULL_FORMAT = + "mapreduce.carbontable.serialization.null.format"; + public static final String BAD_RECORDS_LOGGER_ENABLE = + "mapreduce.carbontable.bad.records.logger.enable"; + public static final String BAD_RECORDS_LOGGER_ACTION = + "mapreduce.carbontable.bad.records.logger.action"; + public static final String IS_EMPTY_DATA_BAD_RECORD = + "mapreduce.carbontable.empty.data.bad.record"; + public static final String SKIP_EMPTY_LINE = "mapreduce.carbontable.skip.empty.line"; + public static final String SORT_SCOPE = "mapreduce.carbontable.load.sort.scope"; + public static final String BATCH_SORT_SIZE_INMB = + "mapreduce.carbontable.batch.sort.size.inmb"; + public static final String GLOBAL_SORT_PARTITIONS = + "mapreduce.carbontable.global.sort.partitions"; + public static final String BAD_RECORD_PATH = "mapreduce.carbontable.bad.record.path"; + public static final String DATE_FORMAT = "mapreduce.carbontable.date.format"; + public static final String TIMESTAMP_FORMAT = "mapreduce.carbontable.timestamp.format"; + public static final String IS_ONE_PASS_LOAD = "mapreduce.carbontable.one.pass.load"; + public static final String DICTIONARY_SERVER_HOST = + "mapreduce.carbontab
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1669#discussion_r157356369 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala --- @@ -160,4 +162,112 @@ object DataLoadProcessBuilderOnSpark { Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) } } + + /** + * use FileScanRDD to read input csv files --- End diff -- change comment to mention this function creates a RDD that does reading of multiple CSV files ---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1669#discussion_r157356347 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala --- @@ -49,29 +59,21 @@ object DataLoadProcessBuilderOnSpark { private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) def loadDataUsingGlobalSort( - sc: SparkContext, + sqlContext: SQLContext, --- End diff -- better to use sparkSession instead of sqlContext ---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1669#discussion_r157356331 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1277,6 +1277,10 @@ public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION = "carbon.custom.block.distribution"; public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION_DEFAULT = "false"; + @CarbonProperty + public static final String CARBON_COMBINE_SMALL_INPUT_FILES = "carbon.combine.small.input.files"; --- End diff -- change to `carbon.mergeSmallFileIO.enable` ---
[GitHub] carbondata pull request #1642: [CARBONDATA-1855][PARTITION] Added outputform...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1642#discussion_r157356282 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -18,22 +18,342 @@ package org.apache.carbondata.hadoop.api; import java.io.IOException; +import java.util.List; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.mapred.FileOutputFormat; -import org.apache.hadoop.mapred.JobConf; -import org.apache.hadoop.mapred.RecordWriter; -import org.apache.hadoop.util.Progressable; +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonLoadOptionConstants; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.metadata.datatype.StructType; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.processing.loading.DataLoadExecutor; +import org.apache.carbondata.processing.loading.csvinput.StringArrayWritable; +import org.apache.carbondata.processing.loading.iterator.CarbonOutputIteratorWrapper; +import org.apache.carbondata.processing.loading.model.CarbonDataLoadSchema; +import org.apache.carbondata.processing.loading.model.CarbonLoadModel; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapreduce.OutputCommitter; +import org.apache.hadoop.mapreduce.RecordWriter; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; /** - * Base class for all output format for CarbonData file. - * @param + * This is table level output format which writes the data to store in new segment. Each load + * creates new segment folder and manages the folder through tablestatus file. + * It also generate and writes dictionary data during load only if dictionary server is configured. */ -public abstract class CarbonTableOutputFormat extends FileOutputFormat { +// TODO Move dictionary generater which is coded in spark to MR framework. +public class CarbonTableOutputFormat extends FileOutputFormat { - @Override - public RecordWriter getRecordWriter(FileSystem ignored, JobConf job, String name, - Progressable progress) throws IOException { + private static final String LOAD_MODEL = "mapreduce.carbontable.load.model"; + private static final String DATABASE_NAME = "mapreduce.carbontable.databaseName"; + private static final String TABLE_NAME = "mapreduce.carbontable.tableName"; + private static final String TABLE = "mapreduce.carbontable.table"; + private static final String TABLE_PATH = "mapreduce.carbontable.tablepath"; + private static final String INPUT_SCHEMA = "mapreduce.carbontable.inputschema"; + private static final String TEMP_STORE_LOCATIONS = "mapreduce.carbontable.tempstore.locations"; + private static final String OVERWRITE_SET = "mapreduce.carbontable.set.overwrite"; + public static final String COMPLEX_DELIMITERS = "mapreduce.carbontable.complex_delimiters"; + public static final String SERIALIZATION_NULL_FORMAT = + "mapreduce.carbontable.serialization.null.format"; + public static final String BAD_RECORDS_LOGGER_ENABLE = + "mapreduce.carbontable.bad.records.logger.enable"; + public static final String BAD_RECORDS_LOGGER_ACTION = + "mapreduce.carbontable.bad.records.logger.action"; + public static final String IS_EMPTY_DATA_BAD_RECORD = + "mapreduce.carbontable.empty.data.bad.record"; + public static final String SKIP_EMPTY_LINE = "mapreduce.carbontable.skip.empty.line"; + public static final String SORT_SCOPE = "mapreduce.carbontable.load.sort.scope"; + public static final String BATCH_SORT_SIZE_INMB = + "mapreduce.carbontable.batch.sort.size.inmb"; + public static final String GLOBAL_SORT_PARTITIONS = + "mapreduce.carbontable.global.sort.partitions"; + public static final String BAD_RECORD_PATH = "mapreduce.carbontable.bad.record.path"; + public static final String DATE_FORMAT = "mapreduce.carbontable.date.format"; + public static final String TIMESTAMP_FORMAT = "mapreduce.carbontable.timestamp.format"; + public static final String IS_ONE_PASS_LOAD = "mapreduce.carbontable.one.pass.load"; + public static final String DICTIONARY_SERVER_HOST = + "mapreduce.carbontable.
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356189 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala --- @@ -144,6 +145,15 @@ trait CarbonMetaStore { def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable] + + def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession, + tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation + + def getSchemaFromUnresolvedRelation(sparkSession: SparkSession, --- End diff -- This function is simple logic only, why not do this logic directly in caller? ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356181 --- Diff: integration/spark2/src/main/spark2.1/CarbonSessionState.scala --- @@ -259,25 +259,26 @@ object CarbonOptimizerUtil { } } -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends - SparkSqlAstBuilder(conf) { +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession) + extends SparkSqlAstBuilder(conf) { - val helper = new CarbonHelperSqlAstBuilder(conf, parser) + val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession) override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = { val fileStorage = helper.getFileStorage(ctx.createFileFormat) if (fileStorage.equalsIgnoreCase("'carbondata'") || fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) { helper.createCarbonTable(ctx.createTableHeader, --- End diff -- Sine you modify this function, to make it more readable, please add parameter name also, like ``` foo( paramA = a, paramB = b ...) ``` ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356168 --- Diff: integration/spark2/src/main/spark2.2/CarbonSessionState.scala --- @@ -280,25 +280,26 @@ class CarbonOptimizer( } } -class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) extends - SparkSqlAstBuilder(conf) { +class CarbonSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser, sparkSession: SparkSession) + extends SparkSqlAstBuilder(conf) { - val helper = new CarbonHelperSqlAstBuilder(conf, parser) + val helper = new CarbonHelperSqlAstBuilder(conf, parser, sparkSession) override def visitCreateHiveTable(ctx: CreateHiveTableContext): LogicalPlan = { val fileStorage = helper.getFileStorage(ctx.createFileFormat) if (fileStorage.equalsIgnoreCase("'carbondata'") || fileStorage.equalsIgnoreCase("'org.apache.carbondata.format'")) { helper.createCarbonTable(ctx.createTableHeader, - ctx.skewSpec, - ctx.bucketSpec, - ctx.partitionColumns, - ctx.columns, - ctx.tablePropertyList, - ctx.locationSpec(), - Option(ctx.STRING()).map(string), - ctx.AS) +ctx.skewSpec, --- End diff -- Sine you modify this function, to make it more readable, please add parameter name also, like ``` foo( paramA = a, paramB = b ) ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356143 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParser.scala --- @@ -210,13 +211,40 @@ class CarbonHelperSqlAstBuilder(conf: SQLConf, parser: CarbonSpark2SqlParser) } } -val fields = parser.getFields(cols ++ partitionByStructFields) +var fields = parser.getFields(cols ++ partitionByStructFields) val options = new CarbonOption(properties) // validate tblProperties val bucketFields = parser.getBucketFields(tableProperties, fields, options) validateStreamingProperty(options) +// validate for create table as select --- End diff -- move this logic to a validateSelectQueryForCTAS function ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356131 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala --- @@ -144,6 +145,15 @@ trait CarbonMetaStore { def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable] + + def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession, + tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation + + def getSchemaFromUnresolvedRelation(sparkSession: SparkSession, --- End diff -- add comment for interface function ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356121 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala --- @@ -144,6 +145,15 @@ trait CarbonMetaStore { def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: SparkSession): TableInfo def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable] + + def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession, --- End diff -- move parameter to next line ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356103 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -528,4 +529,46 @@ class CarbonFileMetastore extends CarbonMetaStore { val tableMetadataFile = tablePath.getSchemaFilePath CarbonUtil.readSchemaFile(tableMetadataFile) } + + override def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession, --- End diff -- please provide comment for this function ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356091 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -528,4 +529,46 @@ class CarbonFileMetastore extends CarbonMetaStore { val tableMetadataFile = tablePath.getSchemaFilePath CarbonUtil.readSchemaFile(tableMetadataFile) } + + override def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession, --- End diff -- move parameter to next line ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356062 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.table + +import scala.util.control.NonFatal + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.command.MetadataCommand +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.schema.table.TableInfo + +/** + * Create table and insert the query result into it. + * + * @param query the query whose result will be insert into the new relation + * @param tableInfo the Table Describe, which may contains serde, storage handler etc. + * @param ifNotExistsSet allow continue working if it's already exists, otherwise + * raise exception + * @param tableLocation store location where the table need to be created + */ +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan, +tableInfo: TableInfo, +ifNotExistsSet: Boolean = false, +tableLocation: Option[String] = None) extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = tableInfo.getFactTable.getTableName +var databaseOpt: Option[String] = None +if (tableInfo.getDatabaseName != null) { + databaseOpt = Some(tableInfo.getDatabaseName) +} +val dbName = CarbonEnv.getDatabaseName(databaseOpt)(sparkSession) +LOGGER.audit(s"Request received for CTAS for $dbName.$tableName") +lazy val carbonDataSourceHadoopRelation = { --- End diff -- This can be moved down to line 61 ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356072 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.table + +import scala.util.control.NonFatal + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.command.MetadataCommand +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.schema.table.TableInfo + +/** + * Create table and insert the query result into it. + * + * @param query the query whose result will be insert into the new relation + * @param tableInfo the Table Describe, which may contains serde, storage handler etc. + * @param ifNotExistsSet allow continue working if it's already exists, otherwise + * raise exception + * @param tableLocation store location where the table need to be created + */ +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan, +tableInfo: TableInfo, +ifNotExistsSet: Boolean = false, +tableLocation: Option[String] = None) extends MetadataCommand { + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val tableName = tableInfo.getFactTable.getTableName +var databaseOpt: Option[String] = None +if (tableInfo.getDatabaseName != null) { + databaseOpt = Some(tableInfo.getDatabaseName) +} +val dbName = CarbonEnv.getDatabaseName(databaseOpt)(sparkSession) +LOGGER.audit(s"Request received for CTAS for $dbName.$tableName") +lazy val carbonDataSourceHadoopRelation = { + // execute command to create carbon table + CarbonCreateTableCommand(tableInfo, ifNotExistsSet, tableLocation).run(sparkSession) + CarbonEnv.getInstance(sparkSession).carbonMetastore +.getCarbonDataSourceHadoopRelation(sparkSession, TableIdentifier(tableName, Option(dbName))) +} +// check if table already exists +if (sparkSession.sessionState.catalog.listTables(dbName) + .exists(_.table.equalsIgnoreCase(tableName))) { + if (!ifNotExistsSet) { +LOGGER.audit( + s"Table creation with Database name [$dbName] and Table name [$tableName] failed. " + + s"Table [$tableName] already exists under database [$dbName]") +throw new TableAlreadyExistsException(dbName, tableName) + } +} else { + try { +// execute command to load data into carbon table +CarbonInsertIntoCommand(carbonDataSourceHadoopRelation, --- End diff -- move parameter to next line ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157356007 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.table + +import scala.util.control.NonFatal + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.command.MetadataCommand +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.schema.table.TableInfo + +/** + * Create table and insert the query result into it. + * + * @param query the query whose result will be insert into the new relation + * @param tableInfo the Table Describe, which may contains serde, storage handler etc. + * @param ifNotExistsSet allow continue working if it's already exists, otherwise + * raise exception + * @param tableLocation store location where the table need to be created + */ +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan, +tableInfo: TableInfo, +ifNotExistsSet: Boolean = false, +tableLocation: Option[String] = None) extends MetadataCommand { --- End diff -- I think it should be AtomicRunnableCommand, since it do both data and metadata modification ---
[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1665#discussion_r157355991 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableAsSelectCommand.scala --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.table + +import scala.util.control.NonFatal + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.command.MetadataCommand +import org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.schema.table.TableInfo + +/** + * Create table and insert the query result into it. + * + * @param query the query whose result will be insert into the new relation + * @param tableInfo the Table Describe, which may contains serde, storage handler etc. + * @param ifNotExistsSet allow continue working if it's already exists, otherwise + * raise exception + * @param tableLocation store location where the table need to be created + */ +case class CarbonCreateTableAsSelectCommand(query: LogicalPlan, --- End diff -- move query as the last parameter of this command ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1657 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2064/ ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1657 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/838/ ---
[GitHub] carbondata issue #1082: [CARBONDATA-1218] In case of data-load failure the B...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1082 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2063/ ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2062/ ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1672 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2061/ ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861] Support show and ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2060/ ---
[GitHub] carbondata issue #1082: [CARBONDATA-1218] In case of data-load failure the B...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1082 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2368/ ---
[GitHub] carbondata issue #1082: [CARBONDATA-1218] In case of data-load failure the B...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1082 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/837/ ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1654 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2367/ ---
[GitHub] carbondata issue #1082: [CARBONDATA-1218] In case of data-load failure the B...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1082 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2059/ ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1672 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/835/ ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/836/ ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1672 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/834/ ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861] Support show and ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/833/ ---
[GitHub] carbondata issue #1082: [CARBONDATA-1218] In case of data-load failure the B...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1082 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/832/ ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1672 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2366/ ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861] Support show and ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1674 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2365/ ---
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2058/ ---
[GitHub] carbondata pull request #1654: [CARBONDATA-1856][PARTITION] Support insert/l...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1654#discussion_r157347390 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -573,6 +574,11 @@ public boolean isPartitionTable() { return null != tablePartitionMap.get(getTableName()); } + public boolean isStandardPartitionTable() { --- End diff -- ok ---
[GitHub] carbondata pull request #1654: [CARBONDATA-1856][PARTITION] Support insert/l...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1654#discussion_r157347186 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/partition/PartitionType.java --- @@ -23,5 +23,6 @@ RANGE, RANGE_INTERVAL, LIST, - HASH + HASH, + STANDARD --- End diff -- ok ---
[GitHub] carbondata pull request #1654: [CARBONDATA-1856][PARTITION] Support insert/l...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1654#discussion_r157347107 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionFileStore.java --- @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.metadata; + +import java.io.BufferedReader; +import java.io.BufferedWriter; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.OutputStreamWriter; +import java.io.Serializable; +import java.nio.charset.Charset; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.fileoperations.AtomicFileOperations; +import org.apache.carbondata.core.fileoperations.AtomicFileOperationsImpl; +import org.apache.carbondata.core.fileoperations.FileWriteOperation; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import com.google.gson.Gson; + +/** + * Reads and writes the partition mapper. + */ +public class PartitionFileStore { --- End diff -- ok ---
[GitHub] carbondata pull request #1654: [CARBONDATA-1856][PARTITION] Support insert/l...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1654#discussion_r157347114 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionFileStore.java --- @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.metadata; + +import java.io.BufferedReader; +import java.io.BufferedWriter; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.OutputStreamWriter; +import java.io.Serializable; +import java.nio.charset.Charset; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.fileoperations.AtomicFileOperations; +import org.apache.carbondata.core.fileoperations.AtomicFileOperationsImpl; +import org.apache.carbondata.core.fileoperations.FileWriteOperation; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +import com.google.gson.Gson; + +/** + * Reads and writes the partition mapper. --- End diff -- ok ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157347049 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.io.File +import java.util + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hadoop.io.NullWritable +import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext} +import org.apache.spark.SparkEnv +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.DataSourceRegister +import org.apache.spark.sql.types.{DataType, StructType} + +import org.apache.carbondata.core.constants.{CarbonCommonConstants, CarbonLoadOptionConstants} +import org.apache.carbondata.core.metadata.{CarbonMetadata, PartitionFileStore} +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.hadoop.api.{CarbonOutputCommitter, CarbonTableOutputFormat} +import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.CarbonRecordWriter +import org.apache.carbondata.processing.loading.csvinput.StringArrayWritable +import org.apache.carbondata.processing.loading.model.CarbonLoadModel +import org.apache.carbondata.spark.util.{DataLoadingUtil, Util} + +class CarbonFileFormat + extends FileFormat +with DataSourceRegister +with Logging +with Serializable { + + override def shortName(): String = "carbondata" + + override def inferSchema(sparkSession: SparkSession, + options: Map[String, String], + files: Seq[FileStatus]): Option[StructType] = { +None + } + + override def prepareWrite(sparkSession: SparkSession, + job: Job, + options: Map[String, String], + dataSchema: StructType): OutputWriterFactory = { +val conf = job.getConfiguration +conf.setClass( + SQLConf.OUTPUT_COMMITTER_CLASS.key, + classOf[CarbonOutputCommitter], + classOf[CarbonOutputCommitter]) +conf.set("carbon.commit.protocol", "carbon.commit.protocol") +sparkSession.sessionState.conf.setConfString( + "spark.sql.sources.commitProtocolClass", + "org.apache.spark.sql.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol") +job.setOutputFormatClass(classOf[CarbonTableOutputFormat]) + +var table = CarbonMetadata.getInstance().getCarbonTable( + options.getOrElse("dbName", "default"), options("tableName")) +//table = CarbonTable.buildFromTableInfo(table.getTableInfo, true) +val model = new CarbonLoadModel +val carbonProperty = CarbonProperties.getInstance() +val optionsFinal = DataLoadingUtil.getDataLoadingOptions(carbonProperty, options) +val tableProperties = table.getTableInfo.getFactTable.getTableProperties +optionsFinal.put("sort_scope", tableProperties.asScala.getOrElse("sort_scope", + carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, +carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT +val partitionStr = + table.getTableInfo.getFactTable.getPartitionInfo.getColumnSchemaList.asScala.map( +_.getColumnName.toLowerCase).mkString(",") +optionsFinal.put( + "fileheader", + dataSchema.fields.map(_.name.toLowerCase).mkString(",") + "," + partitionStr) +DataLoadingUtil.buildCarbonLoadModel(
[GitHub] carbondata issue #1654: [CARBONDATA-1856][PARTITION] Support insert/load dat...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1654 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/831/ ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157347031 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileFormat.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.io.File +import java.util + +import scala.collection.JavaConverters._ +import scala.util.Random + +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hadoop.io.NullWritable +import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext} +import org.apache.spark.SparkEnv +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.DataSourceRegister +import org.apache.spark.sql.types.{DataType, StructType} + +import org.apache.carbondata.core.constants.{CarbonCommonConstants, CarbonLoadOptionConstants} +import org.apache.carbondata.core.metadata.{CarbonMetadata, PartitionFileStore} +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.hadoop.api.{CarbonOutputCommitter, CarbonTableOutputFormat} +import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat.CarbonRecordWriter +import org.apache.carbondata.processing.loading.csvinput.StringArrayWritable +import org.apache.carbondata.processing.loading.model.CarbonLoadModel +import org.apache.carbondata.spark.util.{DataLoadingUtil, Util} + +class CarbonFileFormat + extends FileFormat +with DataSourceRegister +with Logging +with Serializable { + + override def shortName(): String = "carbondata" + + override def inferSchema(sparkSession: SparkSession, + options: Map[String, String], + files: Seq[FileStatus]): Option[StructType] = { +None + } + + override def prepareWrite(sparkSession: SparkSession, + job: Job, + options: Map[String, String], + dataSchema: StructType): OutputWriterFactory = { +val conf = job.getConfiguration +conf.setClass( + SQLConf.OUTPUT_COMMITTER_CLASS.key, + classOf[CarbonOutputCommitter], + classOf[CarbonOutputCommitter]) +conf.set("carbon.commit.protocol", "carbon.commit.protocol") +sparkSession.sessionState.conf.setConfString( + "spark.sql.sources.commitProtocolClass", + "org.apache.spark.sql.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol") +job.setOutputFormatClass(classOf[CarbonTableOutputFormat]) + +var table = CarbonMetadata.getInstance().getCarbonTable( + options.getOrElse("dbName", "default"), options("tableName")) +//table = CarbonTable.buildFromTableInfo(table.getTableInfo, true) --- End diff -- ok ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861] Support show and ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/829/ ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1672 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/830/ ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157347018 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -345,23 +380,172 @@ case class CarbonLoadDataCommand( } else { (dataFrame, dataFrame) } -if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap) { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +if (!table.isChildDataMap) { GlobalDictionaryUtil.generateGlobalDictionary( sparkSession.sqlContext, carbonLoadModel, hadoopConf, dictionaryDataFrame) } -CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, - carbonLoadModel, - columnar, - partitionStatus, - None, - isOverwriteTable, - hadoopConf, - loadDataFrame, - updateModel, - operationContext) +if (table.isStandardPartitionTable) { + loadStandardPartition(sparkSession, carbonLoadModel, hadoopConf, loadDataFrame) +} else { + CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, +carbonLoadModel, +columnar, +partitionStatus, +None, +isOverwriteTable, +hadoopConf, +loadDataFrame, +updateModel, +operationContext) +} + } + + private def loadStandardPartition(sparkSession: SparkSession, + carbonLoadModel: CarbonLoadModel, + hadoopConf: Configuration, + dataFrame: Option[DataFrame]) = { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val logicalPlan = + sparkSession.sessionState.catalog.lookupRelation( +TableIdentifier(table.getTableName, Some(table.getDatabaseName))) +val relation = logicalPlan.collect { + case l: LogicalRelation => l +}.head + + +val query: LogicalPlan = if (dataFrame.isDefined) { + var timeStampformatString = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT + val dateFormat = new SimpleDateFormat(dateFormatString) + val delimiterLevel1 = carbonLoadModel.getComplexDelimiterLevel1 + val delimiterLevel2 = carbonLoadModel.getComplexDelimiterLevel2 + val serializationNullFormat = + carbonLoadModel.getSerializationNullFormat.split(CarbonCommonConstants.COMMA, 2)(1) + val attributes = +StructType(dataFrame.get.schema.fields.map(_.copy(dataType = StringType))).toAttributes + val len = attributes.length + val rdd = dataFrame.get.rdd.map { f => +val data = new Array[Any](len) +var i = 0 +while (i < len) { + data(i) = +UTF8String.fromString( + CarbonScalaUtil.getString(f.get(i), +serializationNullFormat, +delimiterLevel1, +delimiterLevel2, +timeStampFormat, +dateFormat)) + i = i + 1 +} +InternalRow.fromSeq(data) + } + LogicalRDD(attributes, rdd)(sparkSession) + +} else { + var timeStampformatString = carbonLoadModel.getTimestampformat + if (timeStampformatString.isEmpty) { +timeStampformatString = carbonLoadModel.getDefaultTimestampFormat + } + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = carbonLoadModel.getDateFormat + if (dateFormatString.isEmpty) { +dateFormatString = carbonLoadModel.getDefaultDateFormat + } + val dateFormat = new SimpleDateFormat(dateFormatString) + // input data from csv files. Convert to logical plan + CommonUtil.configureCSVInputFormat(hadoopConf, carbonLoadModel) + hadoopConf.set(FileInputFormat.INPUT_DIR, carbonLoadModel.getFactFilePath) + val jobConf = new JobConf(hadoopConf) + SparkHadoopUtil.get.addCredentials(jobConf) + val attributes = +StructType(carbonLoadModel.getCsvHeaderColumns.map( + StructField(_, StringType))).toAttributes + val rowDataTypes = attributes.map{f => +relation.output.find(_.name.equalsIgnoreCase(f.name)) match { + case Some(attr) => attr.dataType + case _ => StringType +} + } + val len = rowDataTypes.length + val rdd = +new NewHadoopRDD[NullWritable, StringArrayWrit
[GitHub] carbondata issue #1648: [CARBONDATA-1888][PreAggregate][Bug]Fixed compaction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1648 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/828/ ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157346999 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -345,23 +380,172 @@ case class CarbonLoadDataCommand( } else { (dataFrame, dataFrame) } -if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap) { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +if (!table.isChildDataMap) { GlobalDictionaryUtil.generateGlobalDictionary( sparkSession.sqlContext, carbonLoadModel, hadoopConf, dictionaryDataFrame) } -CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, - carbonLoadModel, - columnar, - partitionStatus, - None, - isOverwriteTable, - hadoopConf, - loadDataFrame, - updateModel, - operationContext) +if (table.isStandardPartitionTable) { + loadStandardPartition(sparkSession, carbonLoadModel, hadoopConf, loadDataFrame) +} else { + CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, +carbonLoadModel, +columnar, +partitionStatus, +None, +isOverwriteTable, +hadoopConf, +loadDataFrame, +updateModel, +operationContext) +} + } + + private def loadStandardPartition(sparkSession: SparkSession, + carbonLoadModel: CarbonLoadModel, + hadoopConf: Configuration, + dataFrame: Option[DataFrame]) = { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val logicalPlan = + sparkSession.sessionState.catalog.lookupRelation( +TableIdentifier(table.getTableName, Some(table.getDatabaseName))) +val relation = logicalPlan.collect { + case l: LogicalRelation => l +}.head + + +val query: LogicalPlan = if (dataFrame.isDefined) { + var timeStampformatString = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT + val dateFormat = new SimpleDateFormat(dateFormatString) + val delimiterLevel1 = carbonLoadModel.getComplexDelimiterLevel1 + val delimiterLevel2 = carbonLoadModel.getComplexDelimiterLevel2 + val serializationNullFormat = + carbonLoadModel.getSerializationNullFormat.split(CarbonCommonConstants.COMMA, 2)(1) + val attributes = +StructType(dataFrame.get.schema.fields.map(_.copy(dataType = StringType))).toAttributes + val len = attributes.length + val rdd = dataFrame.get.rdd.map { f => +val data = new Array[Any](len) +var i = 0 +while (i < len) { + data(i) = +UTF8String.fromString( + CarbonScalaUtil.getString(f.get(i), +serializationNullFormat, +delimiterLevel1, +delimiterLevel2, +timeStampFormat, +dateFormat)) + i = i + 1 +} +InternalRow.fromSeq(data) + } + LogicalRDD(attributes, rdd)(sparkSession) + +} else { + var timeStampformatString = carbonLoadModel.getTimestampformat + if (timeStampformatString.isEmpty) { +timeStampformatString = carbonLoadModel.getDefaultTimestampFormat + } + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = carbonLoadModel.getDateFormat + if (dateFormatString.isEmpty) { +dateFormatString = carbonLoadModel.getDefaultDateFormat + } + val dateFormat = new SimpleDateFormat(dateFormatString) + // input data from csv files. Convert to logical plan + CommonUtil.configureCSVInputFormat(hadoopConf, carbonLoadModel) + hadoopConf.set(FileInputFormat.INPUT_DIR, carbonLoadModel.getFactFilePath) + val jobConf = new JobConf(hadoopConf) + SparkHadoopUtil.get.addCredentials(jobConf) + val attributes = +StructType(carbonLoadModel.getCsvHeaderColumns.map( + StructField(_, StringType))).toAttributes + val rowDataTypes = attributes.map{f => +relation.output.find(_.name.equalsIgnoreCase(f.name)) match { + case Some(attr) => attr.dataType + case _ => StringType +} + } + val len = rowDataTypes.length + val rdd = +new NewHadoopRDD[NullWritable, StringArrayWrit
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157346996 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -345,23 +380,172 @@ case class CarbonLoadDataCommand( } else { (dataFrame, dataFrame) } -if (!carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.isChildDataMap) { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +if (!table.isChildDataMap) { GlobalDictionaryUtil.generateGlobalDictionary( sparkSession.sqlContext, carbonLoadModel, hadoopConf, dictionaryDataFrame) } -CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, - carbonLoadModel, - columnar, - partitionStatus, - None, - isOverwriteTable, - hadoopConf, - loadDataFrame, - updateModel, - operationContext) +if (table.isStandardPartitionTable) { + loadStandardPartition(sparkSession, carbonLoadModel, hadoopConf, loadDataFrame) +} else { + CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, +carbonLoadModel, +columnar, +partitionStatus, +None, +isOverwriteTable, +hadoopConf, +loadDataFrame, +updateModel, +operationContext) +} + } + + private def loadStandardPartition(sparkSession: SparkSession, + carbonLoadModel: CarbonLoadModel, + hadoopConf: Configuration, + dataFrame: Option[DataFrame]) = { +val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +val logicalPlan = + sparkSession.sessionState.catalog.lookupRelation( +TableIdentifier(table.getTableName, Some(table.getDatabaseName))) +val relation = logicalPlan.collect { + case l: LogicalRelation => l +}.head + + +val query: LogicalPlan = if (dataFrame.isDefined) { + var timeStampformatString = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT + val dateFormat = new SimpleDateFormat(dateFormatString) + val delimiterLevel1 = carbonLoadModel.getComplexDelimiterLevel1 + val delimiterLevel2 = carbonLoadModel.getComplexDelimiterLevel2 + val serializationNullFormat = + carbonLoadModel.getSerializationNullFormat.split(CarbonCommonConstants.COMMA, 2)(1) + val attributes = +StructType(dataFrame.get.schema.fields.map(_.copy(dataType = StringType))).toAttributes + val len = attributes.length + val rdd = dataFrame.get.rdd.map { f => +val data = new Array[Any](len) +var i = 0 +while (i < len) { + data(i) = +UTF8String.fromString( + CarbonScalaUtil.getString(f.get(i), +serializationNullFormat, +delimiterLevel1, +delimiterLevel2, +timeStampFormat, +dateFormat)) + i = i + 1 +} +InternalRow.fromSeq(data) + } + LogicalRDD(attributes, rdd)(sparkSession) + +} else { + var timeStampformatString = carbonLoadModel.getTimestampformat + if (timeStampformatString.isEmpty) { +timeStampformatString = carbonLoadModel.getDefaultTimestampFormat + } + val timeStampFormat = new SimpleDateFormat(timeStampformatString) + var dateFormatString = carbonLoadModel.getDateFormat + if (dateFormatString.isEmpty) { +dateFormatString = carbonLoadModel.getDefaultDateFormat + } + val dateFormat = new SimpleDateFormat(dateFormatString) + // input data from csv files. Convert to logical plan + CommonUtil.configureCSVInputFormat(hadoopConf, carbonLoadModel) + hadoopConf.set(FileInputFormat.INPUT_DIR, carbonLoadModel.getFactFilePath) + val jobConf = new JobConf(hadoopConf) + SparkHadoopUtil.get.addCredentials(jobConf) + val attributes = +StructType(carbonLoadModel.getCsvHeaderColumns.map( + StructField(_, StringType))).toAttributes + val rowDataTypes = attributes.map{f => +relation.output.find(_.name.equalsIgnoreCase(f.name)) match { + case Some(attr) => attr.dataType + case _ => StringType +} + } + val len = rowDataTypes.length + val rdd = +new NewHadoopRDD[NullWritable, StringArrayWrit
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157346945 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -316,16 +332,35 @@ case class CarbonLoadDataCommand( } else { dataFrame } -CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, - carbonLoadModel, - columnar, - partitionStatus, - server, - isOverwriteTable, - hadoopConf, - loadDataFrame, - updateModel, - operationContext) + +if (carbonTable.isStandardPartitionTable) { + try { +loadStandardPartition(sparkSession, carbonLoadModel, hadoopConf, loadDataFrame) + } finally { +server match { + case Some(dictServer) => +try { + dictServer.writeTableDictionary(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +.getCarbonTableIdentifier.getTableId) +} catch { + case _: Exception => +throw new Exception("Dataload failed due to error while writing dictionary file!") +} + case _ => +} + } +} else { + CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext, --- End diff -- ok ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157346926 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala --- @@ -40,7 +41,11 @@ case class CarbonInsertIntoCommand( scala.collection.immutable.Map("fileheader" -> header), overwrite, null, - Some(df)).run(sparkSession) + Some(df), --- End diff -- ok ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1672#discussion_r157346845 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -1394,20 +1394,20 @@ public static String printLine(String a, int num) { * Below method will be used to get the list of segment in * comma separated string format * - * @param segmentList + * @param strings * @return comma separated segment string */ - public static String getSegmentString(List segmentList) { -if (segmentList.isEmpty()) { + public static String convertToString(List strings) { --- End diff -- ok ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
GitHub user ravipesala reopened a pull request: https://github.com/apache/carbondata/pull/1672 [CARBONDATA-1858][PARTITION] Support querying data from partition table. This PR depends on https://github.com/apache/carbondata/pull/1642 and https://github.com/apache/carbondata/pull/1654 In case of partition table first, use sessioncatalog to prune the partitions. With the partition information, datamap should read partition.map file to get the index file and corresponding blocklets to prune Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata partition-read Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1672.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1672 commit 61f54a1e564401febd1803adc82b10719babeb1b Author: ravipesala Date: 2017-12-04T10:37:03Z Added outputformat for carbon commit ff5393df80bb23a92783761abc20fff4463fc66c Author: ravipesala Date: 2017-12-12T06:12:45Z Added fileformat in carbon commit aafe97ac86f4818eeab94afb3e4f1a5c6ee74774 Author: ravipesala Date: 2017-12-15T19:18:19Z Added support to query using standard partitions ---
[GitHub] carbondata pull request #1672: [CARBONDATA-1858][PARTITION] Support querying...
Github user ravipesala closed the pull request at: https://github.com/apache/carbondata/pull/1672 ---
[GitHub] carbondata issue #1672: [CARBONDATA-1858][PARTITION] Support querying data f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1672 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2057/ ---
[GitHub] carbondata issue #1674: [CARBONDATA-1859][CARBONDATA-1861] Support show and ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1674 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2056/ ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1575 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/827/ ---
[GitHub] carbondata issue #1648: [CARBONDATA-1888][PreAggregate][Bug]Fixed compaction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1648 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2055/ ---
[GitHub] carbondata pull request #1674: [CARBONDATA-1859][CARBONDATA-1861] Support sh...
GitHub user ravipesala opened a pull request: https://github.com/apache/carbondata/pull/1674 [CARBONDATA-1859][CARBONDATA-1861] Support show and drop partitions This PR depends on https://github.com/apache/carbondata/pull/1642 and https://github.com/apache/carbondata/pull/1654 and https://github.com/apache/carbondata/pull/1672 It supports show and drop partitions. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [X] Any interfaces changed? - [X] Any backward compatibility impacted? NO - [X] Document update required? Yes - [X] Testing done Tests added - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata drop-partition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1674.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1674 commit 6b49e6c4d9e5a960a657097487e94a83012fe491 Author: ravipesala Date: 2017-12-04T10:37:03Z Added outputformat for carbon commit d8a88314b0a19b8e30a3b1ad1a3540f7f0964c7c Author: ravipesala Date: 2017-12-12T06:12:45Z Added fileformat in carbon commit 67c4f83c5b62f03dffb2fe8a8aa53923a889d411 Author: ravipesala Date: 2017-12-15T19:18:19Z Added support to query using standard partitions commit a127ece350c1a9b1a1c7755cb51e4e1f9342cac3 Author: ravipesala Date: 2017-12-16T17:08:00Z Added drop partition feature ---
[GitHub] carbondata issue #1648: [CARBONDATA-1888][PreAggregate][Bug]Fixed compaction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1648 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2364/ ---
[jira] [Assigned] (CARBONDATA-1882) select a table with 'group by' and perform insert overwrite to another carbon table it fails
[ https://issues.apache.org/jira/browse/CARBONDATA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G reassigned CARBONDATA-1882: Assignee: Kushal > select a table with 'group by' and perform insert overwrite to another carbon > table it fails > > > Key: CARBONDATA-1882 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1882 > Project: CarbonData > Issue Type: Bug >Reporter: Kushal Sah >Assignee: Kushal > Fix For: 1.3.0 > > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1882) select a table with 'group by' and perform insert overwrite to another carbon table it fails
[ https://issues.apache.org/jira/browse/CARBONDATA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G resolved CARBONDATA-1882. -- Resolution: Fixed Fix Version/s: 1.3.0 > select a table with 'group by' and perform insert overwrite to another carbon > table it fails > > > Key: CARBONDATA-1882 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1882 > Project: CarbonData > Issue Type: Bug >Reporter: Kushal Sah > Fix For: 1.3.0 > > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1641: [CARBONDATA-1882] select with group by and in...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1641 ---
[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1641 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/826/ ---
[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...
Github user gvramana commented on the issue: https://github.com/apache/carbondata/pull/1641 LGTM ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1575 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2054/ ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata issue #1648: [CARBONDATA-1888][PreAggregate][Bug]Fixed compaction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1648 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/825/ ---
[jira] [Resolved] (CARBONDATA-1898) Like, Contains, Ends With query optimization in case of or filter
[ https://issues.apache.org/jira/browse/CARBONDATA-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1898. - Resolution: Fixed Fix Version/s: 1.3.0 > Like, Contains, Ends With query optimization in case of or filter > - > > Key: CARBONDATA-1898 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1898 > Project: CarbonData > Issue Type: Improvement >Reporter: kumar vishal > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > *Problem:* In case of like, contains, ends with filter With all or condition > query is taking more time in carbon > *Solution*: This type of query avoid filter push down and let spark handle > those filters -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1667: [CARBONDATA-1898]Fixed Like, Contains, Ends w...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1667 ---
[GitHub] carbondata issue #1667: [CARBONDATA-1898]Fixed Like, Contains, Ends with que...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1667 LGTM ---
[GitHub] carbondata issue #1648: [CARBONDATA-1888][PreAggregate][Bug]Fixed compaction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1648 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2053/ ---
[GitHub] carbondata issue #1667: [CARBONDATA-1898]Fixed Like, Contains, Ends with que...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1667 Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/824/ ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata pull request #1659: [CARBONDATA-1822] Added SDV test cases for Re...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1659 ---
[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1641 Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/823/ ---
[GitHub] carbondata issue #1667: [CARBONDATA-1898]Fixed Like, Contains, Ends with que...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1667 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2052/ ---