[GitHub] carbondata issue #1438: [WIP]insert overwrite fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1438 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/690/ ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @ravipesala please review ---
[GitHub] carbondata issue #1435: [WIP][CARBONDATA-1626]add data size and index size i...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @ravipesala please review ---
[GitHub] carbondata issue #1435: [WIP][CARBONDATA-1626]add data size and index size i...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @gvramana please review ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @gvramana please review ---
[jira] [Commented] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221758#comment-16221758 ]

Ravindra Pesala commented on CARBONDATA-1624:
---------------------------------------------

Welcome to contribute. We should not use CarbonProperties for these dynamic cores anymore, as it impacts other loads. First find the available cores that can be allocated for loading per executor before submitting the job, then pass that information to Carbon in the RDD's compute.
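A minimal Scala sketch of the suggested flow, with hypothetical class and variable names (this is not CarbonData's actual API): decide the per-executor core budget on the driver before submitting, and carry it into the RDD so tasks stop consulting a shared CarbonProperties value.

{code}
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: the core budget travels with the RDD itself.
class LoadWithCoresRDD(sc: SparkContext, coresPerExecutor: Int)
    extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = {
    val single: Partition = new Partition { override val index: Int = 0 }
    Array(single)
  }

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    // Tasks read the value captured at submit time, not a global property.
    Iterator(s"loading with $coresPerExecutor cores on this executor")
}

object CoreBudgetDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
    // Find the allocatable cores per executor before submitting the job.
    val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
    new LoadWithCoresRDD(sc, coresPerExecutor).collect().foreach(println)
    sc.stop()
  }
}
{code}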
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1437 @sounakr please review it. ---
[jira] [Commented] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221627#comment-16221627 ]

Zhichao Zhang commented on CARBONDATA-1624:
-------------------------------------------

I can implement this feature. By the way, I noticed that the same code snippet for reading 'NUM_CORES_LOADING' is repeated in many places, even though CarbonProperties already has a method 'getNumberOfCores' for this. I think we should use 'CarbonProperties.getNumberOfCores' uniformly to get 'NUM_CORES_LOADING':

{code:java}
int thread_pool_size;
try {
  thread_pool_size = Integer.parseInt(CarbonProperties.getInstance()
      .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
          CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
} catch (NumberFormatException e) {
  thread_pool_size = Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
}
{code}

right?
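For reference, the proposed uniform call would collapse the repeated snippet above to a single line; a sketch in Scala, assuming getNumberOfCores returns the parsed core count as an int (the comment above confirms the method exists, its exact signature is not shown):

{code}
val threadPoolSize: Int = CarbonProperties.getInstance().getNumberOfCores
{code}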
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1437 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/689/ ---
[jira] [Assigned] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao Zhang reassigned CARBONDATA-1624:
-----------------------------------------

Assignee: Zhichao Zhang
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[jira] [Updated] (CARBONDATA-1593) Add partition to table cause NoSuchTableException
[ https://issues.apache.org/jira/browse/CARBONDATA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wyp updated CARBONDATA-1593:
----------------------------

Priority: Minor  (was: Major)

> Add partition to table cause NoSuchTableException
> -------------------------------------------------
>
> Key: CARBONDATA-1593
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1593
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 1.2.0
> Reporter: wyp
> Priority: Minor
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When I run the following code snippet, I get NoSuchTableException:
> {code}
> scala> import org.apache.spark.sql.SparkSession
> scala> import org.apache.spark.sql.CarbonSession._
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carbon")
> scala> carbon.sql("CREATE TABLE temp.order_common(id bigint, order_no string, create_time timestamp) partitioned by (dt string) STORED BY 'carbondata' tblproperties('partition_type'='RANGE','RANGE_INFO'='2010,2011')")
> scala> carbon.sql("ALTER TABLE temp.order_common ADD PARTITION('2012')")
> org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'order_common' not found in database 'default';
>   at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
>   at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:600)
>   at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:106)
>   at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69)
>   at org.apache.spark.sql.hive.CarbonSessionCatalog.lookupRelation(CarbonSessionState.scala:83)
>   at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:461)
>   at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.processSchema(carbonTableSchema.scala:283)
>   at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.run(carbonTableSchema.scala:229)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>   ... 50 elided
> {code}
> but partition {{2012}} was already added to table {{temp.order_common}}:
> {code}
> scala> carbon.sql("show partitions temp.order_common").show(100, 100)
> +--+
> | partition|
> +--+
> | 0, dt = DEFAULT|
> |
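The interesting frame is org.apache.spark.sql.internal.CatalogImpl.refreshTable, which ends up looking for 'order_common' in 'default' even though the command named 'temp.order_common'. An illustrative sketch of that failure mode (an inference from the trace, not the confirmed fix): a Spark TableIdentifier built without its database part resolves against the current database.

{code}
import org.apache.spark.sql.catalyst.TableIdentifier

// Illustration only: the database component decides where the lookup happens.
val qualified   = TableIdentifier("order_common", Some("temp")) // temp.order_common
val unqualified = TableIdentifier("order_common")               // resolves in 'default'
{code}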
[GitHub] carbondata pull request #1372: [WIP] Support object storage by S3 interface
Github user QiangCai closed the pull request at: https://github.com/apache/carbondata/pull/1372 ---
[GitHub] carbondata issue #1433: [CARBONDATA-1517]- Pre Aggregate Create Table Suppor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1433 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/688/ ---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147234581

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -831,4 +832,12 @@ object CommonUtil {
       LOGGER.error(s)
     }
   }
+
+  def getScaleAndPrecision(dataType: String): (Int, Int) = {
--- End diff --

Moved this method to CommonUtil and updated all the callers.

---
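For readers of the thread, a hedged sketch of what a helper with this signature usually does (the real CommonUtil implementation is not shown in the diff, and the tuple order is inferred from the method name, so both are assumptions): parse a type string such as `decimal(10,2)` into its two numeric arguments.

```scala
// Hypothetical sketch; tuple order (scale, precision) is assumed from the
// method name and may differ from the actual CarbonData implementation.
def getScaleAndPrecision(dataType: String): (Int, Int) = {
  val Decimal = """decimal\((\d+)\s*,\s*(\d+)\)""".r
  dataType.trim.toLowerCase match {
    case Decimal(precision, scale) => (scale.toInt, precision.toInt)
    case _                         => (-1, -1) // not a decimal(p,s) type string
  }
}
```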
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147227445

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -831,4 +832,12 @@ object CommonUtil {
       LOGGER.error(s)
     }
   }
+
+  def getScaleAndPrecision(dataType: String): (Int, Int) = {
--- End diff --

ok

---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147227377

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala ---
@@ -110,6 +111,40 @@ class CarbonFileMetastore extends CarbonMetaStore {
     }
   }
 
+  /**
+   * This method will overwrite the existing schema and update it with the given details
+   *
+   * @param newTableIdentifier
+   * @param thriftTableInfo
+   * @param carbonStorePath
+   * @param sparkSession
+   */
+  def updateTableSchemaForPreAgg(newTableIdentifier: CarbonTableIdentifier,
+      oldTableIdentifier: CarbonTableIdentifier,
+      thriftTableInfo: org.apache.carbondata.format.TableInfo,
+      carbonStorePath: String)(sparkSession: SparkSession): String = {
+    val absoluteTableIdentifier = AbsoluteTableIdentifier.fromTablePath(carbonStorePath)
--- End diff --

ok

---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147226705

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
@@ -66,25 +66,42 @@ trait CarbonMetaStore {
    * @param carbonStorePath
    * @param sparkSession
    */
-  def updateTableSchema(newTableIdentifier: CarbonTableIdentifier,
+  def updateTableSchemaForAlter(newTableIdentifier: CarbonTableIdentifier,
       oldTableIdentifier: CarbonTableIdentifier,
       thriftTableInfo: org.apache.carbondata.format.TableInfo,
       schemaEvolutionEntry: SchemaEvolutionEntry,
       carbonStorePath: String)(sparkSession: SparkSession): String
 
   /**
+   * This method will overwrite the existing schema and update it with the given details
+   *
+   * @param newTableIdentifier
+   * @param thriftTableInfo
+   * @param carbonStorePath
+   * @param sparkSession
+   */
+  def updateTableSchemaForPreAgg(newTableIdentifier: CarbonTableIdentifier,
--- End diff --

ok

---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1436 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/687/ ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user lionelcao commented on the issue: https://github.com/apache/carbondata/pull/1434 @chenliang613 Please help review. ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user lionelcao commented on the issue: https://github.com/apache/carbondata/pull/1434 LGTM ---
[GitHub] carbondata issue #1438: [WIP]insert overwrite fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1438 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/686/ ---
[GitHub] carbondata pull request #1438: [WIP]insert overwrite fix
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/1438

[WIP]insert overwrite fix

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

 - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request`
 - [ ] Make sure to add PR description including
   - the root cause/problem statement
   - what is the implemented solution
 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done. Please provide details on
   - whether new unit test cases have been added or why no new tests are required?
   - how it is tested? Please attach test report.
   - is it a performance related change? Please attach the performance test report.
   - any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata all_num

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1438.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1438

commit 42f1b59f2481c2b15d4a920ee99a051393b684d9
Author: akashrn5
Date: 2017-10-26T14:06:46Z

    insert overwrite fix

---
[jira] [Created] (CARBONDATA-1627) one job failed among 100 jobs while performing select operation with 100 different threads
Kushal Sah created CARBONDATA-1627:
-----------------------------------

Summary: one job failed among 100 jobs while performing select operation with 100 different threads
Key: CARBONDATA-1627
URL: https://issues.apache.org/jira/browse/CARBONDATA-1627
Project: CarbonData
Issue Type: Bug
Reporter: Kushal Sah

1) Run a create table query (with any 5 columns).
2) Load data: only 5 records.
3) Perform select operations by launching 100 threads in parallel (the JMeter tool can be used to launch the 100 threads).

All requests succeed except one job, which fails with the error:
java.lang.IllegalArgumentException: Config entry enable.unsafe.sort already registered
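A minimal repro sketch of the reported steps, assuming an existing CarbonSession named `carbon` and a five-column table named `t5` (both names are hypothetical):

{code}
import java.util.concurrent.{Executors, TimeUnit}

// Launch 100 concurrent SELECTs; per the report, exactly one fails with
// "Config entry enable.unsafe.sort already registered".
val pool = Executors.newFixedThreadPool(100)
(1 to 100).foreach { _ =>
  pool.submit(new Runnable {
    override def run(): Unit = carbon.sql("SELECT * FROM t5").collect()
  })
}
pool.shutdown()
pool.awaitTermination(5, TimeUnit.MINUTES)
{code}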
[jira] [Created] (CARBONDATA-1626) add datasize and index size to table status file
Akash R Nilugal created CARBONDATA-1626:
----------------------------------------

Summary: add datasize and index size to table status file
Key: CARBONDATA-1626
URL: https://issues.apache.org/jira/browse/CARBONDATA-1626
Project: CarbonData
Issue Type: Improvement
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal
Priority: Minor

If CarbonData is used in a cloud environment where the queries that are run are charged or billed, adding data size and index size to the table status file will help support billing features.
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @jackylk please review ---
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @jackylk please review ---
[jira] [Resolved] (CARBONDATA-1619) Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
[ https://issues.apache.org/jira/browse/CARBONDATA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wyp resolved CARBONDATA-1619.
-----------------------------

Resolution: Duplicate
Fix Version/s: 1.3.0

> Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
> --------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-1619
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1619
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 1.2.0
> Reporter: wyp
> Fix For: 1.3.0
>
> Loading data into a carbondata table with {{overwrite=true}} many times will cause a {{NullPointerException}}. The following is the code snippet:
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.SparkSession
>
> scala> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.CarbonSession._
>
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carb")
> 17/10/26 12:58:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead.
> 17/10/26 12:58:43 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@718b9d56
>
> scala> carbon.sql("CREATE TABLE temp.my_table(id bigint) STORED BY 'carbondata'")
> 17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Creating Table with Database name [temp] and Table name [my_table]
> 17/10/26 12:59:03 WARN hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `temp`.`my_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
> 17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Table created with Database name [temp] and Table name [my_table]
> res0: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
> 17/10/26 12:59:23 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 12:59:23 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 12:59:26 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
> res1: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
> 17/10/26 12:59:33 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 12:59:33 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 12:59:52 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
> res2: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2012-10'")
> 17/10/26 13:00:05 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 13:00:05 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 13:00:08 ERROR filesystem.AbstractDFSCarbonFile: main Exception occurred:File does not exist: hdfs://mycluster/user/wyp/carb/temp/my_table/Fact/Part0/Segment_0
> 17/10/26 13:00:09 ERROR command.LoadTable: main
> java.lang.NullPointerException
>   at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
>   at org.a
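The NullPointerException surfaces in AbstractDFSCarbonFile.isDirectory once the previous overwrite has deleted the segment directory. A defensive sketch of the pattern the trace suggests, written against plain Hadoop APIs (the method name and placement are assumptions, not the actual CarbonData fix):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Check existence first: getFileStatus on a segment path removed by the
// previous insert-overwrite would otherwise leave a null status behind.
def isExistingDirectory(segmentPath: String): Boolean = {
  val path = new Path(segmentPath)
  val fs   = path.getFileSystem(new Configuration())
  fs.exists(path) && fs.getFileStatus(path).isDirectory
}
{code}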
[jira] [Updated] (CARBONDATA-1625) Introduce new datatype of varchar(size) to store column length more than short limit.
[ https://issues.apache.org/jira/browse/CARBONDATA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao Zhang updated CARBONDATA-1625:
--------------------------------------

Description:

I am using Spark 2.1 + CarbonData 1.2, and find that if enable.unsafe.sort=true and the byte length of a column value exceeds 32768, data loading fails. My test code:

{code:java}
val longStr = sb.toString() // the getBytes length of longStr exceeds 32768
println(longStr.length())
println(longStr.getBytes("UTF-8").length)

import spark.implicits._
val df1 = spark.sparkContext.parallelize(0 to 1000)
  .map(x => ("a", x.toString(), longStr, x, x.toLong, x * 2))
  .toDF("stringField1", "stringField2", "stringField3", "intField", "longField", "int2Field")
val df2 = spark.sparkContext.parallelize(1001 to 2000)
  .map(x => ("b", x.toString(), (x % 2).toString(), x, x.toLong, x * 2))
  .toDF("stringField1", "stringField2", "stringField3", "intField", "longField", "int2Field")
val df3 = df1.union(df2)

val tableName = "study_carbondata_test"
spark.sql(s"DROP TABLE IF EXISTS ${tableName} ").show()
val sortScope = "LOCAL_SORT" // LOCAL_SORT GLOBAL_SORT
spark.sql(s"""
  | CREATE TABLE IF NOT EXISTS ${tableName} (
  |   stringField1 string,
  |   stringField2 string,
  |   stringField3 string,
  |   intField int,
  |   longField bigint,
  |   int2Field int
  | )
  | STORED BY 'carbondata'
  | TBLPROPERTIES('DICTIONARY_INCLUDE'='stringField1, stringField2',
  |   'SORT_COLUMNS'='stringField1, stringField2, intField, longField',
  |   'SORT_SCOPE'='${sortScope}',
  |   'NO_INVERTED_INDEX'='stringField3, int2Field',
  |   'TABLE_BLOCKSIZE'='64'
  | )
""".stripMargin)
df3.write
  .format("carbondata")
  .option("tableName", "study_carbondata_test")
  .option("compress", "true") // just valid when tempCSV is true
  .option("tempCSV", "false")
  .option("single_pass", "true")
  .mode(SaveMode.Append)
  .save()
{code}

The error message:

{code:java}
java.lang.NegativeArraySizeException
  at org.apache.carbondata.processing.newflow.sort.unsafe.UnsafeCarbonRowPage.getRow(UnsafeCarbonRowPage.java:182)
  at org.apache.carbondata.processing.newflow.sort.unsafe.holder.UnsafeInmemoryHolder.readRow(UnsafeInmemoryHolder.java:63)
  at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startSorting(UnsafeSingleThreadFinalSortFilesMerger.java:114)
  at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startFinalMerge(UnsafeSingleThreadFinalSortFilesMerger.java:81)
  at org.apache.carbondata.processing.newflow.sort.impl.UnsafeParallelReadMergeSorterImpl.sort(UnsafeParallelReadMergeSorterImpl.java:105)
  at org.apache.carbondata.processing.newflow.steps.SortProcessorStepImpl.execute(SortProcessorStepImpl.java:62)
  at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:87)
  at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:51)
  at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:442)
  at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.internalCompute(NewCarbonDataLoadRDD.scala:405)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:62)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
{code}

Currently the column length is stored as a short. Introduce a new datatype of varchar(size) to store column lengths beyond the short limit.
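As a side note on why 32768 bytes is the breaking point, a one-line illustration (assuming the length really is stored in a 2-byte signed short, as the description says): the byte length wraps negative, which matches the NegativeArraySizeException above.

{code}
val byteLen = 40000                    // a column value longer than 32767 bytes
val stored  = byteLen.toShort          // -25536: wraps past Short.MaxValue (32767)
val buffer  = new Array[Byte](stored)  // throws java.lang.NegativeArraySizeException
{code}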
[jira] [Created] (CARBONDATA-1625) Introduce new datatype of varchar(size) to store column length more than short limit.
Zhichao Zhang created CARBONDATA-1625:
--------------------------------------

Summary: Introduce new datatype of varchar(size) to store column length more than short limit.
Key: CARBONDATA-1625
URL: https://issues.apache.org/jira/browse/CARBONDATA-1625
Project: CarbonData
Issue Type: New Feature
Components: file-format
Reporter: Zhichao Zhang
Priority: Minor

I am using Spark 2.1 + CarbonData 1.2, and find that if enable.unsafe.sort=true and the byte length of a column value exceeds 32768, data loading fails with java.lang.NegativeArraySizeException. Currently the column length is stored as a short. Introduce a new datatype of varchar(size) to store column lengths beyond the short limit.
[jira] [Created] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
Zhichao Zhang created CARBONDATA-1624:
--------------------------------------

Summary: If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
Key: CARBONDATA-1624
URL: https://issues.apache.org/jira/browse/CARBONDATA-1624
Project: CarbonData
Issue Type: Improvement
Components: data-load, spark-integration
Affects Versions: 1.3.0
Reporter: Zhichao Zhang
Priority: Minor

If we are using CarbonData + Spark to load data, we can set carbon.number.of.cores.while.loading to the number of executor cores. For example, when the number of executor cores is set to 6, there are at least 6 cores per node available for loading data, so carbon.number.of.cores.while.loading can be set to 6 automatically.
[GitHub] carbondata issue #1418: [CARBONDATA-1573] Support Database Location Configur...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1418 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/685/ ---
[GitHub] carbondata issue #1418: [CARBONDATA-1573] Support Database Location Configur...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1418 retest this please ---
[jira] [Updated] (CARBONDATA-1573) Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location
[ https://issues.apache.org/jira/browse/CARBONDATA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated CARBONDATA-1573:
---------------------------------------------

Summary: Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location  (was: Support Database Location Configuration while Creating Database)

> Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-1573
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1573
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query, hadoop-integration, presto-integration, spark-integration
> Reporter: Mohammad Shahid Khan
> Assignee: Mohammad Shahid Khan
>
> Support Creation of carbon table at the database location
> *Please refer to for Design and discussion:*
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Support-Database-Location-Configuration-while-Creating-Database-td23492.html
[jira] [Closed] (CARBONDATA-1622) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1622.
--------------------------------

Resolution: Duplicate
[jira] [Closed] (CARBONDATA-1621) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1621.
--------------------------------

Resolution: Duplicate
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1436 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1321/ ---
[jira] [Closed] (CARBONDATA-1620) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1620.
--------------------------------

Resolution: Duplicate
[jira] [Created] (CARBONDATA-1623) Ignore empty line when load from csv
Weizhong created CARBONDATA-1623:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1623
URL: https://issues.apache.org/jira/browse/CARBONDATA-1623
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor

If a CSV has many empty lines, CarbonData currently stores a null row for each empty line; these rows are useless and waste space.

For example, the data in the CSV is:
--
1,a
// empty line
2,b
// empty line
--
stored in CarbonData it becomes:
--
1,a
null,null
2,b
null,null
--
after the change it will be:
--
1,a
2,b
--
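A minimal sketch of the intended load-time behavior (illustrative only, not CarbonData's actual CSV reader): filter out fully empty lines before parsing, so no null,null rows are stored.

{code}
val rawLines = Seq("1,a", "", "2,b", "")
// Drop empty lines instead of turning each one into a null,null row.
val rows = rawLines.filter(_.trim.nonEmpty).map(_.split(","))
// rows now contains only Array("1", "a") and Array("2", "b")
{code}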
[jira] [Assigned] (CARBONDATA-1623) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong reassigned CARBONDATA-1623:
------------------------------------

Assignee: Weizhong
[jira] [Created] (CARBONDATA-1620) Ignore empty line when load from csv
Weizhong created CARBONDATA-1620:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1620
URL: https://issues.apache.org/jira/browse/CARBONDATA-1620
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor
[jira] [Created] (CARBONDATA-1621) Ignore empty line when load from csv
Weizhong created CARBONDATA-1621:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1621
URL: https://issues.apache.org/jira/browse/CARBONDATA-1621
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor
[jira] [Created] (CARBONDATA-1622) Ignore empty line when load from csv
Weizhong created CARBONDATA-1622:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1622
URL: https://issues.apache.org/jira/browse/CARBONDATA-1622
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor