[GitHub] carbondata issue #1438: [WIP]insert overwrite fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1438 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/690/ ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @ravipesala please review ---
[GitHub] carbondata issue #1435: [WIP][CARBONDATA-1626]add data size and index size i...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @ravipesala please review ---
[GitHub] carbondata issue #1435: [WIP][CARBONDATA-1626]add data size and index size i...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @gvramana please review ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @gvramana please review ---
[jira] [Commented] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221758#comment-16221758 ]

Ravindra Pesala commented on CARBONDATA-1624:
---------------------------------------------

Welcome to contribute. We should not use CarbonProperties for these dynamic cores anymore, as it impacts other loads. First find the available cores that can be allocated for loading per executor before submitting the job, then pass that information to Carbon in the RDD's compute.
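A minimal Scala sketch of the suggested flow, with hypothetical class and variable names (this is not CarbonData's actual API): decide the per-executor core budget on the driver before submitting, and carry it into the RDD so tasks stop consulting a shared CarbonProperties value.

{code}
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: the core budget travels with the RDD itself.
class LoadWithCoresRDD(sc: SparkContext, coresPerExecutor: Int)
    extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = {
    val single: Partition = new Partition { override val index: Int = 0 }
    Array(single)
  }

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    // Tasks read the value captured at submit time, not a global property.
    Iterator(s"loading with $coresPerExecutor cores on this executor")
}

object CoreBudgetDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
    // Find the allocatable cores per executor before submitting the job.
    val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
    new LoadWithCoresRDD(sc, coresPerExecutor).collect().foreach(println)
    sc.stop()
  }
}
{code}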
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1437 @sounakr please review it. ---
[jira] [Commented] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221627#comment-16221627 ]

Zhichao Zhang commented on CARBONDATA-1624:
-------------------------------------------

I can implement this feature. By the way, I noticed that the same code snippet for reading 'NUM_CORES_LOADING' is repeated in many places, even though CarbonProperties already has a method 'getNumberOfCores' for this. I think we should use 'CarbonProperties.getNumberOfCores' uniformly to get 'NUM_CORES_LOADING':

{code:java}
int thread_pool_size;
try {
  thread_pool_size = Integer.parseInt(CarbonProperties.getInstance()
      .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
          CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
} catch (NumberFormatException e) {
  thread_pool_size = Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
}
{code}

right?
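For reference, the proposed uniform call would collapse the repeated snippet above to a single line; a sketch in Scala, assuming getNumberOfCores returns the parsed core count as an int (the comment above confirms the method exists, its exact signature is not shown):

{code}
val threadPoolSize: Int = CarbonProperties.getInstance().getNumberOfCores
{code}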
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1437 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/689/ ---
[jira] [Assigned] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
[ https://issues.apache.org/jira/browse/CARBONDATA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao Zhang reassigned CARBONDATA-1624:
-----------------------------------------

Assignee: Zhichao Zhang
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[jira] [Updated] (CARBONDATA-1593) Add partition to table cause NoSuchTableException
[ https://issues.apache.org/jira/browse/CARBONDATA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wyp updated CARBONDATA-1593:
----------------------------

Priority: Minor  (was: Major)

> Add partition to table cause NoSuchTableException
> -------------------------------------------------
>
> Key: CARBONDATA-1593
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1593
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 1.2.0
> Reporter: wyp
> Priority: Minor
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When I run the following code snippet, I get NoSuchTableException:
> {code}
> scala> import org.apache.spark.sql.SparkSession
> scala> import org.apache.spark.sql.CarbonSession._
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carbon")
> scala> carbon.sql("CREATE TABLE temp.order_common(id bigint, order_no string, create_time timestamp) partitioned by (dt string) STORED BY 'carbondata' tblproperties('partition_type'='RANGE','RANGE_INFO'='2010,2011')")
> scala> carbon.sql("ALTER TABLE temp.order_common ADD PARTITION('2012')")
> org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'order_common' not found in database 'default';
>   at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
>   at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:600)
>   at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:106)
>   at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69)
>   at org.apache.spark.sql.hive.CarbonSessionCatalog.lookupRelation(CarbonSessionState.scala:83)
>   at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:461)
>   at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.processSchema(carbonTableSchema.scala:283)
>   at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.run(carbonTableSchema.scala:229)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>   ... 50 elided
> {code}
> but partition {{2012}} was already added to table {{temp.order_common}}:
> {code}
> scala> carbon.sql("show partitions temp.order_common").show(100, 100)
> +--+
> | partition|
> +--+
> | 0, dt = DEFAULT|
> |
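The interesting frame is org.apache.spark.sql.internal.CatalogImpl.refreshTable, which ends up looking for 'order_common' in 'default' even though the command named 'temp.order_common'. An illustrative sketch of that failure mode (an inference from the trace, not the confirmed fix): a Spark TableIdentifier built without its database part resolves against the current database.

{code}
import org.apache.spark.sql.catalyst.TableIdentifier

// Illustration only: the database component decides where the lookup happens.
val qualified   = TableIdentifier("order_common", Some("temp")) // temp.order_common
val unqualified = TableIdentifier("order_common")               // resolves in 'default'
{code}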
[GitHub] carbondata pull request #1372: [WIP] Support object storage by S3 interface
Github user QiangCai closed the pull request at: https://github.com/apache/carbondata/pull/1372 ---
[GitHub] carbondata issue #1433: [CARBONDATA-1517]- Pre Aggregate Create Table Suppor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1433 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/688/ ---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147234581

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -831,4 +832,12 @@ object CommonUtil {
       LOGGER.error(s)
     }
   }
+
+  def getScaleAndPrecision(dataType: String): (Int, Int) = {
--- End diff --

Moved this method to CommonUtil and updated all the callers.

---
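For readers of the thread, a hedged sketch of what a helper with this signature usually does (the real CommonUtil implementation is not shown in the diff, and the tuple order is inferred from the method name, so both are assumptions): parse a type string such as `decimal(10,2)` into its two numeric arguments.

```scala
// Hypothetical sketch; tuple order (scale, precision) is assumed from the
// method name and may differ from the actual CarbonData implementation.
def getScaleAndPrecision(dataType: String): (Int, Int) = {
  val Decimal = """decimal\((\d+)\s*,\s*(\d+)\)""".r
  dataType.trim.toLowerCase match {
    case Decimal(precision, scale) => (scale.toInt, precision.toInt)
    case _                         => (-1, -1) // not a decimal(p,s) type string
  }
}
```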
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147227445

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -831,4 +832,12 @@ object CommonUtil {
       LOGGER.error(s)
     }
   }
+
+  def getScaleAndPrecision(dataType: String): (Int, Int) = {
--- End diff --

ok

---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147227377

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala ---
@@ -110,6 +111,40 @@ class CarbonFileMetastore extends CarbonMetaStore {
     }
   }
 
+  /**
+   * This method will overwrite the existing schema and update it with the given details
+   *
+   * @param newTableIdentifier
+   * @param thriftTableInfo
+   * @param carbonStorePath
+   * @param sparkSession
+   */
+  def updateTableSchemaForPreAgg(newTableIdentifier: CarbonTableIdentifier,
+      oldTableIdentifier: CarbonTableIdentifier,
+      thriftTableInfo: org.apache.carbondata.format.TableInfo,
+      carbonStorePath: String)(sparkSession: SparkSession): String = {
+    val absoluteTableIdentifier = AbsoluteTableIdentifier.fromTablePath(carbonStorePath)
--- End diff --

ok

---
[GitHub] carbondata pull request #1433: [CARBONDATA-1517]- Pre Aggregate Create Table...
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1433#discussion_r147226705

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala ---
@@ -66,25 +66,42 @@ trait CarbonMetaStore {
    * @param carbonStorePath
    * @param sparkSession
    */
-  def updateTableSchema(newTableIdentifier: CarbonTableIdentifier,
+  def updateTableSchemaForAlter(newTableIdentifier: CarbonTableIdentifier,
       oldTableIdentifier: CarbonTableIdentifier,
       thriftTableInfo: org.apache.carbondata.format.TableInfo,
       schemaEvolutionEntry: SchemaEvolutionEntry,
       carbonStorePath: String)(sparkSession: SparkSession): String
 
   /**
+   * This method will overwrite the existing schema and update it with the given details
+   *
+   * @param newTableIdentifier
+   * @param thriftTableInfo
+   * @param carbonStorePath
+   * @param sparkSession
+   */
+  def updateTableSchemaForPreAgg(newTableIdentifier: CarbonTableIdentifier,
--- End diff --

ok

---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1436 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/687/ ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user lionelcao commented on the issue: https://github.com/apache/carbondata/pull/1434 @chenliang613 Please help review. ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user lionelcao commented on the issue: https://github.com/apache/carbondata/pull/1434 LGTM ---
[GitHub] carbondata issue #1438: [WIP]insert overwrite fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1438 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/686/ ---
[GitHub] carbondata pull request #1438: [WIP]insert overwrite fix
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/1438

[WIP]insert overwrite fix

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

 - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request`
 - [ ] Make sure to add PR description including
   - the root cause/problem statement
   - what is the implemented solution
 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done. Please provide details on
   - whether new unit test cases have been added or why no new tests are required?
   - how it is tested? Please attach test report.
   - is it a performance related change? Please attach the performance test report.
   - any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/akashrn5/incubator-carbondata all_num

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1438.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1438

commit 42f1b59f2481c2b15d4a920ee99a051393b684d9
Author: akashrn5
Date: 2017-10-26T14:06:46Z

    insert overwrite fix

---
[jira] [Created] (CARBONDATA-1627) one job failed among 100 jobs while performing select operation with 100 different threads
Kushal Sah created CARBONDATA-1627:
-----------------------------------

Summary: one job failed among 100 jobs while performing select operation with 100 different threads
Key: CARBONDATA-1627
URL: https://issues.apache.org/jira/browse/CARBONDATA-1627
Project: CarbonData
Issue Type: Bug
Reporter: Kushal Sah

1) Run a create table query (with any 5 columns).
2) Load data: only 5 records.
3) Perform select operations by launching 100 threads in parallel (the JMeter tool can be used to launch the 100 threads).

All requests succeed except one job, which fails with the error:
java.lang.IllegalArgumentException: Config entry enable.unsafe.sort already registered
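A minimal repro sketch of the reported steps, assuming an existing CarbonSession named `carbon` and a five-column table named `t5` (both names are hypothetical):

{code}
import java.util.concurrent.{Executors, TimeUnit}

// Launch 100 concurrent SELECTs; per the report, exactly one fails with
// "Config entry enable.unsafe.sort already registered".
val pool = Executors.newFixedThreadPool(100)
(1 to 100).foreach { _ =>
  pool.submit(new Runnable {
    override def run(): Unit = carbon.sql("SELECT * FROM t5").collect()
  })
}
pool.shutdown()
pool.awaitTermination(5, TimeUnit.MINUTES)
{code}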
[jira] [Created] (CARBONDATA-1626) add datasize and index size to table status file
Akash R Nilugal created CARBONDATA-1626:
----------------------------------------

Summary: add datasize and index size to table status file
Key: CARBONDATA-1626
URL: https://issues.apache.org/jira/browse/CARBONDATA-1626
Project: CarbonData
Issue Type: Improvement
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal
Priority: Minor

If CarbonData is used in a cloud environment where the queries that are run are charged or billed, adding data size and index size to the table status file will help support billing features.
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @jackylk please review ---
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1435 @jackylk please review ---
[jira] [Resolved] (CARBONDATA-1619) Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
[ https://issues.apache.org/jira/browse/CARBONDATA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wyp resolved CARBONDATA-1619.
-----------------------------

Resolution: Duplicate
Fix Version/s: 1.3.0

> Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
> --------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-1619
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1619
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 1.2.0
> Reporter: wyp
> Fix For: 1.3.0
>
> Loading data into a carbondata table with {{overwrite=true}} many times will cause a {{NullPointerException}}. The following is the code snippet:
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.SparkSession
>
> scala> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.CarbonSession._
>
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carb")
> 17/10/26 12:58:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true
> 17/10/26 12:58:25 WARN util.CarbonProperties: main The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead.
> 17/10/26 12:58:43 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@718b9d56
>
> scala> carbon.sql("CREATE TABLE temp.my_table(id bigint) STORED BY 'carbondata'")
> 17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Creating Table with Database name [temp] and Table name [my_table]
> 17/10/26 12:59:03 WARN hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `temp`.`my_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
> 17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Table created with Database name [temp] and Table name [my_table]
> res0: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
> 17/10/26 12:59:23 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 12:59:23 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 12:59:26 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
> res1: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
> 17/10/26 12:59:33 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 12:59:33 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 12:59:52 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
> res2: org.apache.spark.sql.DataFrame = []
>
> scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2012-10'")
> 17/10/26 13:00:05 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
> 17/10/26 13:00:05 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
> 17/10/26 13:00:08 ERROR filesystem.AbstractDFSCarbonFile: main Exception occurred:File does not exist: hdfs://mycluster/user/wyp/carb/temp/my_table/Fact/Part0/Segment_0
> 17/10/26 13:00:09 ERROR command.LoadTable: main
> java.lang.NullPointerException
>   at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
>   at org.a
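The NullPointerException surfaces in AbstractDFSCarbonFile.isDirectory once the previous overwrite has deleted the segment directory. A defensive sketch of the pattern the trace suggests, written against plain Hadoop APIs (the method name and placement are assumptions, not the actual CarbonData fix):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Check existence first: getFileStatus on a segment path removed by the
// previous insert-overwrite would otherwise leave a null status behind.
def isExistingDirectory(segmentPath: String): Boolean = {
  val path = new Path(segmentPath)
  val fs   = path.getFileSystem(new Configuration())
  fs.exists(path) && fs.getFileStatus(path).isDirectory
}
{code}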
[jira] [Updated] (CARBONDATA-1625) Introduce new datatype of varchar(size) to store column length more than short limit.
[ https://issues.apache.org/jira/browse/CARBONDATA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao Zhang updated CARBONDATA-1625:
--------------------------------------

Description:

I am using Spark 2.1 + CarbonData 1.2, and find that if enable.unsafe.sort=true and the byte length of a column value exceeds 32768, data loading fails. My test code:

{code:java}
val longStr = sb.toString() // the getBytes length of longStr exceeds 32768
println(longStr.length())
println(longStr.getBytes("UTF-8").length)

import spark.implicits._
val df1 = spark.sparkContext.parallelize(0 to 1000)
  .map(x => ("a", x.toString(), longStr, x, x.toLong, x * 2))
  .toDF("stringField1", "stringField2", "stringField3", "intField", "longField", "int2Field")
val df2 = spark.sparkContext.parallelize(1001 to 2000)
  .map(x => ("b", x.toString(), (x % 2).toString(), x, x.toLong, x * 2))
  .toDF("stringField1", "stringField2", "stringField3", "intField", "longField", "int2Field")
val df3 = df1.union(df2)

val tableName = "study_carbondata_test"
spark.sql(s"DROP TABLE IF EXISTS ${tableName} ").show()
val sortScope = "LOCAL_SORT" // LOCAL_SORT GLOBAL_SORT
spark.sql(s"""
  | CREATE TABLE IF NOT EXISTS ${tableName} (
  |   stringField1 string,
  |   stringField2 string,
  |   stringField3 string,
  |   intField int,
  |   longField bigint,
  |   int2Field int
  | )
  | STORED BY 'carbondata'
  | TBLPROPERTIES('DICTIONARY_INCLUDE'='stringField1, stringField2',
  |   'SORT_COLUMNS'='stringField1, stringField2, intField, longField',
  |   'SORT_SCOPE'='${sortScope}',
  |   'NO_INVERTED_INDEX'='stringField3, int2Field',
  |   'TABLE_BLOCKSIZE'='64'
  | )
""".stripMargin)
df3.write
  .format("carbondata")
  .option("tableName", "study_carbondata_test")
  .option("compress", "true") // just valid when tempCSV is true
  .option("tempCSV", "false")
  .option("single_pass", "true")
  .mode(SaveMode.Append)
  .save()
{code}

The error message:

{code:java}
java.lang.NegativeArraySizeException
  at org.apache.carbondata.processing.newflow.sort.unsafe.UnsafeCarbonRowPage.getRow(UnsafeCarbonRowPage.java:182)
  at org.apache.carbondata.processing.newflow.sort.unsafe.holder.UnsafeInmemoryHolder.readRow(UnsafeInmemoryHolder.java:63)
  at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startSorting(UnsafeSingleThreadFinalSortFilesMerger.java:114)
  at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startFinalMerge(UnsafeSingleThreadFinalSortFilesMerger.java:81)
  at org.apache.carbondata.processing.newflow.sort.impl.UnsafeParallelReadMergeSorterImpl.sort(UnsafeParallelReadMergeSorterImpl.java:105)
  at org.apache.carbondata.processing.newflow.steps.SortProcessorStepImpl.execute(SortProcessorStepImpl.java:62)
  at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:87)
  at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:51)
  at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:442)
  at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.internalCompute(NewCarbonDataLoadRDD.scala:405)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:62)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
{code}

Currently the column length is stored as a short. Introduce a new datatype of varchar(size) to store column lengths beyond the short limit.
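As a side note on why 32768 bytes is the breaking point, a one-line illustration (assuming the length really is stored in a 2-byte signed short, as the description says): the byte length wraps negative, which matches the NegativeArraySizeException above.

{code}
val byteLen = 40000                    // a column value longer than 32767 bytes
val stored  = byteLen.toShort          // -25536: wraps past Short.MaxValue (32767)
val buffer  = new Array[Byte](stored)  // throws java.lang.NegativeArraySizeException
{code}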
[jira] [Created] (CARBONDATA-1625) Introduce new datatype of varchar(size) to store column length more than short limit.
Zhichao Zhang created CARBONDATA-1625:
--------------------------------------

Summary: Introduce new datatype of varchar(size) to store column length more than short limit.
Key: CARBONDATA-1625
URL: https://issues.apache.org/jira/browse/CARBONDATA-1625
Project: CarbonData
Issue Type: New Feature
Components: file-format
Reporter: Zhichao Zhang
Priority: Minor

I am using Spark 2.1 + CarbonData 1.2, and find that if enable.unsafe.sort=true and the byte length of a column value exceeds 32768, data loading fails with java.lang.NegativeArraySizeException. Currently the column length is stored as a short. Introduce a new datatype of varchar(size) to store column lengths beyond the short limit.
[jira] [Created] (CARBONDATA-1624) If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
Zhichao Zhang created CARBONDATA-1624:
--------------------------------------

Summary: If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
Key: CARBONDATA-1624
URL: https://issues.apache.org/jira/browse/CARBONDATA-1624
Project: CarbonData
Issue Type: Improvement
Components: data-load, spark-integration
Affects Versions: 1.3.0
Reporter: Zhichao Zhang
Priority: Minor

If we are using CarbonData + Spark to load data, we can set carbon.number.of.cores.while.loading to the number of executor cores. For example, when the number of executor cores is set to 6, there are at least 6 cores per node available for loading data, so carbon.number.of.cores.while.loading can be set to 6 automatically.
[GitHub] carbondata issue #1418: [CARBONDATA-1573] Support Database Location Configur...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1418 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/685/ ---
[GitHub] carbondata issue #1418: [CARBONDATA-1573] Support Database Location Configur...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1418 retest this please ---
[jira] [Updated] (CARBONDATA-1573) Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location
[ https://issues.apache.org/jira/browse/CARBONDATA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated CARBONDATA-1573:
---------------------------------------------

Summary: Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location  (was: Support Database Location Configuration while Creating Database)

> Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-1573
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1573
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query, hadoop-integration, presto-integration, spark-integration
> Reporter: Mohammad Shahid Khan
> Assignee: Mohammad Shahid Khan
>
> Support Creation of carbon table at the database location
> *Please refer to for Design and discussion:*
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Support-Database-Location-Configuration-while-Creating-Database-td23492.html
[jira] [Closed] (CARBONDATA-1622) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1622.
--------------------------------

Resolution: Duplicate
[jira] [Closed] (CARBONDATA-1621) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1621.
--------------------------------

Resolution: Duplicate
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1436 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1321/ ---
[jira] [Closed] (CARBONDATA-1620) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong closed CARBONDATA-1620.
--------------------------------

Resolution: Duplicate
[jira] [Created] (CARBONDATA-1623) Ignore empty line when load from csv
Weizhong created CARBONDATA-1623:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1623
URL: https://issues.apache.org/jira/browse/CARBONDATA-1623
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor

If a CSV has many empty lines, CarbonData currently stores a null row for each empty line; these rows are useless and waste space.

For example, the data in the CSV is:
--
1,a
// empty line
2,b
// empty line
--
stored in CarbonData it becomes:
--
1,a
null,null
2,b
null,null
--
after the change it will be:
--
1,a
2,b
--
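A minimal sketch of the intended load-time behavior (illustrative only, not CarbonData's actual CSV reader): filter out fully empty lines before parsing, so no null,null rows are stored.

{code}
val rawLines = Seq("1,a", "", "2,b", "")
// Drop empty lines instead of turning each one into a null,null row.
val rows = rawLines.filter(_.trim.nonEmpty).map(_.split(","))
// rows now contains only Array("1", "a") and Array("2", "b")
{code}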
[jira] [Assigned] (CARBONDATA-1623) Ignore empty line when load from csv
[ https://issues.apache.org/jira/browse/CARBONDATA-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weizhong reassigned CARBONDATA-1623:
------------------------------------

Assignee: Weizhong
[jira] [Created] (CARBONDATA-1620) Ignore empty line when load from csv
Weizhong created CARBONDATA-1620:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1620
URL: https://issues.apache.org/jira/browse/CARBONDATA-1620
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor
[jira] [Created] (CARBONDATA-1621) Ignore empty line when load from csv
Weizhong created CARBONDATA-1621:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1621
URL: https://issues.apache.org/jira/browse/CARBONDATA-1621
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor
[jira] [Created] (CARBONDATA-1622) Ignore empty line when load from csv
Weizhong created CARBONDATA-1622:
---------------------------------

Summary: Ignore empty line when load from csv
Key: CARBONDATA-1622
URL: https://issues.apache.org/jira/browse/CARBONDATA-1622
Project: CarbonData
Issue Type: Improvement
Components: data-load
Reporter: Weizhong
Priority: Minor