[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570931#comment-15570931
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83147351
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow.iterators;
+
+import java.io.IOException;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+import org.apache.hadoop.mapred.RecordReader;
+
+/**
+ * This iterator iterates RecordReader.
+ */
+public class RecordReaderIterator extends CarbonIterator {
--- End diff --

It is used for iterating a RecordReader. I can move it to the carbon-hadoop 
module, but then the processing module would need to depend on it. The 
processing module already depends on the hadoop module, so the two modules 
would become dependent on each other (a circular dependency).
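For context, the adapter being discussed wraps a pull-style record reader behind an iterator. The sketch below is illustrative only: `SimpleRecordReader` is a simplified stand-in, not Hadoop's actual `org.apache.hadoop.mapred.RecordReader` API, and the class is not the real Carbon implementation.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for a Hadoop-style record reader:
// returns the next record, or null once the input is exhausted.
interface SimpleRecordReader<V> {
  V nextRecord() throws IOException;
  void close() throws IOException;
}

// Adapter in the spirit of RecordReaderIterator: buffers one record ahead
// so hasNext() can answer without consuming, and closes the reader at EOF.
class RecordReaderIterator<V> implements Iterator<V> {
  private final SimpleRecordReader<V> reader;
  private V buffered;
  private boolean finished;

  RecordReaderIterator(SimpleRecordReader<V> reader) {
    this.reader = reader;
  }

  @Override
  public boolean hasNext() {
    if (buffered != null) {
      return true;
    }
    if (finished) {
      return false;
    }
    try {
      buffered = reader.nextRecord();
      if (buffered == null) {
        finished = true;
        reader.close();  // release the underlying reader at end of input
        return false;
      }
      return true;
    } catch (IOException e) {
      throw new RuntimeException("Failed to read next record", e);
    }
  }

  @Override
  public V next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    V result = buffered;
    buffered = null;
    return result;
  }
}
```

The one-record lookahead is what lets an iterator interface sit on top of a reader that only exposes "give me the next record or tell me you are done".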


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570920#comment-15570920
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83147018
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This is the base interface for data loading. It can perform transformation jobs as per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Initialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void initialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child)
+      throws CarbonDataLoadingException;
+
+  /**
+   * Transform the data as per the implementation.
+   * @return Iterator of data
+   * @throws CarbonDataLoadingException
+   */
+  Iterator execute() throws CarbonDataLoadingException;
+
+  /**
+   * Any closing of resources after step execution can be done here.
+   */
+  void finish();
--- End diff --

OK, I will add a close method along with the finish method.
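The split being agreed on above, a `finish()` for end-of-data work plus a separate `close()` for unconditional resource cleanup, is a common contract. A minimal sketch, with hypothetical names rather than the actual Carbon interfaces, might look like:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified processor-step contract: finish() runs
// end-of-data work (flush, commit), while close() releases resources
// and must be safe to call even if execute() or finish() failed.
interface ProcessorStep<T> {
  Iterator<T> execute() throws Exception;
  void finish() throws Exception;  // called once, only after success
  void close();                    // called in all cases, typically in finally
}

class StepRunner {
  // Drives a step with the finish/close discipline from the review comment.
  static <T> List<T> run(ProcessorStep<T> step) throws Exception {
    List<T> out = new ArrayList<>();
    try {
      Iterator<T> it = step.execute();
      while (it.hasNext()) {
        out.add(it.next());
      }
      step.finish();   // end-of-data work happens only on success
    } finally {
      step.close();    // resource cleanup happens regardless
    }
    return out;
  }
}
```

Keeping the two methods separate means a failure mid-execution still releases buffers and file handles without committing partial results.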


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-285) Use path parameter in Spark datasource API

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570916#comment-15570916
 ] 

ASF GitHub Bot commented on CARBONDATA-285:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/212#discussion_r83146927
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala
 ---
@@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
   override def createRelation(
   sqlContext: SQLContext,
   parameters: Map[String, String]): BaseRelation = {
-// if path is provided we can directly create Hadoop relation. \
-// Otherwise create datasource relation
-parameters.get("path") match {
-  case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, 
Array(path), parameters, None)
-  case _ =>
-val options = new CarbonOption(parameters)
-val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
-val identifier = tableIdentifier match {
-  case Seq(name) => TableIdentifier(name, None)
-  case Seq(db, name) => TableIdentifier(name, Some(db))
-}
-CarbonDatasourceRelation(identifier, None)(sqlContext)
+val options = new CarbonOption(parameters)
+if (sqlContext.isInstanceOf[CarbonContext]) {
--- End diff --

Sorry, yes, `carboncontext.load(path)` cannot work now, right?
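The two resolution modes being debated in the diff, an explicit `path` option versus a `db.table` identifier, can be sketched as follows. This is a Java illustration with invented names (`RelationResolver`, `tableName`, the returned strings), not Carbon's or Spark's actual API.

```java
import java.util.Map;

// Illustrative only: a relation is resolved either from an explicit "path"
// option (file-based, no catalog lookup) or from a "tableName" option of
// the form "table" or "db.table", mirroring the pattern match in the diff.
class RelationResolver {
  static String resolve(Map<String, String> parameters) {
    String path = parameters.get("path");
    if (path != null) {
      // Path given: build a Hadoop-style relation directly from the folder.
      return "hadoopRelation:" + path;
    }
    // No path: fall back to a catalog lookup via a db.table identifier.
    String[] parts = parameters.get("tableName").split("\\.");
    if (parts.length == 2) {
      return "catalogRelation:" + parts[0] + "." + parts[1];
    }
    return "catalogRelation:" + parts[0];
  }
}
```

The review comment's concern is exactly the first branch: once the path branch is removed, a caller that only supplies a path has no way to reach a relation.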


> Use path parameter in Spark datasource API
> --
>
> Key: CARBONDATA-285
> URL: https://issues.apache.org/jira/browse/CARBONDATA-285
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, when using carbon with the Spark datasource API, the database name 
> and table name must be given as parameters, which is not the normal way of 
> using the datasource API. With this PR, the database name and table name are 
> no longer required; the user only needs to specify the `path` parameter 
> (pointing to the table folder) when using the datasource API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-308) Unify CarbonScanRDD and CarbonHadoopFSRDD

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-308:

Description: 
Take CarbonScanRDD as the target RDD, modify as following:

1. On the driver side, only getSplit is required, so only the filter condition 
is needed; there is no need to create the full QueryModel object, so we can 
move the creation of QueryModel from the driver side to the executor side.
2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of 
using QueryExecutor directly.


  was:
Take CarbonScanRDD as the target RDD, modify as following:

On the driver side, only getSplit is required, so only the filter condition is 
needed; there is no need to create the full QueryModel object, so we can move 
the creation of QueryModel from the driver side to the executor side.



> Unify CarbonScanRDD and CarbonHadoopFSRDD
> -
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD, modify as following:
> 1. On the driver side, only getSplit is required, so only the filter 
> condition is needed; there is no need to create the full QueryModel object, 
> so we can move the creation of QueryModel from the driver side to the 
> executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of using QueryExecutor directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-314) Make CarbonContext to use standard Datasource strategy

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-314:
---

 Summary: Make CarbonContext to use standard Datasource strategy
 Key: CARBONDATA-314
 URL: https://issues.apache.org/jira/browse/CARBONDATA-314
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
Then make CarbonContext use the standard datasource strategy for creating the 
relation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-313) Update CarbonSource to use CarbonDatasourceHadoopRelation

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-313:
---

 Summary: Update CarbonSource to use CarbonDatasourceHadoopRelation
 Key: CARBONDATA-313
 URL: https://issues.apache.org/jira/browse/CARBONDATA-313
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


Change CarbonSource to use CarbonDatasourceHadoopRelation only: remove the 
extension of BaseRelation and extend HadoopFsRelation only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-312:

Description: 
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, change CarbonDatasourceHadoopRelation to use CarbonScanRDD in 
the buildScan function.

  was:
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.
3. Change CarbonSource to use CarbonDatasourceHadoopRelation only: remove the 
extension of BaseRelation and extend HadoopFsRelation only.


> Unify two datasource: CarbonDatasourceHadoopRelation and 
> CarbonDatasourceRelation
> -
>
> Key: CARBONDATA-312
> URL: https://issues.apache.org/jira/browse/CARBONDATA-312
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonDatasourceHadoopRelation as the target datasource definition, 
> after that, CarbonContext can use standard Datasource strategy
> In this issue, change CarbonDatasourceHadoopRelation to use CarbonScanRDD in 
> the buildScan function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-307:

Description: 
Currently, there are two read paths in the carbon-spark module: 
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
CarbonRecordReader => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
and to scan.

Because of this, there is unnecessary duplicate code that needs to be unified.


  was:
Currently, there are two read paths in the carbon-spark module: 
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
CarbonRecordReader
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
and to scan.

Because of this, there is unnecessary duplicate code that needs to be unified.



> Support executor side scan using CarbonInputFormat
> --
>
> Key: CARBONDATA-307
> URL: https://issues.apache.org/jira/browse/CARBONDATA-307
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module: 
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
> QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
> CarbonRecordReader => QueryExecutor
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
> and to scan.
> Because of this, there is unnecessary duplicate code that needs to be 
> unified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-307:

Description: 
Currently, there are two read paths in the carbon-spark module: 
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
CarbonRecordReader
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
and to scan.

Because of this, there is unnecessary duplicate code that needs to be unified.


  was:
Currently, there are two read paths in the carbon-spark module: 
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
CarbonRecordReader
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
and to scan.

This creates unnecessary duplicate code that needs to be unified.



> Support executor side scan using CarbonInputFormat
> --
>
> Key: CARBONDATA-307
> URL: https://issues.apache.org/jira/browse/CARBONDATA-307
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module: 
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
> QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
> CarbonRecordReader
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
> and to scan.
> Because of this, there is unnecessary duplicate code that needs to be 
> unified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-312:

Description: 
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.
3. Change CarbonSource to use CarbonDatasourceHadoopRelation only: remove the 
extension of BaseRelation and extend HadoopFsRelation only.

  was:
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.
3. Change CarbonSource to use CarbonDatasourceHadoopRelation only.


> Unify two datasource: CarbonDatasourceHadoopRelation and 
> CarbonDatasourceRelation
> -
>
> Key: CARBONDATA-312
> URL: https://issues.apache.org/jira/browse/CARBONDATA-312
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonDatasourceHadoopRelation as the target datasource definition, 
> after that, CarbonContext can use standard Datasource strategy
> In this issue, the following changes are required:
> 1. Move the dictionary strategy out of CarbonTableScan and make a separate 
> strategy for it.
> 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
> function.
> 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only: remove 
> the extension of BaseRelation and extend HadoopFsRelation only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570706#comment-15570706
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139950
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@Jay357089 I think it is a good idea to extract ConvertByteToReadable as a 
method, since it can be used in many logs, especially for analyzing 
performance.
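As a rough illustration of the helper being proposed, a byte-to-readable conversion method could look like the sketch below. The class name, method name, unit set, and formatting are assumptions for illustration, not the actual Carbon implementation.

```java
import java.util.Locale;

class ByteFormatter {
  // Converts a raw byte count into a human-readable string for log
  // messages, e.g. 1536 -> "1.5 KB". Locale.ROOT keeps the decimal
  // separator a '.' regardless of the JVM's default locale.
  static String convertByteToReadable(long bytes) {
    String[] units = {"B", "KB", "MB", "GB", "TB"};
    double size = bytes;
    int unit = 0;
    while (size >= 1024 && unit < units.length - 1) {
      size /= 1024;
      unit++;
    }
    return String.format(Locale.ROOT, "%.1f %s", size, units[unit]);
  }
}
```

Centralizing this in one method keeps block-size and blocklet-size log lines consistent across the writer code.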


> block size info should be show in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When running the desc formatted command, the table block size should be 
> shown; it should also appear in the executor log when running the load 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570699#comment-15570699
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user Jay357089 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139600
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@jackylk Maybe I should extract the if/else part into a method called 
ConvertByteToReadable. What's your opinion?


> block size info should be show in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When running the desc formatted command, the table block size should be 
> shown; it should also appear in the executor log when running the load 
> command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-312:

Description: 
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.
3. Change CarbonSource to use CarbonDatasourceHadoopRelation only.

  was:
Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.


> Unify two datasource: CarbonDatasourceHadoopRelation and 
> CarbonDatasourceRelation
> -
>
> Key: CARBONDATA-312
> URL: https://issues.apache.org/jira/browse/CARBONDATA-312
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonDatasourceHadoopRelation as the target datasource definition, 
> after that, CarbonContext can use standard Datasource strategy
> In this issue, the following changes are required:
> 1. Move the dictionary strategy out of CarbonTableScan and make a separate 
> strategy for it.
> 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
> function.
> 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-310) Compilation failed when using spark 1.6.2

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570690#comment-15570690
 ] 

ASF GitHub Bot commented on CARBONDATA-310:
---

GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/232

[CARBONDATA-310]Fixed compilation failure when using spark 1.6.2

# Why raise this PR?
Compilation failed when using Spark 1.6.2 because a class was not found: 
AggregateExpression.
# How to solve?
Removing the import "import 
org.apache.spark.sql.catalyst.expressions.aggregate._" causes a compilation 
failure when using Spark 1.6.2, where AggregateExpression was moved to the 
subpackage "aggregate". So it needs to be changed back.

Thanks for your reminder, @harperjiang

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata agg_ex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #232


commit ee4f6832d893c6ac99e1694b607b6f2d38ec9231
Author: foryou2030 
Date:   2016-10-13T03:17:38Z

fix compile on spark1.6.2




> Compilation failed when using spark 1.6.2
> -
>
> Key: CARBONDATA-310
> URL: https://issues.apache.org/jira/browse/CARBONDATA-310
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Gin-zhj
>Assignee: Gin-zhj
>Priority: Minor
>
> Compilation failed when using Spark 1.6.2, 
> caused by a class-not-found error: AggregateExpression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-311) Log the data size of blocklet during data load.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570686#comment-15570686
 ] 

ASF GitHub Bot commented on CARBONDATA-311:
---

GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/231

[CARBONDATA-311]Log the data size of blocklet during data load.

## Why raise this PR?
The blocklet size is an important parameter for analyzing data loads and 
queries, so this info should be logged.
## How to test?
All existing test cases pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata logblocklet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #231


commit a110504f58e688e42223e896f7a1cf729463cf9d
Author: Zhangshunyu 
Date:   2016-10-13T03:17:21Z

Log the data size of each blocklet




> Log the data size of blocklet during data load.
> ---
>
> Key: CARBONDATA-311
> URL: https://issues.apache.org/jira/browse/CARBONDATA-311
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 0.1.1-incubating
>Reporter: zhangshunyu
>Assignee: zhangshunyu
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Log the data size of blocklet during data load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-307:

Summary: Support executor side scan using CarbonInputFormat  (was: Support 
full functionality in CarbonInputFormat)

> Support executor side scan using CarbonInputFormat
> --
>
> Key: CARBONDATA-307
> URL: https://issues.apache.org/jira/browse/CARBONDATA-307
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, there are two read paths in the carbon-spark module: 
> 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
> In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
> QueryExecutor for the scan.
> 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
> CarbonRecordReader
> In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
> and to scan.
> This creates unnecessary duplicate code that needs to be unified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-310) Compilation failed when using spark 1.6.2

2016-10-12 Thread Gin-zhj (JIRA)
Gin-zhj created CARBONDATA-310:
--

 Summary: Compilation failed when using spark 1.6.2
 Key: CARBONDATA-310
 URL: https://issues.apache.org/jira/browse/CARBONDATA-310
 Project: CarbonData
  Issue Type: Bug
Reporter: Gin-zhj
Assignee: Gin-zhj
Priority: Minor


Compilation failed when using Spark 1.6.2, 
caused by a class-not-found error: AggregateExpression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-312:
---

 Summary: Unify two datasource: CarbonDatasourceHadoopRelation and 
CarbonDatasourceRelation
 Key: CARBONDATA-312
 URL: https://issues.apache.org/jira/browse/CARBONDATA-312
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


Take CarbonDatasourceHadoopRelation as the target datasource definition, after 
that, CarbonContext can use standard Datasource strategy

In this issue, the following changes are required:
1. Move the dictionary strategy out of CarbonTableScan and make a separate 
strategy for it.
2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in the buildScan 
function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-311) Log the data size of blocklet during data load.

2016-10-12 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-311:
--

 Summary: Log the data size of blocklet during data load.
 Key: CARBONDATA-311
 URL: https://issues.apache.org/jira/browse/CARBONDATA-311
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 0.1.1-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating


Log the data size of blocklet during data load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-308) Unify CarbonScanRDD and CarbonHadoopFSRDD

2016-10-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-308:

Description: 
Take CarbonScanRDD as the target RDD, modify as following:

On the driver side, only getSplit is required, so only the filter condition is 
needed; there is no need to create the full QueryModel object, so we can move 
the creation of QueryModel from the driver side to the executor side.

Summary: Unify CarbonScanRDD and CarbonHadoopFSRDD  (was: Support 
multiple segment in CarbonHadoopFSRDD)

> Unify CarbonScanRDD and CarbonHadoopFSRDD
> -
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD, modify as following:
> On the driver side, only getSplit is required, so only the filter condition 
> is needed; there is no need to create the full QueryModel object, so we can 
> move the creation of QueryModel from the driver side to the executor side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-309) Support two types of ReadSupport in CarbonRecordReader

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-309:
---

 Summary: Support two types of ReadSupport in CarbonRecordReader
 Key: CARBONDATA-309
 URL: https://issues.apache.org/jira/browse/CARBONDATA-309
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


CarbonRecordReader should support late decode based on the passed Configuration.
A config indicating late decode needs to be added in CarbonInputFormat for this 
purpose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-308) Support multiple segment in CarbonHadoopFSRDD

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-308:
---

 Summary: Support multiple segment in CarbonHadoopFSRDD
 Key: CARBONDATA-308
 URL: https://issues.apache.org/jira/browse/CARBONDATA-308
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-307) Support full functionality in CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-307:
---

 Summary: Support full functionality in CarbonInputFormat
 Key: CARBONDATA-307
 URL: https://issues.apache.org/jira/browse/CARBONDATA-307
 Project: CarbonData
  Issue Type: Improvement
  Components: spark-integration
Affects Versions: 0.1.0-incubating
Reporter: Jacky Li
 Fix For: 0.2.0-incubating


Currently, there are two read paths in the carbon-spark module: 
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses 
QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => 
CarbonRecordReader
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits 
and to scan.

This creates unnecessary duplicate code that needs to be unified.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570540#comment-15570540
 ] 

ASF GitHub Bot commented on CARBONDATA-292:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/223#discussion_r83133189
  
--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -104,8 +109,10 @@ Following are the options that can be used in load 
data:
  'MULTILINE'='true', 'ESCAPECHAR'='\', 
  'COMPLEX_DELIMITER_LEVEL_1'='$', 
  'COMPLEX_DELIMITER_LEVEL_2'=':',
- 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
+ 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
+ 
'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, 
empname:/dictFilePath/empnameDict.csv'
--- End diff --

No, I mean just delete it from the Example section. 
And that note should be added in the option explanation section.


> add COLUMNDICT operation info in DML operation guide
> 
>
> Key: CARBONDATA-292
> URL: https://issues.apache.org/jira/browse/CARBONDATA-292
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> There is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so 
> one needs to be added. 





[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570512#comment-15570512
 ] 

ASF GitHub Bot commented on CARBONDATA-292:
---

Github user Jay357089 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/223#discussion_r83132221
  
--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -104,8 +109,10 @@ Following are the options that can be used in load 
data:
  'MULTILINE'='true', 'ESCAPECHAR'='\', 
  'COMPLEX_DELIMITER_LEVEL_1'='$', 
  'COMPLEX_DELIMITER_LEVEL_2'=':',
- 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
+ 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
+ 
'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, 
empname:/dictFilePath/empnameDict.csv'
--- End diff --

I have given a note below. If that is not enough, should I delete this 
option or close this PR?


> add COLUMNDICT operation info in DML operation guide
> 
>
> Key: CARBONDATA-292
> URL: https://issues.apache.org/jira/browse/CARBONDATA-292
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> There is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so 
> one needs to be added. 





[jira] [Commented] (CARBONDATA-304) Load data failure when set table_blocksize=2048

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570511#comment-15570511
 ] 

ASF GitHub Bot commented on CARBONDATA-304:
---

Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/227#discussion_r83132187
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -197,8 +197,9 @@ public AbstractFactDataWriter(String storeLocation, int 
measureCount, int mdKeyL
 blockIndexInfoList = new ArrayList<>();
 // get max file size;
 CarbonProperties propInstance = CarbonProperties.getInstance();
-this.fileSizeInBytes = blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
-* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+// if blocksize=2048, then 2048*1024*1024 will beyond the range of Int
+this.fileSizeInBytes = 1L * blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
--- End diff --

fixed
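The overflow this fix addresses is easy to reproduce in isolation. A minimal sketch, where `BYTE_TO_KB` is a local stand-in for `CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR` (assumed to be 1024): in the buggy form, `2048 * 1024 * 1024` equals 2^31 and overflows int arithmetic before the trailing `* 1L` can widen the result, while leading with `1L` makes the whole chain evaluate in long.

```java
class BlockSizeOverflowDemo {
  static final int BYTE_TO_KB = 1024; // stand-in for BYTE_TO_KB_CONVERSION_FACTOR

  // Buggy form: int multiplication overflows for blocksize >= 2048,
  // because widening via `* 1L` happens only after the overflow.
  static long fileSizeBuggy(int blocksizeMb) {
    return blocksizeMb * BYTE_TO_KB * BYTE_TO_KB * 1L;
  }

  // Fixed form: leading with 1L makes every multiplication a long operation.
  static long fileSizeFixed(int blocksizeMb) {
    return 1L * blocksizeMb * BYTE_TO_KB * BYTE_TO_KB;
  }

  public static void main(String[] args) {
    System.out.println(fileSizeBuggy(2048)); // -2147483648 (overflowed)
    System.out.println(fileSizeFixed(2048)); // 2147483648
  }
}
```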


> Load data failure when set table_blocksize=2048
> ---
>
> Key: CARBONDATA-304
> URL: https://issues.apache.org/jira/browse/CARBONDATA-304
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Gin-zhj
>Assignee: Gin-zhj
>
> First, create a table with table_blocksize=2048:
> CREATE TABLE IF NOT EXISTS t3 (ID Int, date Timestamp, country String, name 
> String, phonetype String, serialname String, salary Int) STORED BY 
> 'carbondata' TBLPROPERTIES('table_blocksize'='2048');
> Then load the data; it fails with the following exception:
> org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException:
>  Problem while copying file from local store to carbon store





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570458#comment-15570458
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83130319
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Intialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void intialize(CarbonDataLoadConfiguration configuration, 
DataLoadProcessorStep child) throws
+  CarbonDataLoadingException;
+
+  /**
+   * Tranform the data as per the implemetation.
+   * @return Iterator of data
+   * @throws CarbonDataLoadingException
+   */
+  Iterator execute() throws CarbonDataLoadingException;
+
+  /**
+   * Any closing of resources after step execution can be done here.
+   */
+  void finish();
--- End diff --

It should be called in both failure and success cases, so I will rename it 
to `close`.
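A sketch of why a single `close` works for both outcomes: the caller invokes it from a `finally` block, so it runs whether `execute` succeeds or throws. The `Step` interface and `runStep` driver below are simplified stand-ins, not the actual `DataLoadProcessorStep` API:

```java
class CloseSemanticsDemo {
  interface Step {
    void execute() throws Exception;
    void close(); // runs in both success and failure cases
  }

  // Returns true on success, false on failure; close() runs either way.
  static boolean runStep(Step step) {
    try {
      step.execute();
      return true;
    } catch (Exception e) {
      return false;
    } finally {
      step.close(); // always executed
    }
  }
}
```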


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570455#comment-15570455
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83130123
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
--- End diff --

ok


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570434#comment-15570434
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83129418
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java
 ---
@@ -0,0 +1,185 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+
+public class CarbonDataLoadConfiguration {
--- End diff --

OK. I will keep the configuration class, but keep only the important and global 
settings in it; the remaining options we can move to a `Map` kept inside the 
configuration itself. What do you say?
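The compromise described above can be sketched as follows. All names here, including the class, its fields, and the `bad_records_action` key in the test, are illustrative assumptions, not the actual CarbonData configuration: globally important settings stay as typed fields, while step-specific options live in a `Map` inside the configuration itself.

```java
import java.util.HashMap;
import java.util.Map;

class DataLoadConfigurationSketch {
  // Important, global settings stay as typed fields.
  private String tableName;
  private int blockSizeMb;

  // Step-specific or rarely used options live in a generic map.
  private final Map<String, Object> options = new HashMap<String, Object>();

  public void setOption(String key, Object value) { options.put(key, value); }
  public Object getOption(String key) { return options.get(key); }

  public String getTableName() { return tableName; }
  public void setTableName(String tableName) { this.tableName = tableName; }
  public int getBlockSizeMb() { return blockSizeMb; }
  public void setBlockSizeMb(int blockSizeMb) { this.blockSizeMb = blockSizeMb; }
}
```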


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570425#comment-15570425
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83129008
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java
 ---
@@ -0,0 +1,185 @@
+package org.apache.carbondata.processing.newflow;
--- End diff --

ok


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-285) Use path parameter in Spark datasource API

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570315#comment-15570315
 ] 

ASF GitHub Bot commented on CARBONDATA-285:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/212#discussion_r83123943
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala
 ---
@@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
   override def createRelation(
   sqlContext: SQLContext,
   parameters: Map[String, String]): BaseRelation = {
-// if path is provided we can directly create Hadoop relation. \
-// Otherwise create datasource relation
-parameters.get("path") match {
-  case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, 
Array(path), parameters, None)
-  case _ =>
-val options = new CarbonOption(parameters)
-val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
-val identifier = tableIdentifier match {
-  case Seq(name) => TableIdentifier(name, None)
-  case Seq(db, name) => TableIdentifier(name, Some(db))
-}
-CarbonDatasourceRelation(identifier, None)(sqlContext)
+val options = new CarbonOption(parameters)
+if (sqlContext.isInstanceOf[CarbonContext]) {
--- End diff --

There is no `load` method on DataFrame, only on the context class.


> Use path parameter in Spark datasource API
> --
>
> Key: CARBONDATA-285
> URL: https://issues.apache.org/jira/browse/CARBONDATA-285
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Currently, when using carbon with the Spark datasource API, the database name 
> and table name must be given as parameters, which is not the normal way of 
> using the datasource API. With this PR, the database name and table name are 
> no longer required; the user only needs to specify the `path` parameter 
> (indicating the path to the table folder) when using the datasource API.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569219#comment-15569219
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83049479
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Intialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void intialize(CarbonDataLoadConfiguration configuration, 
DataLoadProcessorStep child) throws
+  CarbonDataLoadingException;
+
+  /**
+   * Tranform the data as per the implemetation.
+   * @return Iterator of data
+   * @throws CarbonDataLoadingException
+   */
+  Iterator execute() throws CarbonDataLoadingException;
--- End diff --

Suppose we are loading 50 GB of CSV files and the HDFS block size is 
256 MB; the total number of partitions is then 200. If we allow one task per 
partition, that means 200 tasks. In CarbonData one B-tree is created per task, 
so allowing all 200 tasks would create 200 B-trees, which is not effective in 
either performance or memory terms. 
That is why the current Kettle implementation pools multiple blocks per task, 
and these blocks are processed in parallel. We can take the same approach here: 
use one iterator per thread and return an array of iterators.

What do you mean by datanode-scope sorting? How would we synchronize between 
multiple tasks?
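The pooling described above can be sketched as a round-robin split of the input blocks into a fixed number of task-local groups, exposing one iterator per thread. The names below are illustrative, not the actual CarbonData implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BlockPooling {
  // Split `blocks` round-robin into `numTasks` groups and expose one
  // iterator per group (one group per task/thread).
  static <T> List<Iterator<T>> poolBlocks(List<T> blocks, int numTasks) {
    List<List<T>> groups = new ArrayList<List<T>>();
    for (int i = 0; i < numTasks; i++) {
      groups.add(new ArrayList<T>());
    }
    for (int i = 0; i < blocks.size(); i++) {
      groups.get(i % numTasks).add(blocks.get(i));
    }
    List<Iterator<T>> iterators = new ArrayList<Iterator<T>>();
    for (List<T> group : groups) {
      iterators.add(group.iterator());
    }
    return iterators;
  }
}
```

With 200 blocks and 4 pooled tasks, each task's iterator walks 50 blocks instead of spawning 200 single-block tasks (and 200 B-trees).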


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569112#comment-15569112
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r83039531
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala
 ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+  * Created by x00381807 on 2016/9/26.
--- End diff --

Oh, my fault


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add a trim option.
> Bug: When a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the csvInput does not.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569065#comment-15569065
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033746
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/CarbonArrayWritable.java
 ---
@@ -0,0 +1,51 @@
+package org.apache.carbondata.processing.newflow.iterators;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Writable;
+
+/**
+ * It is hadoop's writable value wrapper.
+ */
+public class CarbonArrayWritable implements Writable {
--- End diff --

Why is this in the carbon-processing module and not in carbon-hadoop? All 
Hadoop-related classes should be in the carbon-hadoop module.


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-281) improve the test cases in LCM module.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569082#comment-15569082
 ] 

ASF GitHub Bot commented on CARBONDATA-281:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/205


> improve the test cases in LCM module.
> -
>
> Key: CARBONDATA-281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-281
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: ravikiran
>Assignee: ravikiran
>Priority: Minor
>
> Improve the test cases in the LCM module:
>  add boundary test cases for compaction, and
>  add test cases to verify the minor compaction threshold check.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569067#comment-15569067
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83032703
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Intialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void intialize(CarbonDataLoadConfiguration configuration, 
DataLoadProcessorStep child) throws
+  CarbonDataLoadingException;
+
+  /**
+   * Tranform the data as per the implemetation.
+   * @return Iterator of data
+   * @throws CarbonDataLoadingException
+   */
+  Iterator execute() throws CarbonDataLoadingException;
+
+  /**
+   * Any closing of resources after step execution can be done here.
+   */
+  void finish();
--- End diff --

This is called when the step finishes successfully. But what about the failure 
case? Should there be a 
`void close();` 
method for the failure case?


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569060#comment-15569060
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83031958
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
--- End diff --

I think each implementation of this interface has similar logic in the 
execute function; can we create an abstract class to implement the common logic? 
The common logic would look like:
```
Iterator<Object[]> execute() throws CarbonDataLoadingException {
  final Iterator<Object[]> childIter = child.execute();
  return new Iterator<Object[]>() {
    public boolean hasNext() {
      return childIter.hasNext();
    }
    public Object[] next() {
      // processInput is the abstract method implemented by each step
      return processInput(childIter.next());
    }
  };
}
```


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569064#comment-15569064
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83028336
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java
 ---
@@ -0,0 +1,185 @@
+package org.apache.carbondata.processing.newflow;
--- End diff --

add license header


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569063#comment-15569063
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83030022
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java
 ---
@@ -0,0 +1,185 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+
+public class CarbonDataLoadConfiguration {
--- End diff --

It seems this configuration is quite complex; I think that is because it 
contains configuration for all steps. 
Can we just have a simple `Map` as the configuration and let each `Step` 
decide what to keep in it?


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569061#comment-15569061
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83032371
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Intialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void intialize(CarbonDataLoadConfiguration configuration, 
DataLoadProcessorStep child) throws
--- End diff --

If there is an abstract class, it can have the child as its member variable; 
then this `initialize` function takes no step parameter as input.
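The suggestion can be sketched as follows; the names are illustrative, not the actual CarbonData classes. With the child held as a constructor-time member, `initialize` only needs the configuration, and each step can delegate initialization down the chain:

```java
import java.util.Iterator;
import java.util.Map;

abstract class AbstractProcessorStepSketch {
  protected final AbstractProcessorStepSketch child; // null for the source step

  protected AbstractProcessorStepSketch(AbstractProcessorStepSketch child) {
    this.child = child;
  }

  // Only the configuration is passed in; the child is already a member.
  void initialize(Map<String, String> configuration) {
    if (child != null) {
      child.initialize(configuration); // initialize bottom-up along the chain
    }
  }

  abstract Iterator<Object[]> execute();
}
```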


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569066#comment-15569066
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033298
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow.iterators;
+
+import java.io.IOException;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+import org.apache.hadoop.mapred.RecordReader;
+
+/**
+ * This iterator iterates RecordReader.
+ */
+public class RecordReaderIterator extends CarbonIterator {
--- End diff --

Why is this in the carbon-processing module and not in carbon-hadoop?


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569062#comment-15569062
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033524
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow.iterators;
+
+import java.io.IOException;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+import org.apache.hadoop.mapred.RecordReader;
+
+/**
+ * This iterator iterates RecordReader.
+ */
+public class RecordReaderIterator extends CarbonIterator {
--- End diff --

What is it used for?


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces to implement it.





[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569003#comment-15569003
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83028164
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

Suggest converting `blockSize` to a human-readable value before logging 
it; otherwise it is hard for a human to check this value.


> block size info should be show in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When running the DESC FORMATTED command, the table block size should be 
> shown; it should also appear in the executor log when running the LOAD command.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568996#comment-15568996
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82523962
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala
 ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+  * Created by x00381807 on 2016/9/26.
--- End diff --

please remove


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568998#comment-15568998
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83027603
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@jackylk It is set in MB, but here it has already been converted to bytes.
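A sketch of the human-readable formatting the reviewer is asking for; `humanReadable` is an illustrative helper under assumed names, not part of the CarbonData API:

```java
import java.util.Locale;

public class Main {
    // Formats a raw byte count for log output so a human can read it at a glance.
    static String humanReadable(long bytes) {
        if (bytes < 1024) return bytes + " B";
        if (bytes < 1024L * 1024) {
            return String.format(Locale.ROOT, "%.1f KB", bytes / 1024.0);
        }
        if (bytes < 1024L * 1024 * 1024) {
            return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024));
        }
        return String.format(Locale.ROOT, "%.1f GB", bytes / (1024.0 * 1024 * 1024));
    }

    public static void main(String[] args) {
        // A block size configured as 2048 MB is stored internally in bytes.
        long blockSize = 2048L * 1024 * 1024;
        System.out.println("The configured block size is " + humanReadable(blockSize));
        // prints: The configured block size is 2.0 GB
    }
}
```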


> block size info should be show in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When running the DESC FORMATTED command, the table block size should be 
> shown; it should also appear in the executor log when running the LOAD command.





[jira] [Commented] (CARBONDATA-280) when table properties is repeated it only set the last one

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568990#comment-15568990
 ] 

ASF GitHub Bot commented on CARBONDATA-280:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/204#discussion_r83027008
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/deleteTable/TestDeleteTableNewDDL.scala
 ---
@@ -97,7 +97,7 @@ class TestDeleteTableNewDDL extends QueryTest with 
BeforeAndAfterAll {
   "CREATE table CaseInsensitiveTable (ID int, date String, country 
String, name " +
   "String," +
   "phonetype String, serialname String, salary int) stored by 
'org.apache.carbondata.format'" +
-  "TBLPROPERTIES('DICTIONARY_INCLUDE'='ID', 
'DICTIONARY_INCLUDE'='salary')"
+  "TBLPROPERTIES('DICTIONARY_INCLUDE'='ID,salary')"
--- End diff --

add space after `,`


>  when table properties is repeated it only set the last one
> ---
>
> Key: CARBONDATA-280
> URL: https://issues.apache.org/jira/browse/CARBONDATA-280
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 0.1.1-incubating
>Reporter: zhangshunyu
>Assignee: zhangshunyu
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> When a table property is repeated, only the last occurrence takes effect:
> For example,
> CREATE TABLE IF NOT EXISTS carbontable
> (ID Int, date Timestamp, country String,
> name String, phonetype String, serialname String, salary Int)
> STORED BY 'carbondata'
>  TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID',
>  'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
> only salary is set to DICTIONARY_INCLUDE and only phonetype is set to 
> DICTIONARY_EXCLUDE.
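The last-wins behavior described above follows directly from storing properties in a map keyed by property name; a minimal sketch (not the actual CarbonData parser):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Collect table properties into a map: a repeated key silently
    // overwrites the earlier value, so only the last one survives.
    static Map<String, String> collectProperties(String[][] pairs) {
        Map<String, String> props = new HashMap<>();
        for (String[] kv : pairs) {
            props.put(kv[0], kv[1]);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> p = collectProperties(new String[][]{
            {"DICTIONARY_INCLUDE", "ID"},
            {"DICTIONARY_INCLUDE", "salary"},
        });
        System.out.println(p.get("DICTIONARY_INCLUDE")); // prints salary
    }
}
```

A fix along the lines of this JIRA would instead merge repeated keys (or reject them) before insertion.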





[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568981#comment-15568981
 ] 

ASF GitHub Bot commented on CARBONDATA-283:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r83025827
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List updateLatestTableStatusDetails(
+  LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] 
newMetadata) {
+
+List newListMetadata =
+new ArrayList(Arrays.asList(newMetadata));
+for (LoadMetadataDetails oldSegment : oldMetadata) {
+  if 
(CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus()))
 {
+
updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment)));
+  }
+}
+return newListMetadata;
+  }
+
+  /**
+   * returns current time
+   *
+   * @return
+   */
+  private static String readCurrentTime() {
+SimpleDateFormat sdf = new 
SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP);
+String date = null;
+
+date = sdf.format(new Date());
+
+return date;
+  }
+
+  /**
+   * updates segment status and modificaton time details
+   *
+   * @param loadMetadata
+   */
+  public static void updateSegmentMetadataDetails(LoadMetadataDetails 
loadMetadata) {
--- End diff --

Can you improve the function name so that it better describes the behavior of 
this function?


> Improve the test cases for concurrent scenarios
> ---
>
> Key: CARBONDATA-283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-283
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>Priority: Minor
>
> Improve test cases for data retention concurrent scenarios





[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568980#comment-15568980
 ] 

ASF GitHub Bot commented on CARBONDATA-283:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r83026458
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List updateLatestTableStatusDetails(
--- End diff --

I think these should not be utility functions, but should be member 
functions of LoadMetadataDetails


> Improve the test cases for concurrent scenarios
> ---
>
> Key: CARBONDATA-283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-283
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>Priority: Minor
>
> Improve test cases for data retention concurrent scenarios





[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568937#comment-15568937
 ] 

ASF GitHub Bot commented on CARBONDATA-292:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/223#discussion_r83022245
  
--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -104,8 +109,10 @@ Following are the options that can be used in load 
data:
  'MULTILINE'='true', 'ESCAPECHAR'='\', 
  'COMPLEX_DELIMITER_LEVEL_1'='$', 
  'COMPLEX_DELIMITER_LEVEL_2'=':',
- 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
+ 
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
+ 
'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, 
empname:/dictFilePath/empnameDict.csv'
--- End diff --

Do not document this option here, since it cannot be used together with 
`ALL_DICTIONARY_PATH`


> add COLUMNDICT operation info in DML operation guide
> 
>
> Key: CARBONDATA-292
> URL: https://issues.apache.org/jira/browse/CARBONDATA-292
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> There is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so 
> one needs to be added.





[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568922#comment-15568922
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83021340
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

Is `blockSize` in bytes or MB?


> block size info should be show in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When running the DESC FORMATTED command, the table block size should be 
> shown; it should also appear in the executor log when running the LOAD command.





[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1556#comment-1556
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83018479
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+
+/**
+ * This base interface for data loading. It can do transformation jobs as 
per the implementation.
+ *
+ */
+public interface DataLoadProcessorStep {
+
+  /**
+   * The output meta for this step. The data returns from this step is as 
per this meta.
+   * @return
+   */
+  DataField[] getOutput();
+
+  /**
+   * Intialization process for this step.
+   * @param configuration
+   * @param child
+   * @throws CarbonDataLoadingException
+   */
+  void intialize(CarbonDataLoadConfiguration configuration, 
DataLoadProcessorStep child) throws
+  CarbonDataLoadingException;
+
+  /**
+   * Tranform the data as per the implemetation.
+   * @return Iterator of data
+   * @throws CarbonDataLoadingException
+   */
+  Iterator execute() throws CarbonDataLoadingException;
--- End diff --

I think `execute()` is called for every parallel unit of the input, right? 
For example, when using Spark to load from a dataframe, `execute()` is called for 
every Spark partition (one task executes one partition). When loading from a 
CSV file on HDFS, `execute()` is called for every HDFS block. So I do not think 
returning an array of iterators is required.

In the loading process of Carbon on each executor, some of the steps can be 
parallelized, but the sort step needs to be synchronized (a potential bottleneck), 
since we need datanode-scope sorting. Am I correct?
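The step-chaining idea behind `DataLoadProcessorStep` can be sketched with stand-in types: each step wraps its child's iterator and transforms rows lazily, so one `execute()` call serves one parallel unit (a partition or a block). The interfaces and names below are illustrative, not the actual CarbonData ones:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Stand-in for DataLoadProcessorStep: each step yields an iterator of rows.
interface ProcessorStep {
    Iterator<String[]> execute();
}

// Source step: serves rows from an in-memory list (in reality, a record reader).
class InputStep implements ProcessorStep {
    private final List<String[]> rows;
    InputStep(List<String[]> rows) { this.rows = rows; }
    @Override public Iterator<String[]> execute() { return rows.iterator(); }
}

// Transform step: wraps the child's iterator and converts rows one at a time.
class TransformStep implements ProcessorStep {
    private final ProcessorStep child;
    private final Function<String[], String[]> fn;
    TransformStep(ProcessorStep child, Function<String[], String[]> fn) {
        this.child = child;
        this.fn = fn;
    }
    @Override public Iterator<String[]> execute() {
        Iterator<String[]> it = child.execute();
        return new Iterator<String[]>() {   // lazy: no row is touched until pulled
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String[] next() { return fn.apply(it.next()); }
        };
    }
}

public class Main {
    static List<String> runPipeline(List<String[]> input) {
        ProcessorStep pipeline =
            new TransformStep(new InputStep(input), row -> new String[]{row[0].trim()});
        List<String> out = new ArrayList<>();
        pipeline.execute().forEachRemaining(r -> out.add(r[0]));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(
            Arrays.asList(new String[]{" a "}, new String[]{"b"}))); // prints [a, b]
    }
}
```

Because the chain is pulled row by row, each parallel unit can run its own pipeline instance independently; only a datanode-scope sort step would need coordination across them.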


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces in their implementations.





[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568858#comment-15568858
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83014256
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoadModel.java
 ---
@@ -117,9 +117,9 @@
   private String badRecordsLoggerEnable;
 
   /**
-   * defines the option to specify the bad record log redirect to raw csv
+   * defines the option to specify the bad record logger action
*/
-  private String badRecordsLoggerRedirect;
+  private String badRecordsLoggerAction;
--- End diff --

This action is not for the logger, right? Perhaps `badRecordsAction` is a 
better name?
And it should be an enum instead of a String
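A sketch of the enum suggestion. The constant names are assumptions (only `REDIRECT` is visible in the diffs in this thread), and `parseAction` is an illustrative helper, not the actual CarbonData API:

```java
import java.util.Locale;

public class Main {
    // Modeling the bad-records action as an enum instead of a free-form String
    // makes invalid values fail at parse time rather than deep in the load path.
    enum BadRecordsAction { FORCE, REDIRECT, IGNORE, FAIL }

    static BadRecordsAction parseAction(String value, BadRecordsAction fallback) {
        if (value == null) return fallback;
        try {
            return BadRecordsAction.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback; // unknown action string: fall back instead of failing late
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAction("redirect", BadRecordsAction.FAIL)); // REDIRECT
        System.out.println(parseAction("bogus", BadRecordsAction.FAIL));    // FAIL
    }
}
```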


> In hdfs bad record logger is failing in writting the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then calling CarbonFile.getPath() throws a 
> NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.





[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568857#comment-15568857
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83014871
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java
 ---
@@ -69,9 +68,13 @@
   private BufferedWriter bufferedCSVWriter;
   private DataOutputStream outCSVStream;
   /**
-   *
+   * bad record log file path
+   */
+  private String logFilePath;
+  /**
+   * csv file path
*/
-  private CarbonFile logFile;
+  private String csvFilePath;
--- End diff --

What is this CSV file? How is it different from `logFilePath`?


> In hdfs bad record logger is failing in writting the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then calling CarbonFile.getPath() throws a 
> NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.





[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568856#comment-15568856
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83015590
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -458,9 +462,11 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
 case REDIRECT:
   badRecordsLogRedirect = true;
+  badRecordConvertNullDisable= true;
--- End diff --

add space before `=`


> In hdfs bad record logger is failing in writting the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then calling CarbonFile.getPath() throws a 
> NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.





[jira] [Commented] (CARBONDATA-304) Load data failure when set table_blocksize=2048

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568825#comment-15568825
 ] 

ASF GitHub Bot commented on CARBONDATA-304:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/227#discussion_r83012391
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -197,8 +197,9 @@ public AbstractFactDataWriter(String storeLocation, int 
measureCount, int mdKeyL
 blockIndexInfoList = new ArrayList<>();
 // get max file size;
 CarbonProperties propInstance = CarbonProperties.getInstance();
-this.fileSizeInBytes = blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
-* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+// if blocksize=2048, then 2048*1024*1024 will beyond the range of Int
+this.fileSizeInBytes = 1L * blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
--- End diff --

Instead of multiplying by `1L`, you can simply cast the first operand with 
`(long) blocksize`
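The overflow being fixed here is easy to reproduce: with an `int` block size of 2048 MB, `2048 * 1024 * 1024` is evaluated in `int` arithmetic and wraps around before the result is widened to `long`. Either multiplying by `1L` first or casting the first operand forces `long` arithmetic:

```java
public class Main {
    // Buggy version: the whole product is computed as int, so 2048*1024*1024
    // wraps to Integer.MIN_VALUE before being widened to long.
    static long wrongSizeInBytes(int blockSizeMb) {
        return blockSizeMb * 1024 * 1024;
    }

    // Fixed version: casting the first operand promotes the arithmetic to long.
    static long rightSizeInBytes(int blockSizeMb) {
        return (long) blockSizeMb * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(wrongSizeInBytes(2048)); // prints -2147483648 (overflowed)
        System.out.println(rightSizeInBytes(2048)); // prints 2147483648
    }
}
```

Values up to 2047 MB happen to fit in an `int`, which is why the bug only shows up with `table_blocksize=2048`.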


> Load data failure when set table_blocksize=2048
> ---
>
> Key: CARBONDATA-304
> URL: https://issues.apache.org/jira/browse/CARBONDATA-304
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Gin-zhj
>Assignee: Gin-zhj
>
> First, create a table with table_blocksize=2048:
> CREATE TABLE IF NOT EXISTS t3 (ID Int, date Timestamp, country String, name 
> String, phonetype String, serialname String, salary Int) STORED BY 
> 'carbondata' TBLPROPERTIES('table_blocksize'='2048');
> Then load data; the load fails with the following exception:
> org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException:
>  Problem while copying file from local store to carbon store





[jira] [Commented] (CARBONDATA-239) Failure of one compaction in queue should not affect the others.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568425#comment-15568425
 ] 

ASF GitHub Bot commented on CARBONDATA-239:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/224#discussion_r82983904
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/scanner/impl/FilterScanner.java 
---
@@ -78,10 +80,11 @@ public FilterScanner(BlockExecutionInfo 
blockExecutionInfo) {
* @throws QueryExecutionException
* @throws FilterUnsupportedException
*/
-  @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder 
blocksChunkHolder)
+  @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder 
blocksChunkHolder,
+  QueryStatisticsModel 
queryStatisticsModel)
   throws QueryExecutionException {
 try {
-  fillScannedResult(blocksChunkHolder);
+  fillScannedResult(blocksChunkHolder, queryStatisticsModel);
--- End diff --

Pass the model in the constructor so that there is no need to change all these APIs


> Failure of one compaction in queue should not affect the others.
> 
>
> Key: CARBONDATA-239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-239
> Project: CarbonData
>  Issue Type: Bug
>Reporter: ravikiran
>Assignee: ravikiran
> Fix For: 0.2.0-incubating
>
>
> Failure of one compaction in queue should not affect the others.
> If a compaction is triggered by the user on table1, the other requests go to 
> the queue. If the compaction fails for table1, the requests in the queue 
> should continue, and at the end Beeline will show the failure message to the 
> user.
> If a compaction fails for a table other than the one the user requested, the 
> error should not appear in Beeline.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568336#comment-15568336
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977743
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

So it is better to set this as column-property metadata while creating the table


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568332#comment-15568332
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

A pro of this approach: suppose in one load the user loads dirty data and then 
realizes the data needs to be trimmed; in the next load he can simply enable 
the option and load the data. However, the untrimmed values will also increase 
the dictionary space, and the dictionary lookup overhead at query time will 
increase.


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568196#comment-15568196
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968804
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

One more point: it would be better to set this property at the column level 
while creating the table itself, as a column property. This avoids the user 
having to provide the option every time data is loaded.


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: when a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568190#comment-15568190
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968468
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

Guys, I think that if during data loading we read from configuration whether 
to trim or not, we need to do the same while filtering as well, deciding based 
on the configuration value.
For example: if I enabled the trim property while loading, the system will trim 
and load the data; now in a filter query, if the user provides whitespace, it 
also needs to be trimmed while creating the filter model. This gives the system 
more consistency. Likewise, if the user disables trim, then we won't trim 
either while loading or while querying.
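One way to get the consistency described above is to route both the load path and the filter-model path through a single normalization helper driven by the same trim flag. This is only a sketch of the idea, not the actual CarbonData classes:

```java
// Shared normalization: the same trim decision is applied when a CSV
// value is loaded and when a filter literal is compiled, so a query
// for "bob " matches rows loaded from " bob" whenever trim is on.
public class TrimConsistency {
    private final boolean trimEnabled;

    public TrimConsistency(boolean trimEnabled) {
        this.trimEnabled = trimEnabled;
    }

    public String normalize(String value) {
        return trimEnabled ? value.trim() : value;
    }

    public boolean filterMatches(String storedValue, String filterLiteral) {
        // Both sides go through the same rule -> consistent results.
        return normalize(storedValue).equals(normalize(filterLiteral));
    }

    public static void main(String[] args) {
        TrimConsistency on = new TrimConsistency(true);
        System.out.println(on.filterMatches(" bob", "bob ")); // prints true
    }
}
```

With trim disabled, both load and query leave the whitespace alone, which is the other half of the consistency argument.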


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: when a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568077#comment-15568077
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82960653
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

Hi Sujith,
I agree with Eason: when the user queries with a filter containing whitespace, 
it should be considered a forbidden action.


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add trim option.
> Bug: when a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace while the CSV input does not.





[jira] [Commented] (CARBONDATA-239) Failure of one compaction in queue should not affect the others.

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568046#comment-15568046
 ] 

ASF GitHub Bot commented on CARBONDATA-239:
---

Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/224#discussion_r82958230
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/processor/AbstractDataBlockIterator.java
 ---
@@ -127,11 +133,15 @@ protected boolean updateScanner() {
 }
   }
 
-  private AbstractScannedResult getNextScannedResult() throws 
QueryExecutionException {
+  private AbstractScannedResult 
getNextScannedResult(QueryStatisticsRecorder recorder,
--- End diff --

@sujith71955 OK, I will use a statistics model, thanks!
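A sketch of the "statistics model" idea: instead of threading a recorder parameter through every call like getNextScannedResult, the per-query counters live in one model object the iterator already holds. The names below are illustrative only, not the actual CarbonData API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative statistics model: counters for one query, owned by the
// block iterator instead of being passed into each method call.
public class ScanStatisticsModel {
    private final AtomicLong scannedResults = new AtomicLong();
    private final AtomicLong scanTimeNanos = new AtomicLong();

    public void recordScan(long nanos) {
        scannedResults.incrementAndGet();
        scanTimeNanos.addAndGet(nanos);
    }

    public long scannedResultCount() {
        return scannedResults.get();
    }

    public long totalScanTimeNanos() {
        return scanTimeNanos.get();
    }

    public static void main(String[] args) {
        ScanStatisticsModel stats = new ScanStatisticsModel();
        stats.recordScan(100);
        stats.recordScan(50);
        System.out.println(stats.scannedResultCount()); // prints 2
    }
}
```

Holding the model as a field keeps method signatures stable when new counters are added, which appears to be the motivation behind the review comment.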


> Failure of one compaction in queue should not affect the others.
> 
>
> Key: CARBONDATA-239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-239
> Project: CarbonData
>  Issue Type: Bug
>Reporter: ravikiran
>Assignee: ravikiran
> Fix For: 0.2.0-incubating
>
>
> Failure of one compaction in the queue should not affect the others.
> If a compaction is triggered by the user on table1, the other requests will 
> go to the queue. If the compaction fails for table1, the requests in the 
> queue should continue, and at the end beeline will show the failure message 
> to the user.
> If any compaction fails for a table other than the one the user requested, 
> the error should not appear in beeline.


