[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570931#comment-15570931 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83147351 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow.iterators; + +import java.io.IOException; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; + +import org.apache.hadoop.mapred.RecordReader; + +/** + * This iterator iterates over a RecordReader. + */ +public class RecordReaderIterator extends CarbonIterator { --- End diff -- It is used for iterating over a RecordReader. I can move it to the carbon-hadoop module, but then the processing module would need to depend on it; given the existing dependency between the two modules, that would make them dependent on each other (a circular dependency). > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use these interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
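A minimal sketch of the adapter pattern under discussion: wrapping a pull-style RecordReader behind a plain Iterator. The `SimpleRecordReader` interface below is a simplified stand-in for `org.apache.hadoop.mapred.RecordReader` (the real interface is generic over key and value and has more methods); the one-record-ahead buffering is the part such an iterator actually needs, since `hasNext()` cannot be answered without attempting a read.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for org.apache.hadoop.mapred.RecordReader.
interface SimpleRecordReader<T> {
  boolean next(T[] holder) throws IOException; // fills holder[0]; returns false at end of input
  void close() throws IOException;
}

// Adapter exposing the pull-style reader as a plain Iterator, buffering one
// record ahead so hasNext() can be answered without losing data.
class RecordReaderIteratorSketch<T> implements Iterator<T> {
  private final SimpleRecordReader<T> reader;
  private final T[] holder;
  private boolean buffered;   // holder[0] currently holds an unconsumed record
  private boolean exhausted;  // reader hit end of input and was closed

  RecordReaderIteratorSketch(SimpleRecordReader<T> reader, T[] holder) {
    this.reader = reader;
    this.holder = holder;
  }

  @Override public boolean hasNext() {
    if (buffered) return true;
    if (exhausted) return false;
    try {
      buffered = reader.next(holder);
      if (!buffered) { exhausted = true; reader.close(); }
      return buffered;
    } catch (IOException e) {
      throw new RuntimeException("error reading record", e);
    }
  }

  @Override public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    buffered = false;
    return holder[0];
  }

  // Small self-check: drain an in-memory reader over three strings.
  static String demo() {
    final String[] data = {"a", "b", "c"};
    final int[] pos = {0};
    SimpleRecordReader<String> reader = new SimpleRecordReader<String>() {
      public boolean next(String[] h) {
        if (pos[0] < data.length) { h[0] = data[pos[0]++]; return true; }
        return false;
      }
      public void close() {}
    };
    StringBuilder out = new StringBuilder();
    Iterator<String> it = new RecordReaderIteratorSketch<String>(reader, new String[1]);
    while (it.hasNext()) out.append(it.next());
    return out.toString();
  }
}
```

Note the adapter also closes the reader as soon as end of input is reached, which matters for a loading pipeline holding file handles.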
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570920#comment-15570920 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83147018 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This is the base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returned from this step conforms to this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Initialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void initialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws + CarbonDataLoadingException; + + /** + * Transform the data as per the implementation. + * @return Iterator of data + * @throws CarbonDataLoadingException + */ + Iterator execute() throws CarbonDataLoadingException; + + /** + * Any closing of resources after step execution can be done here. + */ + void finish(); --- End diff -- OK, I will add a close() method along with the finish() method. > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use these interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
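A hedged sketch of the step lifecycle under discussion, including the separate `close()` the author agreed to add: `finish()` marks normal end-of-step work, while `close()` releases resources and is safe to call from a `finally` block even when the load fails midway. Everything except the `initialize`/`execute`/`finish`/`close` shape is a simplified stand-in, not CarbonData's actual API.

```java
import java.util.Collections;
import java.util.Iterator;

class DataLoadStepSketch {

  // Simplified version of the interface under review, with the extra close().
  interface DataLoadProcessorStep<R> {
    void initialize(Object configuration, DataLoadProcessorStep<R> child);
    Iterator<R> execute();
    void finish(); // normal completion work
    void close();  // always-run resource cleanup
  }

  // Trivial step implementation, just to show the lifecycle order.
  static final class NoOpStep implements DataLoadProcessorStep<String> {
    final StringBuilder log = new StringBuilder();
    public void initialize(Object conf, DataLoadProcessorStep<String> child) { log.append("init;"); }
    public Iterator<String> execute() { log.append("exec;"); return Collections.<String>emptyIterator(); }
    public void finish() { log.append("finish;"); }
    public void close() { log.append("close;"); }
  }

  // How a caller would drive the step: close() runs even if execute()
  // or finish() throws, which is the point of keeping it separate.
  static String runLifecycle() {
    NoOpStep step = new NoOpStep();
    step.initialize(null, null);
    try {
      step.execute();
      step.finish();
    } finally {
      step.close();
    }
    return step.log.toString();
  }
}
```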
[jira] [Commented] (CARBONDATA-285) Use path parameter in Spark datasource API
[ https://issues.apache.org/jira/browse/CARBONDATA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570916#comment-15570916 ] ASF GitHub Bot commented on CARBONDATA-285: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83146927 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala --- @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider override def createRelation( sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation = { -// if path is provided we can directly create Hadoop relation. \ -// Otherwise create datasource relation -parameters.get("path") match { - case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None) - case _ => -val options = new CarbonOption(parameters) -val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq -val identifier = tableIdentifier match { - case Seq(name) => TableIdentifier(name, None) - case Seq(db, name) => TableIdentifier(name, Some(db)) -} -CarbonDatasourceRelation(identifier, None)(sqlContext) +val options = new CarbonOption(parameters) +if (sqlContext.isInstanceOf[CarbonContext]) { --- End diff -- sorry, yes `carboncontext.load(path)` cannot work now right? > Use path parameter in Spark datasource API > -- > > Key: CARBONDATA-285 > URL: https://issues.apache.org/jira/browse/CARBONDATA-285 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Currently, when using carbon with spark datasource API, it need to give > database name and table name as parameter, it is not the normal way of > datasource API usage. 
In this PR, database name and table name are not > required; the user only needs to specify the `path` parameter (indicating the > path to the table folder) when using the datasource API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-308) Unify CarbonScanRDD and CarbonHadoopFSRDD
[ https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-308: Description: Take CarbonScanRDD as the target RDD, modify as following: 1. In driver side, only getSplit is required, so only filter condition is required, no need to create full QueryModel object, so we can move creation of QueryModel from driver side to executor side. 2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of use QueryExecutor directly was: Take CarbonScanRDD as the target RDD, modify as following: In driver side, only getSplit is required, so only filter condition is required, no need to create full QueryModel object, so we can move creation of QueryModel from driver side to executor side > Unify CarbonScanRDD and CarbonHadoopFSRDD > - > > Key: CARBONDATA-308 > URL: https://issues.apache.org/jira/browse/CARBONDATA-308 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Take CarbonScanRDD as the target RDD, modify as following: > 1. In driver side, only getSplit is required, so only filter condition is > required, no need to create full QueryModel object, so we can move creation > of QueryModel from driver side to executor side. > 2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead > of use QueryExecutor directly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
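The refactor described above amounts to keeping the serialized task state small: only the filter condition is built on the driver, and the heavy QueryModel is constructed lazily on the executor inside compute(). An illustrative sketch of that pattern (all class names here are stand-ins for illustration, not Carbon's actual classes):

```java
import java.io.Serializable;

class DriverExecutorSketch {

  // Small, serializable piece of state the driver ships with each task.
  static final class Filter implements Serializable {
    final String expr;
    Filter(String expr) { this.expr = expr; }
  }

  // Stand-in for the heavy per-task object (QueryModel in the description):
  // built on the executor, so it never has to be serialized.
  static final class QueryModel {
    final String plan;
    QueryModel(Filter f, String split) { this.plan = "scan(" + split + ") where " + f.expr; }
  }

  // Analogue of an RDD partition task: carries only the filter and the
  // split assigned by getSplits() on the driver.
  static final class ScanTask implements Serializable {
    private final Filter filter;
    private final String split;
    ScanTask(Filter filter, String split) { this.filter = filter; this.split = split; }

    // Runs on the executor; the heavy QueryModel is created here.
    String compute() {
      QueryModel model = new QueryModel(filter, split);
      return model.plan;
    }
  }

  static String demo() {
    ScanTask task = new ScanTask(new Filter("salary > 100"), "part-0");
    return task.compute();
  }
}
```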
[jira] [Created] (CARBONDATA-314) Make CarbonContext to use standard Datasource strategy
Jacky Li created CARBONDATA-314: --- Summary: Make CarbonContext to use standard Datasource strategy Key: CARBONDATA-314 URL: https://issues.apache.org/jira/browse/CARBONDATA-314 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. Then make CarbonContext use the standard datasource strategy for creation of relation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-313) Update CarbonSource to use CarbonDatasourceHadoopRelation
Jacky Li created CARBONDATA-313: --- Summary: Update CarbonSource to use CarbonDatasourceHadoopRelation Key: CARBONDATA-313 URL: https://issues.apache.org/jira/browse/CARBONDATA-313 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Change CarbonSource to use CarbonDatasourceHadoopRelation only, remove extension of BaseRelation, extend from HadoopFsRelation only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation
[ https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-312: Description: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, change CarbonDatasourceHadoopRelation to use CarbonScanRDD in buildScan function. was: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only, remove extension of BaseRelation, extend from HadoopFsRelation only. > Unify two datasource: CarbonDatasourceHadoopRelation and > CarbonDatasourceRelation > - > > Key: CARBONDATA-312 > URL: https://issues.apache.org/jira/browse/CARBONDATA-312 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Take CarbonDatasourceHadoopRelation as the target datasource definition, > after that, CarbonContext can use standard Datasource strategy. > In this Issue, change CarbonDatasourceHadoopRelation to use CarbonScanRDD in > buildScan function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat
[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-307: Description: Currently, there are two read path in carbon-spark module: 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use QueryExecutor for scan. 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split and scan Because of this, there are unnecessary duplicate code, they need to be unified. was: Currently, there are two read path in carbon-spark module: 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use QueryExecutor for scan. 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split and scan Because of this, there are unnecessary duplicate code, they need to be unified. > Support executor side scan using CarbonInputFormat > -- > > Key: CARBONDATA-307 > URL: https://issues.apache.org/jira/browse/CARBONDATA-307 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Currently, there are two read path in carbon-spark module: > 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor > In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use > QueryExecutor for scan. > 2. 
SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => > CarbonRecordReader => QueryExecutor > In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split > and scan > Because of this, there are unnecessary duplicate code, they need to be > unified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat
[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-307: Description: Currently, there are two read path in carbon-spark module: 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use QueryExecutor for scan. 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split and scan Because of this, there are unnecessary duplicate code, they need to be unified. was: Currently, there are two read path in carbon-spark module: 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use QueryExecutor for scan. 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split and scan It create unnecessary duplicate code, they need to be unified. > Support executor side scan using CarbonInputFormat > -- > > Key: CARBONDATA-307 > URL: https://issues.apache.org/jira/browse/CARBONDATA-307 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Currently, there are two read path in carbon-spark module: > 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor > In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use > QueryExecutor for scan. > 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => > CarbonRecordReader > In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split > and scan > Because of this, there are unnecessary duplicate code, they need to be > unified. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation
[ https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-312: Description: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only, remove extension of BaseRelation, extend from HadoopFsRelation only. was: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only > Unify two datasource: CarbonDatasourceHadoopRelation and > CarbonDatasourceRelation > - > > Key: CARBONDATA-312 > URL: https://issues.apache.org/jira/browse/CARBONDATA-312 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Take CarbonDatasourceHadoopRelation as the target datasource definition, > after that, CarbonContext can use standard Datasource strategy > In this Issue, the following changes are required: > 1. Move the dictionary strategy out of CarbonTableScan, make a separate > strategy for it. > 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan > function. > 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only, remove > extension of BaseRelation, extend from HadoopFsRelation only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log
[ https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570706#comment-15570706 ] ASF GitHub Bot commented on CARBONDATA-306: --- Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139950 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) { if (remainder > 0) { maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder; } +LOGGER.info("The configured block size is " + blockSize + " byte, " + --- End diff -- @Jay357089 I think this is a good idea to extract ConvertByteToReadable as a method, since it can be used in many logs, especially for analyzing performance. > block size info should be show in Desc Formatted and executor log > - > > Key: CARBONDATA-306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-306 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > when run desc formatted command, the table block size should be show, as well > as in executor log when run load command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log
[ https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570699#comment-15570699 ] ASF GitHub Bot commented on CARBONDATA-306: --- Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139600 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) { if (remainder > 0) { maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder; } +LOGGER.info("The configured block size is " + blockSize + " byte, " + --- End diff -- @jackylk Maybe I should extract the if/else part into a method called ConvertByteToReadable, what's your opinion? > block size info should be show in Desc Formatted and executor log > - > > Key: CARBONDATA-306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-306 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > when run desc formatted command, the table block size should be show, as well > as in executor log when run load command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
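One possible shape for the ConvertByteToReadable helper proposed in the two comments above; the thresholds and output format here are assumptions for illustration, not CarbonData's actual implementation:

```java
import java.util.Locale;

class ByteFormatSketch {
  // Converts a raw byte count into a human-readable string for log messages.
  // Locale.ROOT keeps the decimal separator stable regardless of JVM locale.
  static String convertByteToReadable(long bytes) {
    if (bytes < 1024) {
      return bytes + " Byte";
    } else if (bytes < 1024L * 1024) {
      return String.format(Locale.ROOT, "%.2f KB", bytes / 1024.0);
    } else {
      return String.format(Locale.ROOT, "%.2f MB", bytes / (1024.0 * 1024.0));
    }
  }
}
```

A log line could then read `LOGGER.info("The configured block size is " + convertByteToReadable(blockSize))`, which is the reuse across many logs that the reviewers had in mind.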
[jira] [Updated] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation
[ https://issues.apache.org/jira/browse/CARBONDATA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-312: Description: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only was: Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. > Unify two datasource: CarbonDatasourceHadoopRelation and > CarbonDatasourceRelation > - > > Key: CARBONDATA-312 > URL: https://issues.apache.org/jira/browse/CARBONDATA-312 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Take CarbonDatasourceHadoopRelation as the target datasource definition, > after that, CarbonContext can use standard Datasource strategy > In this Issue, the following changes are required: > 1. Move the dictionary strategy out of CarbonTableScan, make a separate > strategy for it. > 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan > function. > 3. Change CarbonSource to use CarbonDatasourceHadoopRelation only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-310) Compilation failed when using spark 1.6.2
[ https://issues.apache.org/jira/browse/CARBONDATA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570690#comment-15570690 ] ASF GitHub Bot commented on CARBONDATA-310: --- GitHub user foryou2030 opened a pull request: https://github.com/apache/incubator-carbondata/pull/232 [CARBONDATA-310]Fixed compilation failure when using spark 1.6.2 # Why raise this pr? Compilation failed when using spark 1.6.2, because class not found: AggregateExpression # How to solve? Removing the import "import org.apache.spark.sql.catalyst.expressions.aggregate._" causes a compilation failure when using Spark 1.6.2, in which AggregateExpression was moved to the subpackage "aggregate". So it needs to be changed back. Thanks for the reminder, @harperjiang You can merge this pull request into a Git repository by running: $ git pull https://github.com/foryou2030/incubator-carbondata agg_ex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/232.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #232 commit ee4f6832d893c6ac99e1694b607b6f2d38ec9231 Author: foryou2030 Date: 2016-10-13T03:17:38Z fix compile on spark1.6.2 > Compilation failed when using spark 1.6.2 > - > > Key: CARBONDATA-310 > URL: https://issues.apache.org/jira/browse/CARBONDATA-310 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Assignee: Gin-zhj >Priority: Minor > > Compilation failed when using spark 1.6.2, > caused by class not found: AggregateExpression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-311) Log the data size of blocklet during data load.
[ https://issues.apache.org/jira/browse/CARBONDATA-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570686#comment-15570686 ] ASF GitHub Bot commented on CARBONDATA-311: --- GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/231 [CARBONDATA-311]Log the data size of blocklet during data load. ## Why raise this pr? The blocklet size is an important parameter for analyzing data load and query, so this info should be logged. ## How to test? Pass all the test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata logblocklet Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #231 commit a110504f58e688e42223e896f7a1cf729463cf9d Author: Zhangshunyu Date: 2016-10-13T03:17:21Z Log the data size of each blocklet > Log the data size of blocklet during data load. > --- > > Key: CARBONDATA-311 > URL: https://issues.apache.org/jira/browse/CARBONDATA-311 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 0.1.1-incubating >Reporter: zhangshunyu >Assignee: zhangshunyu >Priority: Minor > Fix For: 0.2.0-incubating > > > Log the data size of blocklet during data load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat
[ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-307: Summary: Support executor side scan using CarbonInputFormat (was: Support full functionality in CarbonInputFormat) > Support executor side scan using CarbonInputFormat > -- > > Key: CARBONDATA-307 > URL: https://issues.apache.org/jira/browse/CARBONDATA-307 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Currently, there are two read path in carbon-spark module: > 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor > In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and use > QueryExecutor for scan. > 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => > CarbonRecordReader > In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both get split > and scan > It create unnecessary duplicate code, they need to be unified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-310) Compilation failed when using spark 1.6.2
Gin-zhj created CARBONDATA-310: -- Summary: Compilation failed when using spark 1.6.2 Key: CARBONDATA-310 URL: https://issues.apache.org/jira/browse/CARBONDATA-310 Project: CarbonData Issue Type: Bug Reporter: Gin-zhj Assignee: Gin-zhj Priority: Minor Compilation failed when using spark 1.6.2, caused by class not found: AggregateExpression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation
Jacky Li created CARBONDATA-312: --- Summary: Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation Key: CARBONDATA-312 URL: https://issues.apache.org/jira/browse/CARBONDATA-312 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Take CarbonDatasourceHadoopRelation as the target datasource definition, after that, CarbonContext can use standard Datasource strategy. In this Issue, the following changes are required: 1. Move the dictionary strategy out of CarbonTableScan, make a separate strategy for it. 2. CarbonDatasourceHadoopRelation should use CarbonScanRDD in buildScan function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-311) Log the data size of blocklet during data load.
zhangshunyu created CARBONDATA-311: -- Summary: Log the data size of blocklet during data load. Key: CARBONDATA-311 URL: https://issues.apache.org/jira/browse/CARBONDATA-311 Project: CarbonData Issue Type: Improvement Affects Versions: 0.1.1-incubating Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating Log the data size of blocklet during data load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-308) Unify CarbonScanRDD and CarbonHadoopFSRDD
[ https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-308: Description: Take CarbonScanRDD as the target RDD, modify as following: In driver side, only getSplit is required, so only filter condition is required, no need to create full QueryModel object, so we can move creation of QueryModel from driver side to executor side Summary: Unify CarbonScanRDD and CarbonHadoopFSRDD (was: Support multiple segment in CarbonHadoopFSRDD) > Unify CarbonScanRDD and CarbonHadoopFSRDD > - > > Key: CARBONDATA-308 > URL: https://issues.apache.org/jira/browse/CARBONDATA-308 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Take CarbonScanRDD as the target RDD, modify as following: > In driver side, only getSplit is required, so only filter condition is > required, no need to create full QueryModel object, so we can move creation > of QueryModel from driver side to executor side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-309) Support two types of ReadSupport in CarbonRecordReader
Jacky Li created CARBONDATA-309: --- Summary: Support two types of ReadSupport in CarbonRecordReader Key: CARBONDATA-309 URL: https://issues.apache.org/jira/browse/CARBONDATA-309 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li CarbonRecordReader should support late decode based on the passed Configuration. A config indicating late decode needs to be added in CarbonInputFormat for this purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-308) Support multiple segment in CarbonHadoopFSRDD
Jacky Li created CARBONDATA-308: --- Summary: Support multiple segment in CarbonHadoopFSRDD Key: CARBONDATA-308 URL: https://issues.apache.org/jira/browse/CARBONDATA-308 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-307) Support full functionality in CarbonInputFormat
Jacky Li created CARBONDATA-307: --- Summary: Support full functionality in CarbonInputFormat Key: CARBONDATA-307 URL: https://issues.apache.org/jira/browse/CARBONDATA-307 Project: CarbonData Issue Type: Improvement Components: spark-integration Affects Versions: 0.1.0-incubating Reporter: Jacky Li Fix For: 0.2.0-incubating Currently, there are two read paths in the carbon-spark module: 1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor In this case, CarbonScanRDD uses CarbonInputFormat to get the split, and uses QueryExecutor for the scan. 2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader In this case, CarbonHadoopFSRDD uses CarbonInputFormat to do both the split and the scan. This creates unnecessary duplicate code, which needs to be unified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide
[ https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570540#comment-15570540 ] ASF GitHub Bot commented on CARBONDATA-292: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/223#discussion_r83133189 --- Diff: docs/DML-Operations-on-Carbon.md --- @@ -104,8 +109,10 @@ Following are the options that can be used in load data: 'MULTILINE'='true', 'ESCAPECHAR'='\', 'COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', - 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary' + 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary', + 'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, empname:/dictFilePath/empnameDict.csv' --- End diff -- No, I mean just delete it from the Example section. And that note should be added in the option explanation section. > add COLUMNDICT operation info in DML operation guide > > > Key: CARBONDATA-292 > URL: https://issues.apache.org/jira/browse/CARBONDATA-292 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > there is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so > need to add. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide
[ https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570512#comment-15570512 ] ASF GitHub Bot commented on CARBONDATA-292: --- Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/223#discussion_r83132221 --- Diff: docs/DML-Operations-on-Carbon.md --- @@ -104,8 +109,10 @@ Following are the options that can be used in load data: 'MULTILINE'='true', 'ESCAPECHAR'='\', 'COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', - 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary' + 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary', + 'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, empname:/dictFilePath/empnameDict.csv' --- End diff -- I have given a note below. If it is not enough, should I delete this option or close this PR? > add COLUMNDICT operation info in DML operation guide > > > Key: CARBONDATA-292 > URL: https://issues.apache.org/jira/browse/CARBONDATA-292 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > there is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so it needs to be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
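For the option-explanation section being discussed, a full statement showing COLUMNDICT in context may help readers; the table name and input path below are placeholders, and the dictionary paths are the ones from the diff:

```sql
LOAD DATA LOCAL INPATH '/opt/rawdata/data.csv' INTO TABLE employee
OPTIONS(
  'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, empname:/dictFilePath/empnameDict.csv'
)
```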
[jira] [Commented] (CARBONDATA-304) Load data failure when set table_blocksize=2048
[ https://issues.apache.org/jira/browse/CARBONDATA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570511#comment-15570511 ] ASF GitHub Bot commented on CARBONDATA-304: --- Github user foryou2030 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/227#discussion_r83132187 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -197,8 +197,9 @@ public AbstractFactDataWriter(String storeLocation, int measureCount, int mdKeyL blockIndexInfoList = new ArrayList<>(); // get max file size; CarbonProperties propInstance = CarbonProperties.getInstance(); -this.fileSizeInBytes = blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR -* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L; +// if blocksize=2048, then 2048*1024*1024 will beyond the range of Int +this.fileSizeInBytes = 1L * blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR --- End diff -- fixed > Load data failure when set table_blocksize=2048 > --- > > Key: CARBONDATA-304 > URL: https://issues.apache.org/jira/browse/CARBONDATA-304 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Assignee: Gin-zhj > > First, create a table with table_blocksize=2048 > CREATE TABLE IF NOT EXISTS t3 (ID Int, date Timestamp, country String, name > String, phonetype String, serialname String, salary Int) STORED BY > 'carbondata' TBLPROPERTIES('table_blocksize'='2048'); > Then load data; the load fails with the exception: > org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException: > Problem while copying file from local store to carbon store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
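The overflow the patch fixes is plain Java int arithmetic; a minimal standalone illustration, independent of the CarbonData constants:

```java
public class BlockSizeOverflow {
    public static void main(String[] args) {
        int blocksize = 2048; // table_blocksize configured in MB

        // All three operands are int, so 2048 * 1024 * 1024 is computed in
        // 32-bit arithmetic and wraps to Integer.MIN_VALUE before the
        // widening assignment to long happens.
        long wrong = blocksize * 1024 * 1024;

        // Multiplying by 1L first promotes the whole expression to long,
        // which is exactly what the patch does with `1L * blocksize * ...`.
        long right = 1L * blocksize * 1024 * 1024;

        System.out.println(wrong); // -2147483648
        System.out.println(right); // 2147483648
    }
}
```

With the wrapped (negative) file size, the writer's size checks misbehave, which surfaces later as the "Problem while copying file" exception seen in the report.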
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570458#comment-15570458 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83130319 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returns from this step is as per this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Intialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void intialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws + CarbonDataLoadingException; + + /** + * Tranform the data as per the implemetation. + * @return Iterator of data + * @throws CarbonDataLoadingException + */ + Iterator execute() throws CarbonDataLoadingException; + + /** + * Any closing of resources after step execution can be done here. + */ + void finish(); --- End diff -- It should be called in both failure and success cases. So i will rename it to `close` > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
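The `finish`/`close` split agreed on above can be sketched as a small lifecycle driver; the names here are illustrative, not the actual DataLoadProcessorStep API:

```java
// Hypothetical sketch: finish() is a success-only hook, close() always runs.
interface ProcessorStep {
    void execute() throws Exception; // transformation work
    void finish();                   // called only on success
    void close();                    // resource cleanup, success or failure
}

class StepRunner {
    // Returns true on success; close() is guaranteed by the finally block.
    static boolean run(ProcessorStep step) {
        try {
            step.execute();
            step.finish();   // reached only when execute() succeeds
            return true;
        } catch (Exception e) {
            return false;    // failure path: finish() is skipped
        } finally {
            step.close();    // runs in both cases
        }
    }
}
```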
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570455#comment-15570455 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83130123 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { --- End diff -- ok > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570434#comment-15570434 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83129418 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java --- @@ -0,0 +1,185 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; + +public class CarbonDataLoadConfiguration { --- End diff -- Ok. I will keep the configuration class, but retain only the important, global settings in it; the remaining configurations can move to a `Map` kept inside the configuration itself. What do you say? > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570425#comment-15570425 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83129008 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java --- @@ -0,0 +1,185 @@ +package org.apache.carbondata.processing.newflow; --- End diff -- ok > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-285) Use path parameter in Spark datasource API
[ https://issues.apache.org/jira/browse/CARBONDATA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570315#comment-15570315 ] ASF GitHub Bot commented on CARBONDATA-285: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83123943 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala --- @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider override def createRelation( sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation = { -// if path is provided we can directly create Hadoop relation. \ -// Otherwise create datasource relation -parameters.get("path") match { - case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None) - case _ => -val options = new CarbonOption(parameters) -val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq -val identifier = tableIdentifier match { - case Seq(name) => TableIdentifier(name, None) - case Seq(db, name) => TableIdentifier(name, Some(db)) -} -CarbonDatasourceRelation(identifier, None)(sqlContext) +val options = new CarbonOption(parameters) +if (sqlContext.isInstanceOf[CarbonContext]) { --- End diff -- There is no `load` method in DataFrame, only in the context class. > Use path parameter in Spark datasource API > -- > > Key: CARBONDATA-285 > URL: https://issues.apache.org/jira/browse/CARBONDATA-285 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > Currently, when using carbon with the Spark datasource API, the database name and table name must be given as parameters, which is not the normal way of datasource API usage.
In this PR, the database name and table name are not required; the user only needs to specify the `path` parameter (indicating the path to the table folder) when using the datasource API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569219#comment-15569219 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83049479 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returns from this step is as per this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Intialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void intialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws + CarbonDataLoadingException; + + /** + * Tranform the data as per the implemetation. + * @return Iterator of data + * @throws CarbonDataLoadingException + */ + Iterator execute() throws CarbonDataLoadingException; --- End diff -- Suppose we are loading 50 GB of CSV files and each HDFS block is 256 MB; then the total number of partitions is 200. If we allow one task per partition, there would be 200 tasks. In carbondata one btree is created for each task, so allowing all 200 tasks would create 200 btrees, which is not effective in terms of either performance or memory. That is the reason why we pool multiple blocks per task in the current kettle implementation. And these blocks are processed in parallel.
We can take the same approach and use an iterator for each thread, returning an array of iterators. What do you mean by datanode-scope sorting? How would we synchronize between multiple tasks? > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
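The block-pooling arithmetic in the comment above can be checked in a few lines; the pooling factor of 20 is an assumed example, not a CarbonData default:

```java
public class PartitionPooling {
    public static void main(String[] args) {
        long totalBytes = 50L * 1024 * 1024 * 1024; // 50 GB of CSV input
        long blockBytes = 256L * 1024 * 1024;       // HDFS block size

        // One HDFS block per partition: 50 GB / 256 MB = 200 potential tasks,
        // hence 200 btrees if each task builds its own.
        long blocks = totalBytes / blockBytes;

        // Pooling several blocks into one task keeps the btree count small.
        int blocksPerTask = 20; // assumed pooling factor for illustration
        long tasks = (blocks + blocksPerTask - 1) / blocksPerTask; // ceiling division

        System.out.println(blocks + " blocks pooled into " + tasks + " tasks");
    }
}
```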
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569112#comment-15569112 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r83039531 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala --- @@ -0,0 +1,78 @@ +package org.apache.carbondata.spark.testsuite.dataload + +import java.io.File + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.spark.sql.common.util.CarbonHiveContext._ +import org.apache.spark.sql.common.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.sql.Row + +/** + * Created by x00381807 on 2016/9/26. --- End diff -- Oh, my fault > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When a string contains a LeadingWhiteSpace or TrailingWhiteSpace, the query result is null. This is because the dictionary ignores the LeadingWhiteSpace and TrailingWhiteSpace while the csvInput does not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569065#comment-15569065 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033746 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/CarbonArrayWritable.java --- @@ -0,0 +1,51 @@ +package org.apache.carbondata.processing.newflow.iterators; + +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; +import java.nio.charset.Charset; +import java.util.Arrays; + +import org.apache.hadoop.io.Writable; + +/** + * It is hadoop's writable value wrapper. + */ +public class CarbonArrayWritable implements Writable { --- End diff -- Why is this in carbon-processing rather than the carbon-hadoop module? All Hadoop-related classes should be in the carbon-hadoop module. > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-281) improve the test cases in LCM module.
[ https://issues.apache.org/jira/browse/CARBONDATA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569082#comment-15569082 ] ASF GitHub Bot commented on CARBONDATA-281: --- Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/205 > improve the test cases in LCM module. > - > > Key: CARBONDATA-281 > URL: https://issues.apache.org/jira/browse/CARBONDATA-281 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 0.1.0-incubating >Reporter: ravikiran >Assignee: ravikiran >Priority: Minor > > Improving the test cases in the LCM module: adding test cases for compaction boundary conditions, and test cases to verify the minor compaction threshold check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569067#comment-15569067 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83032703 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returns from this step is as per this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Intialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void intialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws + CarbonDataLoadingException; + + /** + * Tranform the data as per the implemetation. + * @return Iterator of data + * @throws CarbonDataLoadingException + */ + Iterator execute() throws CarbonDataLoadingException; + + /** + * Any closing of resources after step execution can be done here. + */ + void finish(); --- End diff -- This is called when the step successfully finished. But what about failure case, should there be a `void close();` interface for failure case? > 2. Add interfaces for data loading. 
> --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569060#comment-15569060 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83031958 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { --- End diff -- I think each implementation of this interface has similar logic in the execute function; can we create an abstract class to implement the common logic? The common logic like: ``` Iterator execute() throws CarbonDataLoadingException { Iterator childIter = child.execute(); return new Iterator { public boolean hasNext() { return childIter.hasNext(); } public Object[] next() { // processInput is the abstract func in this class return processInput(childIter.next()); } } } ``` > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
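A runnable version of the sketch in that review comment, with assumed names (not the actual CarbonData classes), might look like:

```java
import java.util.Iterator;

// Hedged sketch: the abstract base class holds the child's iterator and wraps
// it, so each concrete step only supplies the per-row transformation.
abstract class AbstractProcessorStep {
    private final Iterator<Object[]> childIter;

    AbstractProcessorStep(Iterator<Object[]> childIter) {
        this.childIter = childIter;
    }

    // The abstract per-row hook each step implements.
    protected abstract Object[] processInput(Object[] row);

    // Common logic shared by every step: lazily transform the child's rows.
    Iterator<Object[]> execute() {
        return new Iterator<Object[]>() {
            public boolean hasNext() { return childIter.hasNext(); }
            public Object[] next() { return processInput(childIter.next()); }
        };
    }
}
```

A step that upper-cases its input would then only override `processInput`, keeping the iterator plumbing in one place.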
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569064#comment-15569064 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83028336 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java --- @@ -0,0 +1,185 @@ +package org.apache.carbondata.processing.newflow; --- End diff -- add license header > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569063#comment-15569063 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83030022 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/CarbonDataLoadConfiguration.java --- @@ -0,0 +1,185 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; + +public class CarbonDataLoadConfiguration { --- End diff -- This configuration seems quite complex; I think that is because it contains the configuration for all steps. Can we just have a simple `Map` as the configuration and let the `Step` decide what to keep in it? > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
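The compromise eventually agreed in this thread — a few typed, global fields plus a `Map` for everything step-specific — could look roughly like this; all names here are assumptions, not the actual CarbonDataLoadConfiguration API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: important global settings stay as typed fields, while
// step-specific options live in a generic key/value map inside the class.
class LoadConfiguration {
    private final String tableName; // global, typed
    private final int blockSizeMb;  // global, typed
    private final Map<String, Object> extras = new HashMap<>(); // per-step options

    LoadConfiguration(String tableName, int blockSizeMb) {
        this.tableName = tableName;
        this.blockSizeMb = blockSizeMb;
    }

    void set(String key, Object value) { extras.put(key, value); }
    Object get(String key) { return extras.get(key); }
    String getTableName() { return tableName; }
    int getBlockSizeMb() { return blockSizeMb; }
}
```

Each step reads only the keys it cares about from `extras`, so adding a new step does not grow the configuration class.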
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569061#comment-15569061 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83032371 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returns from this step is as per this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Intialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void intialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws --- End diff -- If there is a abstract class, it can have the child as its member variable, then this `initialize` function takes no parameter as input > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569066#comment-15569066 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033298 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow.iterators; + +import java.io.IOException; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; + +import org.apache.hadoop.mapred.RecordReader; + +/** + * This iterator iterates RecordReader. + */ +public class RecordReaderIterator extends CarbonIterator { --- End diff -- Why is this in carbon-processing rather than the carbon-hadoop module? > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569062#comment-15569062 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83033524 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow.iterators; + +import java.io.IOException; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; + +import org.apache.hadoop.mapred.RecordReader; + +/** + * This iterator iterates RecordReader. + */ +public class RecordReaderIterator extends CarbonIterator { --- End diff -- what is it used for? > 2. Add interfaces for data loading. > --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log
[ https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569003#comment-15569003 ] ASF GitHub Bot commented on CARBONDATA-306: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83028164 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) { if (remainder > 0) { maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder; } +LOGGER.info("The configured block size is " + blockSize + " byte, " + --- End diff -- Suggest converting `blockSize` to a human-readable number before logging it; otherwise the value is hard for a human to check. > block size info should be show in Desc Formatted and executor log > - > > Key: CARBONDATA-306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-306 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > when running the desc formatted command, the table block size should be shown, as well as in the executor log when running the load command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568996#comment-15568996 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82523962 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala --- @@ -0,0 +1,78 @@ +package org.apache.carbondata.spark.testsuite.dataload + +import java.io.File + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.spark.sql.common.util.CarbonHiveContext._ +import org.apache.spark.sql.common.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.sql.Row + +/** + * Created by x00381807 on 2016/9/26. --- End diff -- please remove > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When a string contains a LeadingWhiteSpace or TrailingWhiteSpace, the query result is null. This is because the dictionary ignores the LeadingWhiteSpace and TrailingWhiteSpace while the csvInput does not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log
[ https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568998#comment-15568998 ] ASF GitHub Bot commented on CARBONDATA-306: --- Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83027603 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) { if (remainder > 0) { maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder; } +LOGGER.info("The configured block size is " + blockSize + " byte, " + --- End diff -- @jackylk it is set in MB, but here it has already been converted to bytes. > block size info should be show in Desc Formatted and executor log > - > > Key: CARBONDATA-306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-306 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > when run desc formatted command, the table block size should be show, as well > as in executor log when run load command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-280) when table properties is repeated it only set the last one
[ https://issues.apache.org/jira/browse/CARBONDATA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568990#comment-15568990 ] ASF GitHub Bot commented on CARBONDATA-280: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/204#discussion_r83027008 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/deleteTable/TestDeleteTableNewDDL.scala --- @@ -97,7 +97,7 @@ class TestDeleteTableNewDDL extends QueryTest with BeforeAndAfterAll { "CREATE table CaseInsensitiveTable (ID int, date String, country String, name " + "String," + "phonetype String, serialname String, salary int) stored by 'org.apache.carbondata.format'" + - "TBLPROPERTIES('DICTIONARY_INCLUDE'='ID', 'DICTIONARY_INCLUDE'='salary')" + "TBLPROPERTIES('DICTIONARY_INCLUDE'='ID,salary')" --- End diff -- add space after `,` > when table properties is repeated it only set the last one > --- > > Key: CARBONDATA-280 > URL: https://issues.apache.org/jira/browse/CARBONDATA-280 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 0.1.1-incubating >Reporter: zhangshunyu >Assignee: zhangshunyu >Priority: Minor > Fix For: 0.2.0-incubating > > > when table properties is repeated it only set the last one: > For example, > CREATE TABLE IF NOT EXISTS carbontable > (ID Int, date Timestamp, country String, > name String, phonetype String, serialname String, salary Int) > STORED BY 'carbondata' > TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID', > 'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary') > only salary is set to DICTIONARY_INCLUDE and only phonetype is set to > DICTIONARY_EXCLUDE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
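The underlying bug in CARBONDATA-280 is that a map of table properties silently overwrites a repeated key. A minimal sketch of the merging behavior the fix implies — comma-joining duplicate TBLPROPERTIES keys instead of keeping only the last one — might look like this (names are illustrative, not the actual CarbonData parser API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: instead of keeping only the last value for a repeated
// property key, join the values so 'DICTIONARY_INCLUDE'='ID' followed by
// 'DICTIONARY_INCLUDE'='salary' behaves like 'DICTIONARY_INCLUDE'='ID,salary'.
class TablePropertyMerger {
  static Map<String, String> merge(String[][] entries) {
    Map<String, String> props = new LinkedHashMap<>();
    for (String[] e : entries) {
      // Map.merge concatenates when the (case-insensitive) key repeats
      props.merge(e[0].toLowerCase(), e[1], (a, b) -> a + "," + b);
    }
    return props;
  }
}
```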
[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios
[ https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568981#comment-15568981 ] ASF GitHub Bot commented on CARBONDATA-283: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/207#discussion_r83025827 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.processing.util; + +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Date; +import java.util.List; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.load.LoadMetadataDetails; + +/** + * This class contains all table status file utilities + */ +public final class CarbonTableStatusUtil { + private static final LogService LOGGER = + LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName()); + + private CarbonTableStatusUtil() { + + } + + /** + * updates table status details using latest metadata + * + * @param oldMetadata + * @param newMetadata + * @return + */ + + public static List updateLatestTableStatusDetails( + LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] newMetadata) { + +List newListMetadata = +new ArrayList(Arrays.asList(newMetadata)); +for (LoadMetadataDetails oldSegment : oldMetadata) { + if (CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus())) { + updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment))); + } +} +return newListMetadata; + } + + /** + * returns current time + * + * @return + */ + private static String readCurrentTime() { +SimpleDateFormat sdf = new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP); +String date = null; + +date = sdf.format(new Date()); + +return date; + } + + /** + * updates segment status and modificaton time details + * + * @param loadMetadata + */ + public static void updateSegmentMetadataDetails(LoadMetadataDetails loadMetadata) { --- End diff -- Can you improve on the function name to depict the behavior of this function? 
> Improve the test cases for concurrent scenarios > --- > > Key: CARBONDATA-283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-283 > Project: CarbonData > Issue Type: Bug >Reporter: Manohar Vanam >Assignee: Manohar Vanam >Priority: Minor > > Improve test cases for data retention concurrent scenarios -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios
[ https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568980#comment-15568980 ] ASF GitHub Bot commented on CARBONDATA-283: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/207#discussion_r83026458 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.processing.util; + +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Date; +import java.util.List; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.load.LoadMetadataDetails; + +/** + * This class contains all table status file utilities + */ +public final class CarbonTableStatusUtil { + private static final LogService LOGGER = + LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName()); + + private CarbonTableStatusUtil() { + + } + + /** + * updates table status details using latest metadata + * + * @param oldMetadata + * @param newMetadata + * @return + */ + + public static List updateLatestTableStatusDetails( --- End diff -- I think these should not be utility functions, but should be member function of LoadMetadataDetails > Improve the test cases for concurrent scenarios > --- > > Key: CARBONDATA-283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-283 > Project: CarbonData > Issue Type: Bug >Reporter: Manohar Vanam >Assignee: Manohar Vanam >Priority: Minor > > Improve test cases for data retention concurrent scenarios -- This message was sent by Atlassian JIRA (v6.3.4#6332)
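jackylk's point — that these helpers belong on `LoadMetadataDetails` itself rather than in a static utility class — could be sketched as below. The fields, status string, and timestamp pattern are simplifications assumed for illustration, not the real CarbonData class:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch: the former static utilities become instance methods, so callers
// operate on a segment's own state. Field names and the timestamp pattern
// ("dd-MM-yyyy HH:mm:ss") are assumptions, not the real CarbonData class.
class LoadMetadataDetails {
  private final String loadStatus;
  private String modificationTime;

  LoadMetadataDetails(String loadStatus) {
    this.loadStatus = loadStatus;
  }

  boolean isMarkedForDelete() {
    return "Marked for Delete".equalsIgnoreCase(loadStatus);
  }

  // was a static updateSegmentMetadataDetails(...) in the utility class
  void markModifiedNow() {
    this.modificationTime = new SimpleDateFormat("dd-MM-yyyy HH:mm:ss").format(new Date());
  }

  String getModificationTime() {
    return modificationTime;
  }
}
```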
[jira] [Commented] (CARBONDATA-292) add COLUMNDICT operation info in DML operation guide
[ https://issues.apache.org/jira/browse/CARBONDATA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568937#comment-15568937 ] ASF GitHub Bot commented on CARBONDATA-292: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/223#discussion_r83022245 --- Diff: docs/DML-Operations-on-Carbon.md --- @@ -104,8 +109,10 @@ Following are the options that can be used in load data: 'MULTILINE'='true', 'ESCAPECHAR'='\', 'COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', - 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary' + 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary', + 'COLUMNDICT'='empno:/dictFilePath/empnoDict.csv, empname:/dictFilePath/empnameDict.csv' --- End diff -- do not give this option since it can not be used together with `ALL_DICTIONARY_PATH` > add COLUMNDICT operation info in DML operation guide > > > Key: CARBONDATA-292 > URL: https://issues.apache.org/jira/browse/CARBONDATA-292 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > there is no COLUMNDICT operation guide in DML-Operations-on-Carbon.md, so > need to add. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-306) block size info should be show in Desc Formatted and executor log
[ https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568922#comment-15568922 ] ASF GitHub Bot commented on CARBONDATA-306: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83021340 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) { if (remainder > 0) { maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder; } +LOGGER.info("The configured block size is " + blockSize + " byte, " + --- End diff -- Is `blockSize` in bytes or MB? > block size info should be show in Desc Formatted and executor log > - > > Key: CARBONDATA-306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-306 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Priority: Minor > > when run desc formatted command, the table block size should be show, as well > as in executor log when run load command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.
[ https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1556#comment-1556 ] ASF GitHub Bot commented on CARBONDATA-297: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83018479 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package org.apache.carbondata.processing.newflow; + +import java.util.Iterator; + +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; + +/** + * This base interface for data loading. It can do transformation jobs as per the implementation. + * + */ +public interface DataLoadProcessorStep { + + /** + * The output meta for this step. The data returns from this step is as per this meta. + * @return + */ + DataField[] getOutput(); + + /** + * Intialization process for this step. + * @param configuration + * @param child + * @throws CarbonDataLoadingException + */ + void intialize(CarbonDataLoadConfiguration configuration, DataLoadProcessorStep child) throws + CarbonDataLoadingException; + + /** + * Tranform the data as per the implemetation. + * @return Iterator of data + * @throws CarbonDataLoadingException + */ + Iterator execute() throws CarbonDataLoadingException; --- End diff -- I think `execute()` is called for every parallel unit of the input, right? For example, when using spark to load from a dataframe, `execute()` is called for every spark partition (one task executes one partition). When loading from a CSV file on HDFS, `execute()` is called for every HDFS block. So I do not think returning an array of iterators is required. In the carbon loading process on every executor, some of the steps can be parallelized, but the sort step needs to be synchronized (a potential bottleneck), since we need datanode-scope sorting. Am I correct? > 2. Add interfaces for data loading. 
> --- > > Key: CARBONDATA-297 > URL: https://issues.apache.org/jira/browse/CARBONDATA-297 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 0.2.0-incubating > > > Add the major interface classes for data loading so that the following jiras > can use this interfaces to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
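Pulling the review comments on this interface together (fixed spelling, an iterator-per-parallel-unit `execute()`, and the `close()` method ravipesala agreed to add alongside `finish()`), a hedged sketch of where the API seems to be heading — signatures are illustrative, not the merged code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the step interface under discussion; not the final API.
interface DataLoadProcessorStep<T> {
  void initialize(DataLoadProcessorStep<T> child);
  // invoked once per parallel unit (e.g. one Spark partition / one HDFS block)
  Iterator<T> execute();
  void finish(); // normal completion
  void close();  // resource cleanup, also called on failure
}

// A trivial pass-through step, only to show how a step yields rows.
class IdentityStep implements DataLoadProcessorStep<String> {
  private final Iterator<String> input;

  IdentityStep(List<String> rows) {
    this.input = rows.iterator();
  }

  public void initialize(DataLoadProcessorStep<String> child) { }

  public Iterator<String> execute() {
    return input;
  }

  public void finish() { }

  public void close() { }

  public static void main(String[] args) {
    Iterator<String> it = new IdentityStep(Arrays.asList("row1", "row2")).execute();
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```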
[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records
[ https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568858#comment-15568858 ] ASF GitHub Bot commented on CARBONDATA-288: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83014256 --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoadModel.java --- @@ -117,9 +117,9 @@ private String badRecordsLoggerEnable; /** - * defines the option to specify the bad record log redirect to raw csv + * defines the option to specify the bad record logger action */ - private String badRecordsLoggerRedirect; + private String badRecordsLoggerAction; --- End diff -- This action is not for Logger, right? Perhaps `badRecordsAction` is a better name? And it should be an enum instead of String > In hdfs bad record logger is failing in writting the bad records > > > Key: CARBONDATA-288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-288 > Project: CarbonData > Issue Type: Bug >Affects Versions: 0.2.0-incubating >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan >Priority: Minor > Fix For: 0.2.0-incubating > > > For HDFS file system > CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS); > if filePath does not exits then > Calling CarbonFile.getPath() throws NullPointerException. > Solution: > If file does not exist then before accessing the file must be created first -- This message was sent by Atlassian JIRA (v6.3.4#6332)
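jackylk's enum suggestion could look like the sketch below. `REDIRECT` appears in the diff under review; the remaining constant names are assumptions about the actions this loader supports, not the merged code:

```java
// Hypothetical enum replacing the free-form String for the bad-records action.
enum BadRecordsAction {
  FORCE,    // convert the bad value to null and continue loading
  REDIRECT, // write the raw record to a separate bad-records file
  IGNORE,   // silently drop the bad record
  FAIL;     // abort the load

  // tolerant parsing of the user-supplied load option
  static BadRecordsAction from(String value) {
    return valueOf(value.trim().toUpperCase());
  }
}
```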
[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records
[ https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568857#comment-15568857 ] ASF GitHub Bot commented on CARBONDATA-288: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83014871 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java --- @@ -69,9 +68,13 @@ private BufferedWriter bufferedCSVWriter; private DataOutputStream outCSVStream; /** - * + * bad record log file path + */ + private String logFilePath; + /** + * csv file path */ - private CarbonFile logFile; + private String csvFilePath; --- End diff -- What is this csv file? What is the difference from logFilePath? > In hdfs bad record logger is failing in writting the bad records > > > Key: CARBONDATA-288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-288 > Project: CarbonData > Issue Type: Bug >Affects Versions: 0.2.0-incubating >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan >Priority: Minor > Fix For: 0.2.0-incubating > > > For HDFS file system > CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS); > if filePath does not exits then > Calling CarbonFile.getPath() throws NullPointerException. > Solution: > If file does not exist then before accessing the file must be created first -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-288) In hdfs bad record logger is failing in writting the bad records
[ https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568856#comment-15568856 ] ASF GitHub Bot commented on CARBONDATA-288: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83015590 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -458,9 +462,11 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; case REDIRECT: badRecordsLogRedirect = true; + badRecordConvertNullDisable= true; --- End diff -- add space before `=` > In hdfs bad record logger is failing in writting the bad records > > > Key: CARBONDATA-288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-288 > Project: CarbonData > Issue Type: Bug >Affects Versions: 0.2.0-incubating >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan >Priority: Minor > Fix For: 0.2.0-incubating > > > For HDFS file system > CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS); > if filePath does not exits then > Calling CarbonFile.getPath() throws NullPointerException. > Solution: > If file does not exist then before accessing the file must be created first -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-304) Load data failure when set table_blocksize=2048
[ https://issues.apache.org/jira/browse/CARBONDATA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568825#comment-15568825 ] ASF GitHub Bot commented on CARBONDATA-304: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/227#discussion_r83012391 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -197,8 +197,9 @@ public AbstractFactDataWriter(String storeLocation, int measureCount, int mdKeyL blockIndexInfoList = new ArrayList<>(); // get max file size; CarbonProperties propInstance = CarbonProperties.getInstance(); -this.fileSizeInBytes = blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR -* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L; +// if blocksize=2048, then 2048*1024*1024 will beyond the range of Int +this.fileSizeInBytes = 1L * blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR --- End diff -- instead of multiplying by `1L`, you can just cast `blocksize` to `(long) blocksize` > Load data failure when set table_blocksize=2048 > --- > > Key: CARBONDATA-304 > URL: https://issues.apache.org/jira/browse/CARBONDATA-304 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Assignee: Gin-zhj > > First ,create a table with table_blocksize=2048 > CREATE TABLE IF NOT EXISTS t3 (ID Int, date Timestamp, country String, name > String, phonetype String, serialname String, salary Int) STORED BY > 'carbondata' TBLPROPERTIES('table_blocksize'='2048'); > Then load data, failure and catch exception: > org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException: > Problem while copying file from local store to carbon store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
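The overflow behind CARBONDATA-304 is easy to reproduce: `2048 * 1024 * 1024` is evaluated entirely in 32-bit `int` arithmetic and wraps to a negative number before the result is ever widened to `long`. Promoting one operand — via the `1L` multiplication in the diff or the cast jackylk suggests — forces 64-bit arithmetic:

```java
// Demonstrates the int-overflow bug and the cast-based fix from this PR.
class BlockSizeOverflow {
  // buggy: all three operands are int, so the multiply wraps past 2^31 - 1
  static long wrongSizeInBytes(int blocksizeMb) {
    return blocksizeMb * 1024 * 1024;
  }

  // fixed: the cast promotes the whole expression to long arithmetic
  static long rightSizeInBytes(int blocksizeMb) {
    return (long) blocksizeMb * 1024 * 1024;
  }

  public static void main(String[] args) {
    System.out.println(wrongSizeInBytes(2048)); // negative: wrapped around
    System.out.println(rightSizeInBytes(2048)); // 2147483648
  }
}
```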
[jira] [Commented] (CARBONDATA-239) Failure of one compaction in queue should not affect the others.
[ https://issues.apache.org/jira/browse/CARBONDATA-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568425#comment-15568425 ] ASF GitHub Bot commented on CARBONDATA-239: --- Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/224#discussion_r82983904 --- Diff: core/src/main/java/org/apache/carbondata/scan/scanner/impl/FilterScanner.java --- @@ -78,10 +80,11 @@ public FilterScanner(BlockExecutionInfo blockExecutionInfo) { * @throws QueryExecutionException * @throws FilterUnsupportedException */ - @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder) + @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder, + QueryStatisticsModel queryStatisticsModel) throws QueryExecutionException { try { - fillScannedResult(blocksChunkHolder); + fillScannedResult(blocksChunkHolder, queryStatisticsModel); --- End diff -- Pass the model in the constructor so that there is no need to change all these APIs > Failure of one compaction in queue should not affect the others. > > > Key: CARBONDATA-239 > URL: https://issues.apache.org/jira/browse/CARBONDATA-239 > Project: CarbonData > Issue Type: Bug >Reporter: ravikiran >Assignee: ravikiran > Fix For: 0.2.0-incubating > > > Failure of one compaction in queue should not affect the others. > If a compaction is triggered by the user on table1 , and other requests will > go to queue. and if the compaction is failed for table1 then the requests in > queue should continue and at the end the beeline will show the failure > message to the user. > if any compaction gets failed for a table which is other than the user > requested table then the error in the beeline should not appear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
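sujith71955's point — inject the statistics model once through the constructor instead of threading it through every `scanBlocklet`/`fillScannedResult` signature — can be sketched as follows (class shapes are simplified stand-ins, not the real CarbonData scanner):

```java
// Simplified stand-in for the real statistics recorder.
class QueryStatisticsModel {
  private int scans;

  void recordScan() {
    scans++;
  }

  int scanCount() {
    return scans;
  }
}

// Constructor injection: every scan method can reach the model without
// adding an extra parameter to each call in the chain.
class FilterScanner {
  private final QueryStatisticsModel statsModel;

  FilterScanner(QueryStatisticsModel statsModel) {
    this.statsModel = statsModel;
  }

  void scanBlocklet(Object blocksChunkHolder) {
    statsModel.recordScan(); // no signature change needed downstream
  }
}
```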
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568336#comment-15568336 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977743 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- So better to set this while creating the table as column properties metadata > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When string is contains LeadingWhiteSpace or TrailingWhiteSpace, query > result is null. This is because the dictionary ignore the LeadingWhiteSpace > and TrailingWhiteSpace and the csvInput dose not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568332#comment-15568332 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- The pro of this approach: suppose in one load the user loads dirty data and then realizes it needs trimming; in the next load he can simply enable the option and load the data. But this will also increase the dictionary space, and the dictionary lookup overhead during queries will increase. > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When string is contains LeadingWhiteSpace or TrailingWhiteSpace, query > result is null. This is because the dictionary ignore the LeadingWhiteSpace > and TrailingWhiteSpace and the csvInput dose not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568196#comment-15568196 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968804 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- Also one more point it will be better to set this property in column level while creating the table itself as its column properties , this will avoid user to provide this option every time while data loading > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When string is contains LeadingWhiteSpace or TrailingWhiteSpace, query > result is null. This is because the dictionary ignore the LeadingWhiteSpace > and TrailingWhiteSpace and the csvInput dose not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568190#comment-15568190 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968468 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- Guys, I think if during data loading we read from configuration whether to trim or not, we need to do the same while filtering: decide based on the configuration value. For example, if I enabled the trim property while loading, the system trims and loads the data; then in a filter query, if the user provides whitespace, it needs to be trimmed while creating the filter model as well. This will make the system more consistent. And if the user does not enable trim, we won't trim either while loading or while querying. > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When string is contains LeadingWhiteSpace or TrailingWhiteSpace, query > result is null. This is because the dictionary ignore the LeadingWhiteSpace > and TrailingWhiteSpace and the csvInput dose not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568077#comment-15568077 ] ASF GitHub Bot commented on CARBONDATA-276: --- Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82960653 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- Hi sujith, I agree with Eason: when the user queries with a filter containing whitespace, it should be considered a forbidden action. > Add trim option > --- > > Key: CARBONDATA-276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-276 > Project: CarbonData > Issue Type: Bug >Reporter: Lionx >Assignee: Lionx >Priority: Minor > > Fix a bug and add trim option. > Bug: When string is contains LeadingWhiteSpace or TrailingWhiteSpace, query > result is null. This is because the dictionary ignore the LeadingWhiteSpace > and TrailingWhiteSpace and the csvInput dose not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
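The bug behind this whole trim thread reduces to a few lines: the dictionary stores the trimmed value while the raw CSV field keeps its whitespace, so the lookup key never matches and the query returns null. A minimal illustration, with a plain map standing in for the real dictionary:

```java
import java.util.HashMap;
import java.util.Map;

// Plain map standing in for the dictionary described in CARBONDATA-276.
class TrimMismatch {
  static final Map<String, Integer> DICTIONARY = new HashMap<>();

  static {
    DICTIONARY.put("china", 1); // the dictionary ignored the surrounding spaces
  }

  static Integer lookup(String csvValue) {
    return DICTIONARY.get(csvValue);
  }

  public static void main(String[] args) {
    String csvValue = " china "; // raw CSV field, not trimmed
    System.out.println(lookup(csvValue));        // null: the spaces never match
    System.out.println(lookup(csvValue.trim())); // 1
  }
}
```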
[jira] [Commented] (CARBONDATA-239) Failure of one compaction in queue should not affect the others.
[ https://issues.apache.org/jira/browse/CARBONDATA-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568046#comment-15568046 ] ASF GitHub Bot commented on CARBONDATA-239: --- Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/224#discussion_r82958230 --- Diff: core/src/main/java/org/apache/carbondata/scan/processor/AbstractDataBlockIterator.java --- @@ -127,11 +133,15 @@ protected boolean updateScanner() { } } - private AbstractScannedResult getNextScannedResult() throws QueryExecutionException { + private AbstractScannedResult getNextScannedResult(QueryStatisticsRecorder recorder, --- End diff -- @sujith71955 OK, i will use a statistics model, thanks! > Failure of one compaction in queue should not affect the others. > > > Key: CARBONDATA-239 > URL: https://issues.apache.org/jira/browse/CARBONDATA-239 > Project: CarbonData > Issue Type: Bug >Reporter: ravikiran >Assignee: ravikiran > Fix For: 0.2.0-incubating > > > Failure of one compaction in queue should not affect the others. > If a compaction is triggered by the user on table1 , and other requests will > go to queue. and if the compaction is failed for table1 then the requests in > queue should continue and at the end the beeline will show the failure > message to the user. > if any compaction gets failed for a table which is other than the user > requested table then the error in the beeline should not appear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)