[jira] [Commented] (CARBONDATA-109) 500g Dataload Failure in a spark cluster
[ https://issues.apache.org/jira/browse/CARBONDATA-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393147#comment-15393147 ]

ChenLiang commented on CARBONDATA-109:
--------------------------------------

[~xiaoyesoso] Before submitting this to JIRA as a real issue, it would be better to send this question to the mailing list (dev@carbondata.incubator.apache.org) for adequate discussion first.

> 500g Dataload Failure in a spark cluster
> ----------------------------------------
>
> Key: CARBONDATA-109
> URL: https://issues.apache.org/jira/browse/CARBONDATA-109
> Project: CarbonData
> Issue Type: Bug
> Components: carbon-spark
> Reporter: Shoujie Zhuo
>
> INFO 26-07 10:54:28,630 - starting clean up**
> INFO 26-07 10:54:28,766 - clean up done**
> AUDIT 26-07 10:54:28,767 - [holodesk01][hdfs][Thread-1]Data load is failed for tpcds_carbon_500_part.store_sales
> WARN 26-07 10:54:28,768 - Unable to write load metadata file
> ERROR 26-07 10:54:28,769 - main
> java.lang.Exception: Dataload failure
>     at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
>     at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>     at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
>     at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> AUDIT 26-07 10:54:28,772 - [holodesk01][hdfs][Thread-1]Dataload failure for tpcds_carbon_500_part.store_sales. Please check the logs
> INFO 26-07 10:54:28,775 - Table MetaData Unlocked Successfully after data load
> ERROR 26-07 10:54:28,776 - Failed in [LOAD DATA inpath 'hdfs://holodesk01/user/carbon-spark-sql/tpcds/500/store_sales' INTO table store_sales]
> java.lang.Exception: Dataload failure
>     at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:
[jira] [Commented] (CARBONDATA-60) wrong result when using union all
[ https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393142#comment-15393142 ]

ChenLiang commented on CARBONDATA-60:
-------------------------------------

Hi ray, this issue has been fixed in the latest master. Please verify, and close it if everything is OK.

> wrong result when using union all
> ---------------------------------
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: ray
> Assignee: Ravindra Pesala
>
> the issue can be reproduced by following code:
> the expected result is 1 row, but actual result is 2 rows.
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
> select c1 as c1,c2 as c2 from carbon1
> union all
> select c2 as c1,c1 as c2 from carbon1
> )t
> where c1='200'
> group by c1
> """).show()

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CARBONDATA-109) 500g Dataload Failure in a spark cluster
Shoujie Zhuo created CARBONDATA-109:
------------------------------------

Summary: 500g Dataload Failure in a spark cluster
Key: CARBONDATA-109
URL: https://issues.apache.org/jira/browse/CARBONDATA-109
Project: CarbonData
Issue Type: Bug
Components: carbon-spark
Reporter: Shoujie Zhuo

INFO 26-07 10:54:28,630 - starting clean up**
INFO 26-07 10:54:28,766 - clean up done**
AUDIT 26-07 10:54:28,767 - [holodesk01][hdfs][Thread-1]Data load is failed for tpcds_carbon_500_part.store_sales
WARN 26-07 10:54:28,768 - Unable to write load metadata file
ERROR 26-07 10:54:28,769 - main
java.lang.Exception: Dataload failure
    at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
    at org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
AUDIT 26-07 10:54:28,772 - [holodesk01][hdfs][Thread-1]Dataload failure for tpcds_carbon_500_part.store_sales. Please check the logs
INFO 26-07 10:54:28,775 - Table MetaData Unlocked Successfully after data load
ERROR 26-07 10:54:28,776 - Failed in [LOAD DATA inpath 'hdfs://holodesk01/user/carbon-spark-sql/tpcds/500/store_sales' INTO table store_sales]
java.lang.Exception: Dataload failure
    at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at o
[jira] [Commented] (CARBONDATA-108) Remove unnecessary Project for CarbonScan
[ https://issues.apache.org/jira/browse/CARBONDATA-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393083#comment-15393083 ]

ASF GitHub Bot commented on CARBONDATA-108:
-------------------------------------------

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/55

    [CARBONDATA-108] Remove project in strategy

    Modify CarbonStrategy: when project equals scan column, do scan without project.
    Modify some test cases to drop table.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata project

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/55.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #55

----
commit bd9fe7c9b8bb8bd6030a0f137e34a6ba268c
Author: jackylk
Date:   2016-07-26T02:28:18Z

    remove project in strategy

----

> Remove unnecessary Project for CarbonScan
> -----------------------------------------
>
> Key: CARBONDATA-108
> URL: https://issues.apache.org/jira/browse/CARBONDATA-108
> Project: CarbonData
> Issue Type: Improvement
> Components: carbon-spark
> Reporter: Jacky Li
> Fix For: Apache CarbonData 0.1.0-incubating
>
> For this SQL:
> select ch, sum(c) from (select ch,count(1) as c from t1 group by ch) temp where c > 1 group by ch
> Physical plan is:
> == Physical Plan ==
> Limit 21
>  ConvertToSafe
>   CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#22 -> word#22, ch#23 -> ch#23, value#24L -> value#24L),CarbonDatasourceRelation(`default`.`t1`,None))], ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
>    TungstenAggregate(key=[ch#23], functions=[(sum(c#18L),mode=Final,isDistinct=false)], output=[ch#23,_c1#25L])
>     TungstenAggregate(key=[ch#23], functions=[(sum(c#18L),mode=Partial,isDistinct=false)], output=[ch#23,currentSum#48L])
>      Filter (c#18L > 1)
>       TungstenAggregate(key=[ch#23], functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#23,c#18L])
>        TungstenExchange hashpartitioning(ch#23)
>         TungstenAggregate(key=[ch#23], functions=[(count(1),mode=Partial,isDistinct=false)], output=[ch#23,currentCount#52L])
>          Project [ch#23]
>           ConvertToSafe
>            CarbonScan [ch#23], (CarbonRelation default, t1, CarbonMetaData(ArrayBuffer(word, ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,DictionaryMap(Map(word -> true, ch -> true))), TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@450458d7,1,[Ljava.lang.String;@f8a969d)), None), true
> The Project is unnecessary since CarbonScan only scans the requested columns.
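The strategy change described in PR #55 hinges on a simple check: if the Project's column list is exactly what CarbonScan already returns, the Project node adds nothing. A minimal toy model of that decision follows; the class and method names are illustrative stand-ins, not the actual CarbonStrategy API:

```java
import java.util.Arrays;
import java.util.List;

public class ProjectEliminationDemo {
    // Toy model of the planner decision: a separate Project operator is only
    // needed when it would change the column set (or order) relative to what
    // the scan already produces.
    static boolean projectNeeded(List<String> projectCols, List<String> scanCols) {
        return !projectCols.equals(scanCols);
    }

    public static void main(String[] args) {
        // Scan already returns exactly [ch]: Project [ch] is redundant.
        System.out.println(projectNeeded(Arrays.asList("ch"), Arrays.asList("ch")));         // false
        // Project narrows [word, ch] down to [ch]: still needed.
        System.out.println(projectNeeded(Arrays.asList("ch"), Arrays.asList("word", "ch"))); // true
    }
}
```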
[jira] [Created] (CARBONDATA-108) Remove unnecessary Project for CarbonScan
Jacky Li created CARBONDATA-108:
--------------------------------

Summary: Remove unnecessary Project for CarbonScan
Key: CARBONDATA-108
URL: https://issues.apache.org/jira/browse/CARBONDATA-108
Project: CarbonData
Issue Type: Improvement
Components: carbon-spark
Reporter: Jacky Li
Fix For: Apache CarbonData 0.1.0-incubating

For this SQL:
select ch, sum(c) from (select ch,count(1) as c from t1 group by ch) temp where c > 1 group by ch
Physical plan is:
== Physical Plan ==
Limit 21
 ConvertToSafe
  CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#22 -> word#22, ch#23 -> ch#23, value#24L -> value#24L),CarbonDatasourceRelation(`default`.`t1`,None))], ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
   TungstenAggregate(key=[ch#23], functions=[(sum(c#18L),mode=Final,isDistinct=false)], output=[ch#23,_c1#25L])
    TungstenAggregate(key=[ch#23], functions=[(sum(c#18L),mode=Partial,isDistinct=false)], output=[ch#23,currentSum#48L])
     Filter (c#18L > 1)
      TungstenAggregate(key=[ch#23], functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#23,c#18L])
       TungstenExchange hashpartitioning(ch#23)
        TungstenAggregate(key=[ch#23], functions=[(count(1),mode=Partial,isDistinct=false)], output=[ch#23,currentCount#52L])
         Project [ch#23]
          ConvertToSafe
           CarbonScan [ch#23], (CarbonRelation default, t1, CarbonMetaData(ArrayBuffer(word, ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,DictionaryMap(Map(word -> true, ch -> true))), TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@450458d7,1,[Ljava.lang.String;@f8a969d)), None), true

The Project is unnecessary since CarbonScan only scans the requested columns.
[jira] [Commented] (CARBONDATA-92) Remove the unnecessary intermediate conversion of key while scanning.
[ https://issues.apache.org/jira/browse/CARBONDATA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392221#comment-15392221 ]

ASF GitHub Bot commented on CARBONDATA-92:
------------------------------------------

Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/50#discussion_r72096754

    --- Diff: core/src/main/java/org/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java ---
    @@ -69,9 +69,31 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
       }

       /**
    +   * Converts to column dictionary integer value
    +   * @param rowId
    +   * @param columnIndex
    +   * @param row
    +   * @param restructuringInfo @return
    --- End diff --

    What is the return value

> Remove the unnecessary intermediate conversion of key while scanning.
> ---------------------------------------------------------------------
>
> Key: CARBONDATA-92
> URL: https://issues.apache.org/jira/browse/CARBONDATA-92
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
>
> Remove the unnecessary intermediate conversion of key while scanning.
> Basically it removes one step in result conversion.
> It avoids System.arraycopy while converting to result.
> It avoids the result preparation step.
[jira] [Commented] (CARBONDATA-92) Remove the unnecessary intermediate conversion of key while scanning.
[ https://issues.apache.org/jira/browse/CARBONDATA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392206#comment-15392206 ]

ASF GitHub Bot commented on CARBONDATA-92:
------------------------------------------

Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/50#discussion_r72095400

    --- Diff: core/src/main/java/org/carbondata/core/carbon/datastore/chunk/impl/ColumnGroupDimensionDataChunk.java ---
    @@ -67,10 +67,29 @@ public ColumnGroupDimensionDataChunk(byte[] dataChunk, DimensionChunkAttributes
       }

       /**
    -   * Below method to get the data based in row id
    +   * Converts to column dictionary integer value
    +   * @param rowId
    +   * @param columnIndex
    +   * @param row
    +   * @param info @return
    +   */
    +  @Override public int fillConvertedChunkData(int rowId, int columnIndex, int[] row,
    +      KeyStructureInfo info) {
    +    int start = rowId * chunkAttributes.getColumnValueSize();
    +    int sizeInBytes = info.getKeyGenerator().getKeySizeInBytes();
    +    byte[] key = new byte[sizeInBytes];
    +    System.arraycopy(dataChunk, start, key, 0, sizeInBytes);
    +    long[] keyArray = info.getKeyGenerator().getKeyArray(key);
    --- End diff --

    we can avoid this copy also by changing getKeyArray interface to take start and end position, or wrapping it with ByteBuffer

> Remove the unnecessary intermediate conversion of key while scanning.
> ---------------------------------------------------------------------
>
> Key: CARBONDATA-92
> URL: https://issues.apache.org/jira/browse/CARBONDATA-92
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
>
> Remove the unnecessary intermediate conversion of key while scanning.
> Basically it removes one step in result conversion.
> It avoids System.arraycopy while converting to result.
> It avoids the result preparation step.
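gvramana's review suggestion — avoid the intermediate `System.arraycopy` by reading the key through an offset, for example via a `ByteBuffer` view of the backing array — can be sketched as follows. This is an illustrative stand-in, not the real `KeyGenerator` API, and it assumes fixed 4-byte big-endian dictionary values:

```java
import java.nio.ByteBuffer;

public class KeySliceDemo {
    // Hypothetical decoder: reads one 4-byte big-endian dictionary value per
    // column directly out of the backing array via a ByteBuffer view, instead
    // of first copying the key bytes into a fresh array with System.arraycopy.
    static int[] getKeyArray(byte[] dataChunk, int offset, int numColumns) {
        ByteBuffer buf = ByteBuffer.wrap(dataChunk, offset, numColumns * 4);
        int[] key = new int[numColumns];
        for (int i = 0; i < numColumns; i++) {
            key[i] = buf.getInt(); // no intermediate copy of the key bytes
        }
        return key;
    }

    public static void main(String[] args) {
        // Two rows of two columns, 4 bytes per value: row 0 = (1, 2), row 1 = (3, 4).
        byte[] dataChunk =
            ByteBuffer.allocate(16).putInt(1).putInt(2).putInt(3).putInt(4).array();
        int[] row1 = getKeyArray(dataChunk, 8, 2); // row 1 starts at byte offset 8
        System.out.println(row1[0] + "," + row1[1]); // prints 3,4
    }
}
```

The same effect could be had by adding start/end parameters to `getKeyArray`, as the comment also suggests; the point is that the slice is described rather than copied.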
[jira] [Commented] (CARBONDATA-107) Remove unnecessary ConverToSafe in spark planner
[ https://issues.apache.org/jira/browse/CARBONDATA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392151#comment-15392151 ]

ASF GitHub Bot commented on CARBONDATA-107:
-------------------------------------------

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/54

    [CARBONDATA-107] remove unnecessary ConvertToSafe

    CarbonDictionaryDecoder is using InternalRow only, so it should be able to process UnsafeRow. By changing `canProcessUnsafeRows` and `canProcessSafeRows` to `true`, the planner will remove the unnecessary ConvertToSafe operator.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata unsafe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/54.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #54

----
commit a5bd2412d7067bddb1f61c4796bc704729252eb6
Author: jackylk
Date:   2016-07-25T15:49:14Z

    remove unnecessary ConvertToSafe

----

> Remove unnecessary ConverToSafe in spark planner
> ------------------------------------------------
>
> Key: CARBONDATA-107
> URL: https://issues.apache.org/jira/browse/CARBONDATA-107
> Project: CarbonData
> Issue Type: Improvement
> Components: carbon-spark
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: Apache CarbonData 0.1.0-incubating
>
> Query:
> select ch, sum(c) from (select ch,count(1) as c from t2 group by ch) temp where c > 1 group by ch
> Output plan looks like:
> == Physical Plan ==
> Limit 21
>  ConvertToSafe
>   CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, ch#40 -> ch#40, value#41 -> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], ExcludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
>    ConvertToSafe
>     TungstenAggregate(key=[ch#40], functions=[(sum(c#101L),mode=Final,isDistinct=false)], output=[ch#40,_c1#102L])
>      TungstenAggregate(key=[ch#40], functions=[(sum(c#101L),mode=Partial,isDistinct=false)], output=[ch#40,currentSum#122L])
>       Filter (c#101L > FakeCarbonCast(1 as bigint))
>        CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, ch#40 -> ch#40, value#41 -> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], IncludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
>         ConvertToSafe
>          TungstenAggregate(key=[ch#40], functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#40,c#101L])
>           TungstenExchange hashpartitioning(ch#40)
>            TungstenAggregate(key=[ch#40], functions=[(count(1),mode=Partial,isDistinct=false)], output=[ch#40,currentCount#126L])
>             Project [ch#40]
>              ConvertToSafe
>               CarbonScan [ch#40], (CarbonRelation default, t1, CarbonMetaData(ArrayBuffer(word, ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,DictionaryMap(Map(word -> true, ch -> true))), TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@62a877f4,1,[Ljava.lang.String;@3c180da5)), None), true
> There are unnecessary ConvertToSafe before CarbonDictionaryDecoder.
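The fix in PR #54 works because Spark 1.x's row-format rule only inserts a ConvertToSafe node when an operator declares that it cannot consume UnsafeRows. A toy model of that planner decision (illustrative names, not Spark's actual internals):

```java
public class RowFormatDemo {
    // Toy model of Spark 1.x's row-format planning rule: a ConvertToSafe node
    // is inserted between child and parent only when the child emits
    // UnsafeRows that the parent declares it cannot process.
    static boolean insertConvertToSafe(boolean childOutputsUnsafeRows,
                                       boolean parentCanProcessUnsafeRows) {
        return childOutputsUnsafeRows && !parentCanProcessUnsafeRows;
    }

    public static void main(String[] args) {
        // Before the patch: CarbonDictionaryDecoder reports canProcessUnsafeRows = false.
        System.out.println(insertConvertToSafe(true, false)); // true  -> extra ConvertToSafe
        // After the patch: canProcessUnsafeRows = true, so no conversion is planned.
        System.out.println(insertConvertToSafe(true, true));  // false
    }
}
```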
[jira] [Commented] (CARBONDATA-80) Dictionary values should be equally distributed in buckets while loading in memory
[ https://issues.apache.org/jira/browse/CARBONDATA-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392140#comment-15392140 ]

ASF GitHub Bot commented on CARBONDATA-80:
------------------------------------------

Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/44#discussion_r72089681

    --- Diff: core/src/main/java/org/carbondata/core/cache/dictionary/ColumnDictionaryInfo.java ---
    @@ -112,10 +113,35 @@ public ColumnDictionaryInfo(DataType dataType) {

       /**
        * This method will add a new dictionary chunk to existing list of dictionary chunks
        *
    -   * @param dictionaryChunk
    +   * @param newDictionaryChunk
        */
    -  @Override public void addDictionaryChunk(List dictionaryChunk) {
    -    dictionaryChunks.add(dictionaryChunk);
    +  @Override public void addDictionaryChunk(List newDictionaryChunk) {
    +    if (dictionaryChunks.size() > 0) {
    +      // Ensure that each time a new dictionary chunk is getting added to the
    +      // dictionary chunks list, equal distribution of dictionary values should
    +      // be there in the sublists of dictionary chunk list
    +      List lastDictionaryChunk = dictionaryChunks.get(dictionaryChunks.size() - 1);
    +      int dictionaryOneChunkSize = CarbonUtil.getDictionaryChunkSize();
    +      int differenceInLastDictionaryAndOneChunkSize =
    +          dictionaryOneChunkSize - lastDictionaryChunk.size();
    +      if (differenceInLastDictionaryAndOneChunkSize > 0) {
    +        // if difference is greater than new dictionary size then copy a part of list
    +        // else copy the complete new dictionary chunk list in the last dictionary chunk list
    +        if (differenceInLastDictionaryAndOneChunkSize >= newDictionaryChunk.size()) {
    +          lastDictionaryChunk.addAll(newDictionaryChunk);
    +        } else {
    +          List subListOfNewDictionaryChunk =
    +              newDictionaryChunk.subList(0, differenceInLastDictionaryAndOneChunkSize);
    +          lastDictionaryChunk.addAll(subListOfNewDictionaryChunk);
    +          subListOfNewDictionaryChunk.clear();
    --- End diff --

    Use one more sub list and add remaining

> Dictionary values should be equally distributed in buckets while loading in memory
> ----------------------------------------------------------------------------------
>
> Key: CARBONDATA-80
> URL: https://issues.apache.org/jira/browse/CARBONDATA-80
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Manish Gupta
> Assignee: Manish Gupta
> Priority: Minor
>
> Whenever a query is executed, dictionary for columns queried is loaded in memory. For incremental loads dictionary values are loaded incrementally and thus one list contains several sub lists with dictionary values.
> The dictionary values on incremental load may not be equally distributed in the sub buckets and this might increase the search time of a value if there are too many incremental loads.
> Therefore the dictionary values should be divided equally in the sub buckets.
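The balancing scheme under review in that diff — top up the last sublist to the fixed chunk size, then split whatever remains into further fixed-size sublists (per the "use one more sub list and add remaining" suggestion) — can be sketched like this. The class name, `String` values, and chunk size of 4 are illustrative stand-ins; the real code works on the `dictionaryChunks` field with `CarbonUtil.getDictionaryChunkSize()`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkBalanceDemo {
    // Stand-in for CarbonUtil.getDictionaryChunkSize(); value is illustrative.
    static final int CHUNK_SIZE = 4;

    // Keeps every sublist except possibly the last at exactly CHUNK_SIZE
    // entries: first top up the last chunk, then split the remainder into
    // additional CHUNK_SIZE-sized sublists.
    static void addDictionaryChunk(List<List<String>> chunks, List<String> newChunk) {
        List<String> remaining = new ArrayList<>(newChunk);
        if (!chunks.isEmpty()) {
            List<String> last = chunks.get(chunks.size() - 1);
            int gap = CHUNK_SIZE - last.size();
            if (gap > 0) {
                int take = Math.min(gap, remaining.size());
                last.addAll(remaining.subList(0, take));
                remaining = remaining.subList(take, remaining.size());
            }
        }
        for (int i = 0; i < remaining.size(); i += CHUNK_SIZE) {
            chunks.add(new ArrayList<>(
                remaining.subList(i, Math.min(i + CHUNK_SIZE, remaining.size()))));
        }
    }

    public static void main(String[] args) {
        List<List<String>> chunks = new ArrayList<>();
        addDictionaryChunk(chunks, Arrays.asList("a", "b", "c", "d", "e", "f"));
        addDictionaryChunk(chunks, Arrays.asList("g", "h", "i"));
        // Sublist sizes stay balanced at 4, 4, 1 instead of drifting to 4, 2, 3.
        for (List<String> c : chunks) System.out.println(c.size());
    }
}
```

Equal-sized sublists keep lookup arithmetic simple: a dictionary surrogate key maps to `(key / CHUNK_SIZE, key % CHUNK_SIZE)` without scanning sublist boundaries.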
[jira] [Created] (CARBONDATA-107) Remove unnecessary ConverToSafe in spark planner
Jacky Li created CARBONDATA-107:
--------------------------------

Summary: Remove unnecessary ConverToSafe in spark planner
Key: CARBONDATA-107
URL: https://issues.apache.org/jira/browse/CARBONDATA-107
Project: CarbonData
Issue Type: Improvement
Components: carbon-spark
Affects Versions: Apache CarbonData 0.1.0-incubating
Reporter: Jacky Li
Fix For: Apache CarbonData 0.1.0-incubating

Query:
select ch, sum(c) from (select ch,count(1) as c from t2 group by ch) temp where c > 1 group by ch
Output plan looks like:
== Physical Plan ==
Limit 21
 ConvertToSafe
  CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, ch#40 -> ch#40, value#41 -> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], ExcludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
   ConvertToSafe
    TungstenAggregate(key=[ch#40], functions=[(sum(c#101L),mode=Final,isDistinct=false)], output=[ch#40,_c1#102L])
     TungstenAggregate(key=[ch#40], functions=[(sum(c#101L),mode=Partial,isDistinct=false)], output=[ch#40,currentSum#122L])
      Filter (c#101L > FakeCarbonCast(1 as bigint))
       CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, ch#40 -> ch#40, value#41 -> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], IncludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
        ConvertToSafe
         TungstenAggregate(key=[ch#40], functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#40,c#101L])
          TungstenExchange hashpartitioning(ch#40)
           TungstenAggregate(key=[ch#40], functions=[(count(1),mode=Partial,isDistinct=false)], output=[ch#40,currentCount#126L])
            Project [ch#40]
             ConvertToSafe
              CarbonScan [ch#40], (CarbonRelation default, t1, CarbonMetaData(ArrayBuffer(word, ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,DictionaryMap(Map(word -> true, ch -> true))), TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@62a877f4,1,[Ljava.lang.String;@3c180da5)), None), true

There are unnecessary ConvertToSafe operators before CarbonDictionaryDecoder.
[jira] [Created] (CARBONDATA-106) Add audit logs for DDL commands
Manohar Vanam created CARBONDATA-106:
-------------------------------------

Summary: Add audit logs for DDL commands
Key: CARBONDATA-106
URL: https://issues.apache.org/jira/browse/CARBONDATA-106
Project: CarbonData
Issue Type: Improvement
Reporter: Manohar Vanam
Assignee: Manohar Vanam

Add audit logs for:
1. Create table
2. Load table
[jira] [Comment Edited] (CARBONDATA-8) Use create table instead of cube in all test cases
[ https://issues.apache.org/jira/browse/CARBONDATA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391945#comment-15391945 ]

Manohar Vanam edited comment on CARBONDATA-8 at 7/25/16 2:10 PM:
-----------------------------------------------------------------

Fixed in all modules

was (Author: manoharvanam):
Changed in all places

> Use create table instead of cube in all test cases
> --------------------------------------------------
>
> Key: CARBONDATA-8
> URL: https://issues.apache.org/jira/browse/CARBONDATA-8
> Project: CarbonData
> Issue Type: Test
> Reporter: Manohar Vanam
> Assignee: Manohar Vanam
>
> 1. Use create table instead of cube in all test cases
> 2. Remove unnecessary & duplicate test cases
[jira] [Commented] (CARBONDATA-8) Use create table instead of cube in all test cases
[ https://issues.apache.org/jira/browse/CARBONDATA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391945#comment-15391945 ]

Manohar Vanam commented on CARBONDATA-8:
----------------------------------------

Changed in all places

> Use create table instead of cube in all test cases
> --------------------------------------------------
>
> Key: CARBONDATA-8
> URL: https://issues.apache.org/jira/browse/CARBONDATA-8
> Project: CarbonData
> Issue Type: Test
> Reporter: Manohar Vanam
> Assignee: Manohar Vanam
>
> 1. Use create table instead of cube in all test cases
> 2. Remove unnecessary & duplicate test cases
[jira] [Commented] (CARBONDATA-60) wrong result when using union all
[ https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391867#comment-15391867 ]

ASF GitHub Bot commented on CARBONDATA-60:
------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/41

> wrong result when using union all
> ---------------------------------
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: ray
> Assignee: Ravindra Pesala
>
> the issue can be reproduced by following code:
> the expected result is 1 row, but actual result is 2 rows.
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
> select c1 as c1,c2 as c2 from carbon1
> union all
> select c2 as c1,c1 as c2 from carbon1
> )t
> where c1='200'
> group by c1
> """).show()
[jira] [Updated] (CARBONDATA-60) wrong result when using union all
[ https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang updated CARBONDATA-60:
--------------------------------

    Assignee: Ravindra Pesala

> wrong result when using union all
> ---------------------------------
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: ray
> Assignee: Ravindra Pesala
>
> the issue can be reproduced by following code:
> the expected result is 1 row, but actual result is 2 rows.
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
> select c1 as c1,c2 as c2 from carbon1
> union all
> select c2 as c1,c1 as c2 from carbon1
> )t
> where c1='200'
> group by c1
> """).show()
[jira] [Updated] (CARBONDATA-60) wrong result when using union all
[ https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang updated CARBONDATA-60:
--------------------------------

    Affects Version/s: Apache CarbonData 0.1.0-incubating

> wrong result when using union all
> ---------------------------------
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: ray
>
> the issue can be reproduced by following code:
> the expected result is 1 row, but actual result is 2 rows.
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
> select c1 as c1,c2 as c2 from carbon1
> union all
> select c2 as c1,c1 as c2 from carbon1
> )t
> where c1='200'
> group by c1
> """).show()
[jira] [Updated] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)
[ https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang updated CARBONDATA-49:
--------------------------------
    Assignee: Ravindra Pesala

> Can not query 3 million rows data which be loaded through local store system(not HDFS)
> --------------------------------------------------------------------------------------
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
> Issue Type: Bug
> Components: core
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
> Reporter: ChenLiang
> Assignee: Ravindra Pesala
> Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
> The CSV data is stored on the local machine (not HDFS); test results are as follows.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, cc.sql("select * from tablename") fails with the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
> at org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
> at org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
> at org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
> at org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
> at org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)
[jira] [Updated] (CARBONDATA-41) Run example, system can't normally end
[ https://issues.apache.org/jira/browse/CARBONDATA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang updated CARBONDATA-41:
--------------------------------
    Affects Version/s: Apache CarbonData 0.1.0-incubating

> Run example, system can't normally end
> --------------------------------------
>
> Key: CARBONDATA-41
> URL: https://issues.apache.org/jira/browse/CARBONDATA-41
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Reporter: ChenLiang
> Assignee: Ravindra Pesala
> Priority: Minor
>
> Reproduce steps:
> 1. Run CarbonExample.scala in IntelliJ IDEA.
> 2. Get the result, but the process does not exit normally.
> 3. Run CarbonExample.scala again in IntelliJ IDEA; it fails with the errors below:
> FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
> java.net.BindException: Address already in use
> The first run must be stopped manually before CarbonExample.scala can be run again in IntelliJ IDEA.
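Editorial note: the BindException above is the usual symptom of the previous run's driver still holding the Spark web UI port 4040; the root cause reported here is the example JVM not exiting, so this is only a workaround sketch, not the project's fix. Both keys below are standard Spark properties and can go in conf/spark-defaults.conf: the first moves the UI off 4040, the second lets Spark probe successive ports when the chosen one is taken.

```
spark.ui.port          4041
spark.port.maxRetries  16
```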
[jira] [Resolved] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)
[ https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang resolved CARBONDATA-49.
---------------------------------
    Resolution: Fixed

> Can not query 3 million rows data which be loaded through local store system(not HDFS)
> --------------------------------------------------------------------------------------
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
> Issue Type: Bug
> Components: core
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
> Reporter: ChenLiang
> Assignee: Ravindra Pesala
> Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
> The CSV data is stored on the local machine (not HDFS); test results are as follows.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, cc.sql("select * from tablename") fails with the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
> at org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
> at org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
> at org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
> at org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
> at org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)
[jira] [Resolved] (CARBONDATA-88) Use ./bin/carbon-spark-shell to run, generated two issues
[ https://issues.apache.org/jira/browse/CARBONDATA-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang resolved CARBONDATA-88.
---------------------------------
    Resolution: Fixed

> Use ./bin/carbon-spark-shell to run, generated two issues
> ---------------------------------------------------------
>
> Key: CARBONDATA-88
> URL: https://issues.apache.org/jira/browse/CARBONDATA-88
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: Apache Spark 1.6.1, the latest master code of Apache CarbonData
> Reporter: ChenLiang
> Assignee: Ravindra Pesala
> Priority: Minor
>
> Running with ./bin/carbon-spark-shell produced two issues:
> 1. carbonshellstore is created under the root directory; propose moving carbonshellstore to the ./bin directory.
> 2. Data load failure:
> scala> cc.sql("LOAD DATA LOCAL INPATH '/Users/apple/Downloads/spark-1.6.1-bin-hadoop2.6/carbondata/hzmeetup.csv' INTO TABLE meetupTable")
> java.lang.Exception: Dataload failure
> at org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
> at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
> at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
[jira] [Updated] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)
[ https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenLiang updated CARBONDATA-49:
--------------------------------
    Affects Version/s: Apache CarbonData 0.1.0-incubating
        Fix Version/s: Apache CarbonData 0.1.0-incubating
          Component/s: core

> Can not query 3 million rows data which be loaded through local store system(not HDFS)
> --------------------------------------------------------------------------------------
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
> Issue Type: Bug
> Components: core
> Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
> Reporter: ChenLiang
> Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
> The CSV data is stored on the local machine (not HDFS); test results are as follows.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, cc.sql("select * from tablename") fails with the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
> at org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
> at org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
> at org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
> at org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
> at org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
> at org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)
[jira] [Resolved] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table commnad
[ https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan resolved CARBONDATA-103.
---------------------------------------------
    Resolution: Duplicate

> Rename CreateCube to CreateTable to correct the audit log of create table commnad
> ---------------------------------------------------------------------------------
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
> Issue Type: Bug
> Reporter: Mohammad Shahid Khan
> Assignee: Mohammad Shahid Khan
> Priority: Trivial
[jira] [Updated] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table commnad
[ https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan updated CARBONDATA-103:
--------------------------------------------
    Assignee: (was: Mohammad Shahid Khan)

> Rename CreateCube to CreateTable to correct the audit log of create table commnad
> ---------------------------------------------------------------------------------
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
> Issue Type: Bug
> Reporter: Mohammad Shahid Khan
> Priority: Trivial
[jira] [Created] (CARBONDATA-105) Correct precalculation of dictionary file existence
Ashok Kumar created CARBONDATA-105:
-----------------------------------

    Summary: Correct precalculation of dictionary file existence
    Key: CARBONDATA-105
    URL: https://issues.apache.org/jira/browse/CARBONDATA-105
    Project: CarbonData
    Issue Type: Bug
    Reporter: Ashok Kumar
    Priority: Minor

In the case of concurrent data loading, the precalculated check for the dictionary file's existence will not give a correct result.
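Editorial note: the hazard described above is a check-then-act race. Testing whether the dictionary file exists and then creating it are two separate steps, so two concurrent loaders can both observe "missing" and both proceed. A small Python sketch of the general pattern, for illustration only (this is not CarbonData's code): opening with O_CREAT | O_EXCL makes creation itself the atomic existence test, so exactly one caller wins.

```python
import os
import tempfile

def create_dictionary_file(path):
    """Atomically create `path`.

    Returns True if this caller created the file, False if another
    caller (e.g. a concurrent loader) got there first. O_EXCL makes
    the create fail instead of silently reusing an existing file.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

# Hypothetical dictionary-file path, standing in for a per-column
# dictionary in a store directory.
path = os.path.join(tempfile.mkdtemp(), "col1.dict")
first = create_dictionary_file(path)    # this loader wins the race
second = create_dictionary_file(path)   # a "concurrent" second loader loses
```

A precomputed exists() result, by contrast, can be stale by the time it is used, which is exactly the failure mode the issue reports.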
[jira] [Created] (CARBONDATA-104) To support varchar datatype
zhangshunyu created CARBONDATA-104:
-----------------------------------

    Summary: To support varchar datatype
    Key: CARBONDATA-104
    URL: https://issues.apache.org/jira/browse/CARBONDATA-104
    Project: CarbonData
    Issue Type: New Feature
    Reporter: zhangshunyu
    Priority: Minor
[jira] [Resolved] (CARBONDATA-30) Record load performance statistics
[ https://issues.apache.org/jira/browse/CARBONDATA-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangshunyu resolved CARBONDATA-30.
-----------------------------------
    Resolution: Resolved

> Record load performance statistics
> ----------------------------------
>
> Key: CARBONDATA-30
> URL: https://issues.apache.org/jira/browse/CARBONDATA-30
> Project: CarbonData
> Issue Type: New Feature
> Reporter: zhangshunyu
> Assignee: zhangshunyu
>
> We should use a parameter, configurable by the user, to determine whether statistics are recorded and calculated during data loading.
[jira] [Created] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table commnad
Mohammad Shahid Khan created CARBONDATA-103:
--------------------------------------------

    Summary: Rename CreateCube to CreateTable to correct the audit log of create table commnad
    Key: CARBONDATA-103
    URL: https://issues.apache.org/jira/browse/CARBONDATA-103
    Project: CarbonData
    Issue Type: Bug
    Reporter: Mohammad Shahid Khan
    Priority: Trivial
[jira] [Assigned] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table commnad
[ https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Shahid Khan reassigned CARBONDATA-103:
-----------------------------------------------
    Assignee: Mohammad Shahid Khan

> Rename CreateCube to CreateTable to correct the audit log of create table commnad
> ---------------------------------------------------------------------------------
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
> Issue Type: Bug
> Reporter: Mohammad Shahid Khan
> Assignee: Mohammad Shahid Khan
> Priority: Trivial