[jira] [Commented] (CARBONDATA-109) 500g Dataload Failure in a spark cluster

2016-07-25 Thread ChenLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393147#comment-15393147
 ] 

ChenLiang commented on CARBONDATA-109:
--

[~xiaoyesoso] Before submitting this to JIRA as a real issue, it would be 
better if you could send this question to the mailing list for adequate 
discussion (mailing list: dev@carbondata.incubator.apache.org).

> 500g Dataload Failure in a spark cluster
> 
>
> Key: CARBONDATA-109
> URL: https://issues.apache.org/jira/browse/CARBONDATA-109
> Project: CarbonData
>  Issue Type: Bug
>  Components: carbon-spark
>Reporter: Shoujie Zhuo
>
> INFO  26-07 10:54:28,630 - starting clean up**
> INFO  26-07 10:54:28,766 - clean up done**
> AUDIT 26-07 10:54:28,767 - [holodesk01][hdfs][Thread-1]Data load is failed 
> for tpcds_carbon_500_part.store_sales
> WARN  26-07 10:54:28,768 - Unable to write load metadata file
> ERROR 26-07 10:54:28,769 - main 
> java.lang.Exception: Dataload failure
>   at 
> org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>   at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>   at 
> org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
>   at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>   at 
> org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
>   at 
> org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> AUDIT 26-07 10:54:28,772 - [holodesk01][hdfs][Thread-1]Dataload failure for 
> tpcds_carbon_500_part.store_sales. Please check the logs
> INFO  26-07 10:54:28,775 - Table MetaData Unlocked Successfully after data 
> load
> ERROR 26-07 10:54:28,776 - Failed in [LOAD DATA inpath 
> 'hdfs://holodesk01/user/carbon-spark-sql/tpcds/500/store_sales' INTO table 
> store_sales]
> java.lang.Exception: Dataload failure
>   at 
> org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>   at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)

[jira] [Commented] (CARBONDATA-60) wrong result when using union all

2016-07-25 Thread ChenLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393142#comment-15393142
 ] 

ChenLiang commented on CARBONDATA-60:
-

Hi ray, this issue has been fixed in the latest master. Please verify and 
close it if it is OK.

> wrong result when using union all
> -
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ray
>Assignee: Ravindra Pesala
>
> The issue can be reproduced with the following code. The expected result is 
> 1 row, but the actual result is 2 rows:
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", 
> "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
>   select c1 as c1,c2 as c2 from carbon1
>   union all
>   select c2 as c1,c1 as c2 from carbon1
>  )t
>   where c1='200'
>   group by c1
> """).show()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-109) 500g Dataload Failure in a spark cluster

2016-07-25 Thread Shoujie Zhuo (JIRA)
Shoujie Zhuo created CARBONDATA-109:
---

 Summary: 500g Dataload Failure in a spark cluster
 Key: CARBONDATA-109
 URL: https://issues.apache.org/jira/browse/CARBONDATA-109
 Project: CarbonData
  Issue Type: Bug
  Components: carbon-spark
Reporter: Shoujie Zhuo


INFO  26-07 10:54:28,630 - starting clean up**
INFO  26-07 10:54:28,766 - clean up done**
AUDIT 26-07 10:54:28,767 - [holodesk01][hdfs][Thread-1]Data load is failed for 
tpcds_carbon_500_part.store_sales
WARN  26-07 10:54:28,768 - Unable to write load metadata file
ERROR 26-07 10:54:28,769 - main 
java.lang.Exception: Dataload failure
at 
org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
at 
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
at 
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at 
org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
at 
org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
at 
org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
AUDIT 26-07 10:54:28,772 - [holodesk01][hdfs][Thread-1]Dataload failure for 
tpcds_carbon_500_part.store_sales. Please check the logs
INFO  26-07 10:54:28,775 - Table MetaData Unlocked Successfully after data load
ERROR 26-07 10:54:28,776 - Failed in [LOAD DATA inpath 
'hdfs://holodesk01/user/carbon-spark-sql/tpcds/500/store_sales' INTO table 
store_sales]
java.lang.Exception: Dataload failure
at 
org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
at 
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
at 
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at 
org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)

[jira] [Commented] (CARBONDATA-108) Remove unnecessary Project for CarbonScan

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393083#comment-15393083
 ] 

ASF GitHub Bot commented on CARBONDATA-108:
---

GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/55

[CARBONDATA-108] Remove project in strategy

Modify CarbonStrategy: when the project list equals the scan columns, do the 
scan without a Project.

Modify some test cases to drop table
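
A rough Scala sketch of the change described above, against the Spark 1.5/1.6 
planner API (pruneProject is an illustrative name, not the actual 
CarbonStrategy code):

    import org.apache.spark.sql.catalyst.expressions.NamedExpression
    import org.apache.spark.sql.execution.{Project, SparkPlan}

    // Elide the Project when it would only re-emit exactly the columns the
    // scan already produces; otherwise keep it.
    def pruneProject(projectList: Seq[NamedExpression], scan: SparkPlan): SparkPlan = {
      if (projectList.map(_.toAttribute) == scan.output) {
        scan                       // project == scan columns: scan alone suffices
      } else {
        Project(projectList, scan) // aliases or reordering still need a Project
      }
    }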

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata project

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/55.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #55


commit bd9fe7c9b8bb8bd6030a0f137e34a6ba268c
Author: jackylk 
Date:   2016-07-26T02:28:18Z

remove project in strategy




> Remove unnecessary Project for CarbonScan
> -
>
> Key: CARBONDATA-108
> URL: https://issues.apache.org/jira/browse/CARBONDATA-108
> Project: CarbonData
>  Issue Type: Improvement
>  Components: carbon-spark
>Reporter: Jacky Li
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> For this SQL:
> select ch, sum(c) from (select ch,count(1) as c from t1 group by ch) temp 
> where c > 1 group by ch
> Physical plan is:
> == Physical Plan ==
> Limit 21
>  ConvertToSafe
>   CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#22 -> word#22, 
> ch#23 -> ch#23, value#24L -> 
> value#24L),CarbonDatasourceRelation(`default`.`t1`,None))], 
> ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
>TungstenAggregate(key=[ch#23], 
> functions=[(sum(c#18L),mode=Final,isDistinct=false)], output=[ch#23,_c1#25L])
> TungstenAggregate(key=[ch#23], 
> functions=[(sum(c#18L),mode=Partial,isDistinct=false)], 
> output=[ch#23,currentSum#48L])
>  Filter (c#18L > 1)
>   TungstenAggregate(key=[ch#23], 
> functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#23,c#18L])
>TungstenExchange hashpartitioning(ch#23)
> TungstenAggregate(key=[ch#23], 
> functions=[(count(1),mode=Partial,isDistinct=false)], 
> output=[ch#23,currentCount#52L])
>  Project [ch#23]
>   ConvertToSafe
>CarbonScan [ch#23], (CarbonRelation default, t1, 
> CarbonMetaData(ArrayBuffer(word, 
> ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,DictionaryMap(Map(word
>  -> true, ch -> true))), 
> TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@450458d7,1,[Ljava.lang.String;@f8a969d)),
>  None), true
> The Project is unnecessary since CarbonScan only scans the requested columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-108) Remove unnecessary Project for CarbonScan

2016-07-25 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-108:
---

 Summary: Remove unnecessary Project for CarbonScan
 Key: CARBONDATA-108
 URL: https://issues.apache.org/jira/browse/CARBONDATA-108
 Project: CarbonData
  Issue Type: Improvement
  Components: carbon-spark
Reporter: Jacky Li
 Fix For: Apache CarbonData 0.1.0-incubating


For this SQL:
select ch, sum(c) from (select ch,count(1) as c from t1 group by ch) temp where 
c > 1 group by ch

Physical plan is:
== Physical Plan ==
Limit 21
 ConvertToSafe
  CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#22 -> word#22, ch#23 
-> ch#23, value#24L -> 
value#24L),CarbonDatasourceRelation(`default`.`t1`,None))], 
ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
   TungstenAggregate(key=[ch#23], 
functions=[(sum(c#18L),mode=Final,isDistinct=false)], output=[ch#23,_c1#25L])
    TungstenAggregate(key=[ch#23], 
functions=[(sum(c#18L),mode=Partial,isDistinct=false)], 
output=[ch#23,currentSum#48L])
     Filter (c#18L > 1)
      TungstenAggregate(key=[ch#23], 
functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#23,c#18L])
       TungstenExchange hashpartitioning(ch#23)
        TungstenAggregate(key=[ch#23], 
functions=[(count(1),mode=Partial,isDistinct=false)], 
output=[ch#23,currentCount#52L])
         Project [ch#23]
          ConvertToSafe
           CarbonScan [ch#23], (CarbonRelation default, t1, 
CarbonMetaData(ArrayBuffer(word, 
ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,DictionaryMap(Map(word
 -> true, ch -> true))), 
TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@6034ef16,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@450458d7,1,[Ljava.lang.String;@f8a969d)),
 None), true

The Project is unnecessary since CarbonScan only scans the requested columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-92) Remove the unnecessary intermediate conversion of key while scanning.

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392221#comment-15392221
 ] 

ASF GitHub Bot commented on CARBONDATA-92:
--

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/50#discussion_r72096754
  
--- Diff: 
core/src/main/java/org/carbondata/core/carbon/datastore/chunk/impl/FixedLengthDimensionDataChunk.java
 ---
@@ -69,9 +69,31 @@ public FixedLengthDimensionDataChunk(byte[] dataChunk, 
DimensionChunkAttributes
   }
 
   /**
+   * Converts to column dictionary integer value
+   * @param rowId
+   * @param columnIndex
+   * @param row
+   * @param restructuringInfo  @return
--- End diff --

What is the return value?


> Remove the unnecessary intermediate conversion of key while scanning.
> -
>
> Key: CARBONDATA-92
> URL: https://issues.apache.org/jira/browse/CARBONDATA-92
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>
> Remove the unnecessary intermediate conversion of key while scanning.
> Basically it removes one step in result conversion.
> It avoids System.arraycopy while converting to result. 
> It avoids the result preparation step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-92) Remove the unnecessary intermediate conversion of key while scanning.

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392206#comment-15392206
 ] 

ASF GitHub Bot commented on CARBONDATA-92:
--

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/50#discussion_r72095400
  
--- Diff: 
core/src/main/java/org/carbondata/core/carbon/datastore/chunk/impl/ColumnGroupDimensionDataChunk.java
 ---
@@ -67,10 +67,29 @@ public ColumnGroupDimensionDataChunk(byte[] dataChunk, 
DimensionChunkAttributes
   }
 
   /**
-   * Below method to get the data based in row id
+   * Converts to column dictionary integer value
+   * @param rowId
+   * @param columnIndex
+   * @param row
+   * @param info  @return
+   */
+  @Override public int fillConvertedChunkData(int rowId, int columnIndex, int[] row,
+      KeyStructureInfo info) {
+    int start = rowId * chunkAttributes.getColumnValueSize();
+    int sizeInBytes = info.getKeyGenerator().getKeySizeInBytes();
+    byte[] key = new byte[sizeInBytes];
+    System.arraycopy(dataChunk, start, key, 0, sizeInBytes);
+    long[] keyArray = info.getKeyGenerator().getKeyArray(key);
--- End diff --

We can also avoid this copy by changing the getKeyArray interface to take 
start and end positions, or by wrapping it with a ByteBuffer.
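
A Scala sketch of that suggestion (the real interface is Java; names here are 
illustrative, not the actual KeyGenerator API):

    // Let the generator decode straight out of the chunk's backing array,
    // so the caller needs no intermediate byte[] copy.
    trait OffsetKeyGenerator {
      def getKeySizeInBytes: Int
      // decode the key starting at `offset` in `data` instead of requiring
      // a freshly copied array of exactly the key size
      def getKeyArray(data: Array[Byte], offset: Int): Array[Long]
    }

    // call site, with no System.arraycopy:
    //   val keyArray = keyGen.getKeyArray(dataChunk, rowId * columnValueSize)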


> Remove the unnecessary intermediate conversion of key while scanning.
> -
>
> Key: CARBONDATA-92
> URL: https://issues.apache.org/jira/browse/CARBONDATA-92
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>
> Remove the unnecessary intermediate conversion of key while scanning.
> Basically it removes one step in result conversion.
> It avoids System.arraycopy while converting to result. 
> It avoids the result preparation step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-107) Remove unnecessary ConverToSafe in spark planner

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392151#comment-15392151
 ] 

ASF GitHub Bot commented on CARBONDATA-107:
---

GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/54

[CARBONDATA-107] remove unnecessary ConvertToSafe

CarbonDictionaryDecoder uses InternalRow only, so it should be able to 
process UnsafeRow.
By changing `canProcessUnsafeRows` and `canProcessSafeRows` to `true`, the 
planner will remove the unnecessary ConvertToSafe operators.
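
A minimal Scala skeleton of the flag change (Spark 1.5/1.6 API; the real 
CarbonDictionaryDecoder has more fields and decoding logic than shown here):

    package org.apache.spark.sql // UnaryNode is private[sql], as in CarbonData

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}

    case class DecoderLikeOperator(child: SparkPlan) extends UnaryNode {
      override def output = child.output
      // The operator only manipulates InternalRow, so it can accept either
      // row format and the planner need not insert ConvertToSafe around it.
      override def canProcessUnsafeRows: Boolean = true
      override def canProcessSafeRows: Boolean = true
      override protected def doExecute(): RDD[InternalRow] = child.execute()
    }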

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata unsafe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit a5bd2412d7067bddb1f61c4796bc704729252eb6
Author: jackylk 
Date:   2016-07-25T15:49:14Z

remove unnecessary ConvertToSafe




> Remove unnecessary ConverToSafe in spark planner
> 
>
> Key: CARBONDATA-107
> URL: https://issues.apache.org/jira/browse/CARBONDATA-107
> Project: CarbonData
>  Issue Type: Improvement
>  Components: carbon-spark
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> Query: 
> select ch, sum(c) from (select ch,count(1) as c from t2 group by ch) temp 
> where c > 1 group by ch
> Output plan looks like:
> == Physical Plan ==
> Limit 21
>  ConvertToSafe
>   CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, 
> ch#40 -> ch#40, value#41 -> 
> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], 
> ExcludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
>ConvertToSafe
> TungstenAggregate(key=[ch#40], 
> functions=[(sum(c#101L),mode=Final,isDistinct=false)], 
> output=[ch#40,_c1#102L])
>  TungstenAggregate(key=[ch#40], 
> functions=[(sum(c#101L),mode=Partial,isDistinct=false)], 
> output=[ch#40,currentSum#122L])
>   Filter (c#101L > FakeCarbonCast(1 as bigint))
>CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, 
> ch#40 -> ch#40, value#41 -> 
> value#41),CarbonDatasourceRelation(`default`.`t1`,None))], 
> IncludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
> ConvertToSafe
>  TungstenAggregate(key=[ch#40], 
> functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#40,c#101L])
>   TungstenExchange hashpartitioning(ch#40)
>TungstenAggregate(key=[ch#40], 
> functions=[(count(1),mode=Partial,isDistinct=false)], 
> output=[ch#40,currentCount#126L])
> Project [ch#40]
>  ConvertToSafe
>   CarbonScan [ch#40], (CarbonRelation default, t1, 
> CarbonMetaData(ArrayBuffer(word, 
> ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,DictionaryMap(Map(word
>  -> true, ch -> true))), 
> TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@62a877f4,1,[Ljava.lang.String;@3c180da5)),
>  None), true
> There are unnecessary ConvertToSafe operators before CarbonDictionaryDecoder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-80) Dictionary values should be equally distributed in buckets while loading in memory

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392140#comment-15392140
 ] 

ASF GitHub Bot commented on CARBONDATA-80:
--

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/44#discussion_r72089681
  
--- Diff: 
core/src/main/java/org/carbondata/core/cache/dictionary/ColumnDictionaryInfo.java
 ---
@@ -112,10 +113,35 @@ public ColumnDictionaryInfo(DataType dataType) {
   /**
    * This method will add a new dictionary chunk to existing list of dictionary chunks
    *
-   * @param dictionaryChunk
+   * @param newDictionaryChunk
    */
-  @Override public void addDictionaryChunk(List<byte[]> dictionaryChunk) {
-    dictionaryChunks.add(dictionaryChunk);
+  @Override public void addDictionaryChunk(List<byte[]> newDictionaryChunk) {
+    if (dictionaryChunks.size() > 0) {
+      // Ensure that each time a new dictionary chunk is getting added to the
+      // dictionary chunks list, equal distribution of dictionary values should
+      // be there in the sublists of dictionary chunk list
+      List<byte[]> lastDictionaryChunk = dictionaryChunks.get(dictionaryChunks.size() - 1);
+      int dictionaryOneChunkSize = CarbonUtil.getDictionaryChunkSize();
+      int differenceInLastDictionaryAndOneChunkSize =
+          dictionaryOneChunkSize - lastDictionaryChunk.size();
+      if (differenceInLastDictionaryAndOneChunkSize > 0) {
+        // if difference is greater than new dictionary size then copy a part of list
+        // else copy the complete new dictionary chunk list in the last dictionary chunk list
+        if (differenceInLastDictionaryAndOneChunkSize >= newDictionaryChunk.size()) {
+          lastDictionaryChunk.addAll(newDictionaryChunk);
+        } else {
+          List<byte[]> subListOfNewDictionaryChunk =
+              newDictionaryChunk.subList(0, differenceInLastDictionaryAndOneChunkSize);
+          lastDictionaryChunk.addAll(subListOfNewDictionaryChunk);
+          subListOfNewDictionaryChunk.clear();
--- End diff --

Use one more sub list and add the remaining elements.
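
A Scala sketch of the suggested shape (the reviewed code is Java; simplified 
to a single split, as in the diff above):

    import scala.collection.mutable.ArrayBuffer

    // Top up the last bucket to the configured chunk size, then append the
    // remaining values as their own sublist instead of clearing a subList
    // view of the caller's list.
    def addDictionaryChunk(buckets: ArrayBuffer[ArrayBuffer[Array[Byte]]],
        newChunk: Seq[Array[Byte]], chunkSize: Int): Unit = {
      if (buckets.nonEmpty && buckets.last.size < chunkSize) {
        val room = chunkSize - buckets.last.size
        val (fill, remaining) = newChunk.splitAt(room)
        buckets.last ++= fill
        if (remaining.nonEmpty) buckets += ArrayBuffer(remaining: _*)
      } else {
        buckets += ArrayBuffer(newChunk: _*)
      }
    }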


> Dictionary values should be equally distributed in buckets while loading in 
> memory
> --
>
> Key: CARBONDATA-80
> URL: https://issues.apache.org/jira/browse/CARBONDATA-80
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
>
> Whenever a query is executed, the dictionary for the queried columns is 
> loaded into memory. For incremental loads, dictionary values are loaded 
> incrementally, and thus one list contains several sub lists with dictionary 
> values.
> The dictionary values of an incremental load may not be equally distributed 
> across the sub buckets, and this might increase the search time for a value 
> if there are too many incremental loads.
> Therefore the dictionary values should be divided equally among the sub 
> buckets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-107) Remove unnecessary ConverToSafe in spark planner

2016-07-25 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-107:
---

 Summary: Remove unnecessary ConverToSafe in spark planner
 Key: CARBONDATA-107
 URL: https://issues.apache.org/jira/browse/CARBONDATA-107
 Project: CarbonData
  Issue Type: Improvement
  Components: carbon-spark
Affects Versions: Apache CarbonData 0.1.0-incubating
Reporter: Jacky Li
 Fix For: Apache CarbonData 0.1.0-incubating


Query: 
select ch, sum(c) from (select ch,count(1) as c from t2 group by ch) temp where 
c > 1 group by ch

Output plan looks like:

== Physical Plan ==
Limit 21
 ConvertToSafe
  CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, ch#40 
-> ch#40, value#41 -> 
value#41),CarbonDatasourceRelation(`default`.`t1`,None))], 
ExcludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
   ConvertToSafe
    TungstenAggregate(key=[ch#40], 
functions=[(sum(c#101L),mode=Final,isDistinct=false)], output=[ch#40,_c1#102L])
     TungstenAggregate(key=[ch#40], 
functions=[(sum(c#101L),mode=Partial,isDistinct=false)], 
output=[ch#40,currentSum#122L])
      Filter (c#101L > FakeCarbonCast(1 as bigint))
       CarbonDictionaryDecoder [CarbonDecoderRelation(Map(word#39 -> word#39, 
ch#40 -> ch#40, value#41 -> 
value#41),CarbonDatasourceRelation(`default`.`t1`,None))], 
IncludeProfile(ArrayBuffer(#103)), CarbonAliasDecoderRelation()
        ConvertToSafe
         TungstenAggregate(key=[ch#40], 
functions=[(count(1),mode=Final,isDistinct=false)], output=[ch#40,c#101L])
          TungstenExchange hashpartitioning(ch#40)
           TungstenAggregate(key=[ch#40], 
functions=[(count(1),mode=Partial,isDistinct=false)], 
output=[ch#40,currentCount#126L])
            Project [ch#40]
             ConvertToSafe
              CarbonScan [ch#40], (CarbonRelation default, t1, 
CarbonMetaData(ArrayBuffer(word, 
ch),ArrayBuffer(value),org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,DictionaryMap(Map(word
 -> true, ch -> true))), 
TableMeta(default_t1,/Users/jackylk/code/incubator-carbondata/target/store,org.carbondata.core.carbon.metadata.schema.table.CarbonTable@52d54ca2,Partitioner(org.carbondata.spark.partition.api.impl.SampleDataPartitionerImpl,[Ljava.lang.String;@62a877f4,1,[Ljava.lang.String;@3c180da5)),
 None), true

There are unnecessary ConvertToSafe operators before CarbonDictionaryDecoder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-106) Add audit logs for DDL commands

2016-07-25 Thread Manohar Vanam (JIRA)
Manohar Vanam created CARBONDATA-106:


 Summary: Add audit logs for DDL commands
 Key: CARBONDATA-106
 URL: https://issues.apache.org/jira/browse/CARBONDATA-106
 Project: CarbonData
  Issue Type: Improvement
Reporter: Manohar Vanam
Assignee: Manohar Vanam


Add audit logs for:
1. Create table
2. Load table
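
A minimal sketch of what these audit entries might look like (method names 
assumed from the AUDIT lines quoted elsewhere in this digest, not confirmed):

    import org.carbondata.common.logging.LogServiceFactory

    object DdlAudit {
      private val LOGGER = LogServiceFactory.getLogService(getClass.getName)

      def auditCreate(db: String, table: String): Unit =
        LOGGER.audit(s"Creating table with database name [$db] and table name [$table]")

      def auditLoad(db: String, table: String): Unit =
        LOGGER.audit(s"Data load request received for table [$db.$table]")
    }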



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CARBONDATA-8) Use create table instead of cube in all test cases

2016-07-25 Thread Manohar Vanam (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391945#comment-15391945
 ] 

Manohar Vanam edited comment on CARBONDATA-8 at 7/25/16 2:10 PM:
-

Fixed in all modules


was (Author: manoharvanam):
 Changed in all places

> Use create table instead of cube in all test cases
> --
>
> Key: CARBONDATA-8
> URL: https://issues.apache.org/jira/browse/CARBONDATA-8
> Project: CarbonData
>  Issue Type: Test
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>
> 1. Use create table instead of cube in all test cases
> 2. Remove unnecessary & duplicate  test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-8) Use create table instead of cube in all test cases

2016-07-25 Thread Manohar Vanam (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391945#comment-15391945
 ] 

Manohar Vanam commented on CARBONDATA-8:


 Changed in all places

> Use create table instead of cube in all test cases
> --
>
> Key: CARBONDATA-8
> URL: https://issues.apache.org/jira/browse/CARBONDATA-8
> Project: CarbonData
>  Issue Type: Test
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>
> 1. Use create table instead of cube in all test cases
> 2. Remove unnecessary & duplicate  test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-60) wrong result when using union all

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391867#comment-15391867
 ] 

ASF GitHub Bot commented on CARBONDATA-60:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/41


> wrong result when using union all
> -
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ray
>Assignee: Ravindra Pesala
>
> The issue can be reproduced with the following code. The expected result is 
> 1 row, but the actual result is 2 rows:
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", 
> "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
>   select c1 as c1,c2 as c2 from carbon1
>   union all
>   select c2 as c1,c1 as c2 from carbon1
>  )t
>   where c1='200'
>   group by c1
> """).show()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-60) wrong result when using union all

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-60:

Assignee: Ravindra Pesala

> wrong result when using union all
> -
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ray
>Assignee: Ravindra Pesala
>
> The issue can be reproduced with the following code. The expected result is 
> 1 row, but the actual result is 2 rows:
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", 
> "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
>   select c1 as c1,c2 as c2 from carbon1
>   union all
>   select c2 as c1,c1 as c2 from carbon1
>  )t
>   where c1='200'
>   group by c1
> """).show()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-60) wrong result when using union all

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-60:

Affects Version/s: Apache CarbonData 0.1.0-incubating

> wrong result when using union all
> -
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ray
>
> The issue can be reproduced with the following code. The expected result is 
> 1 row, but the actual result is 2 rows:
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", 
> "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
>   select c1 as c1,c2 as c2 from carbon1
>   union all
>   select c2 as c1,c1 as c2 from carbon1
>  )t
>   where c1='200'
>   group by c1
> """).show()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-49:

Assignee: Ravindra Pesala

> Can not query 3 million rows data which be loaded through local store 
> system(not HDFS)
> --
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
>Reporter: ChenLiang
>Assignee: Ravindra Pesala
>Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> CSV data is stored on the local machine (not HDFS); test results are as below.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, the query cc.sql("select * from 
> tablename") gives the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch 
> worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
>   at 
> org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-41) Run example, system can't normally end

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-41:

Affects Version/s: Apache CarbonData 0.1.0-incubating

> Run example, system can't normally end
> --
>
> Key: CARBONDATA-41
> URL: https://issues.apache.org/jira/browse/CARBONDATA-41
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ChenLiang
>Assignee: Ravindra Pesala
>Priority: Minor
>
> Reproduction steps:
> 1. Run CarbonExample.scala in IntelliJ IDEA.
> 2. The result is returned, but the system cannot end normally.
> 3. Run CarbonExample.scala again in IntelliJ IDEA and get the errors below:
>  FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address 
> already in use
> java.net.BindException: Address already in use
> The process needs to be stopped manually; only then can CarbonExample.scala 
> be run again in IntelliJ IDEA.
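
The BindException above is consistent with the previous run's driver JVM still 
holding the Spark UI port (4040); a likely remedy, though not confirmed in 
this thread, is to stop the SparkContext when the example finishes:

    import org.apache.spark.{SparkConf, SparkContext}

    object CarbonExampleShutdown {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("CarbonExample").setMaster("local[2]"))
        try {
          // ... run the example queries ...
        } finally {
          sc.stop() // releases port 4040 so the process can exit cleanly
        }
      }
    }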



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang resolved CARBONDATA-49.
-
Resolution: Fixed

> Can not query 3 million rows data which be loaded through local store 
> system(not HDFS)
> --
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
>Reporter: ChenLiang
>Assignee: Ravindra Pesala
>Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> CSV data is stored on the local machine (not HDFS); test results are as below.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, the query cc.sql("select * from 
> tablename") gives the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch 
> worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
>   at 
> org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-88) Use ./bin/carbon-spark-shell to run, generated two issues

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang resolved CARBONDATA-88.
-
Resolution: Fixed

> Use ./bin/carbon-spark-shell to run, generated two issues
> -
>
> Key: CARBONDATA-88
> URL: https://issues.apache.org/jira/browse/CARBONDATA-88
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: Apache Spark 1.6.1
> The latest master code of Apache CarbonData
>Reporter: ChenLiang
>Assignee: Ravindra Pesala
>Priority: Minor
>
> Running ./bin/carbon-spark-shell generates two issues:
> 1. The carbonshellstore is created under the root directory; propose moving 
> carbonshellstore to the ./bin directory.
> 2. Data load failure:
> scala> cc.sql("LOAD DATA LOCAL INPATH 
> '/Users/apple/Downloads/spark-1.6.1-bin-hadoop2.6/carbondata/hzmeetup.csv' 
> INTO TABLE meetupTable")
> java.lang.Exception: Dataload failure
>   at 
> org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>   at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>   at 
> org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
>   at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-49) Can not query 3 million rows data which be loaded through local store system(not HDFS)

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-49:

Affects Version/s: Apache CarbonData 0.1.0-incubating
Fix Version/s: Apache CarbonData 0.1.0-incubating
  Component/s: core

> Can not query 3 million rows data which be loaded through local store 
> system(not HDFS)
> --
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
>Reporter: ChenLiang
>Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> CSV data is stored on the local machine (not HDFS); test results are as below.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, the query cc.sql("select * from 
> tablename") gives the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch 
> worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
>   at 
> org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan resolved CARBONDATA-103.
-
Resolution: Duplicate

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan updated CARBONDATA-103:

Assignee: (was: Mohammad Shahid Khan)

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-105) Correct precalculation of dictionary file existence

2016-07-25 Thread Ashok Kumar (JIRA)
Ashok Kumar created CARBONDATA-105:
--

 Summary: Correct precalculation of dictionary file existence
 Key: CARBONDATA-105
 URL: https://issues.apache.org/jira/browse/CARBONDATA-105
 Project: CarbonData
  Issue Type: Bug
Reporter: Ashok Kumar
Priority: Minor


In case of concurrent data loading, pre-calculating the existence of the 
dictionary file will not give a proper result.
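
An illustration of the race (hypothetical path and helper, not CarbonData's 
API): an existence check computed up front can be stale by the time it is 
used, so the decision has to be made atomically at the point of creation.

    import java.nio.file.{FileAlreadyExistsException, Files, Paths, StandardOpenOption}

    object DictionaryFileRace {
      val dictPath = Paths.get("/tmp/store/col1.dict") // hypothetical location

      // Racy: another concurrent load may create the file after this check.
      def precomputedExists: Boolean = Files.exists(dictPath)

      // Atomic alternative: CREATE_NEW either creates the file or fails if a
      // concurrent loader won the race, so no stale answer is acted on.
      def createIfAbsent(): Boolean =
        try {
          Files.newByteChannel(dictPath, StandardOpenOption.CREATE_NEW,
            StandardOpenOption.WRITE).close()
          true
        } catch {
          case _: FileAlreadyExistsException => false
        }
    }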



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-104) To support varchar datatype

2016-07-25 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-104:
--

 Summary: To support varchar datatype
 Key: CARBONDATA-104
 URL: https://issues.apache.org/jira/browse/CARBONDATA-104
 Project: CarbonData
  Issue Type: New Feature
Reporter: zhangshunyu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-30) Record load performance statistics

2016-07-25 Thread zhangshunyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangshunyu resolved CARBONDATA-30.
---
Resolution: Resolved

> Record load performance statistics
> --
>
> Key: CARBONDATA-30
> URL: https://issues.apache.org/jira/browse/CARBONDATA-30
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: zhangshunyu
>Assignee: zhangshunyu
>
> We should use a parameter which can be configured by the user to determine 
> whether statistics are recorded and calculated during data loading.
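
A sketch of such a switch using CarbonProperties (the property key is an 
assumption, not the final name):

    import org.carbondata.core.util.CarbonProperties

    // Hypothetical property key; default is off.
    val recordStats = CarbonProperties.getInstance()
      .getProperty("enable.data.loading.statistics", "false").toBoolean

    if (recordStats) {
      // record and log per-phase timings during data loading
    }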



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)
Mohammad Shahid Khan created CARBONDATA-103:
---

 Summary: Rename CreateCube to CreateTable to correct the audit log 
of create table command
 Key: CARBONDATA-103
 URL: https://issues.apache.org/jira/browse/CARBONDATA-103
 Project: CarbonData
  Issue Type: Bug
Reporter: Mohammad Shahid Khan
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan reassigned CARBONDATA-103:
---

Assignee: Mohammad Shahid Khan

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)