[jira] [Commented] (CARBONDATA-109) 500g Dataload Failure in a spark cluster

2016-07-25 Thread ChenLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393147#comment-15393147
 ] 

ChenLiang commented on CARBONDATA-109:
--

[~xiaoyesoso] Before submitting this to JIRA as a real issue, it would be 
better to first send this question to the mailing list 
(dev@carbondata.incubator.apache.org) for adequate discussion.

> 500g Dataload Failure in a spark cluster
> 
>
> Key: CARBONDATA-109
> URL: https://issues.apache.org/jira/browse/CARBONDATA-109
> Project: CarbonData
>  Issue Type: Bug
>  Components: carbon-spark
>Reporter: Shoujie Zhuo
>
> INFO  26-07 10:54:28,630 - starting clean up
> INFO  26-07 10:54:28,766 - clean up done
> AUDIT 26-07 10:54:28,767 - [holodesk01][hdfs][Thread-1]Data load is failed 
> for tpcds_carbon_500_part.store_sales
> WARN  26-07 10:54:28,768 - Unable to write load metadata file
> ERROR 26-07 10:54:28,769 - main 
> java.lang.Exception: Dataload failure
>   at 
> org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>   at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>   at 
> org.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
>   at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:131)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>   at 
> org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver$.main(CarbonSQLCLIDriver.scala:40)
>   at 
> org.apache.spark.sql.hive.cli.CarbonSQLCLIDriver.main(CarbonSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> AUDIT 26-07 10:54:28,772 - [holodesk01][hdfs][Thread-1]Dataload failure for 
> tpcds_carbon_500_part.store_sales. Please check the logs
> INFO  26-07 10:54:28,775 - Table MetaData Unlocked Successfully after data 
> load
> ERROR 26-07 10:54:28,776 - Failed in [LOAD DATA inpath 
> 'hdfs://holodesk01/user/carbon-spark-sql/tpcds/500/store_sales' INTO table 
> store_sales]
> java.lang.Exception: Dataload failure
>   at 
> org.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:791)
>   at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1167)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>   at 

[jira] [Created] (CARBONDATA-106) Add audit logs for DDL commands

2016-07-25 Thread Manohar Vanam (JIRA)
Manohar Vanam created CARBONDATA-106:


 Summary: Add audit logs for DDL commands
 Key: CARBONDATA-106
 URL: https://issues.apache.org/jira/browse/CARBONDATA-106
 Project: CarbonData
  Issue Type: Improvement
Reporter: Manohar Vanam
Assignee: Manohar Vanam


Add audit logs for:
1. Create table
2. Load table
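As an illustration of the shape such audit logs could take, here is a hedged sketch in Python; the wrapper name, log format, and fields are hypothetical and not CarbonData's actual implementation:

```python
import getpass
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit = logging.getLogger("audit")

def run_with_audit(command_name, table, fn):
    """Run a DDL/load command, emitting AUDIT lines for start and outcome."""
    user = getpass.getuser()
    audit.info("AUDIT [%s] %s started for %s", user, command_name, table)
    try:
        result = fn()
        audit.info("AUDIT [%s] %s succeeded for %s", user, command_name, table)
        return result
    except Exception:
        # Re-raise after recording the failure so callers still see the error.
        audit.info("AUDIT [%s] %s failed for %s", user, command_name, table)
        raise

# Usage (hypothetical): run_with_audit("CREATE TABLE", "default.t1", do_create)
```

The key design point is that the audit line is emitted for both the success and the failure path, so the audit trail is complete even when the command throws.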



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CARBONDATA-8) Use create table instead of cube in all test cases

2016-07-25 Thread Manohar Vanam (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391945#comment-15391945
 ] 

Manohar Vanam edited comment on CARBONDATA-8 at 7/25/16 2:10 PM:
-

Fixed in all modules


was (Author: manoharvanam):
 Changed in all places

> Use create table instead of cube in all test cases
> --
>
> Key: CARBONDATA-8
> URL: https://issues.apache.org/jira/browse/CARBONDATA-8
> Project: CarbonData
>  Issue Type: Test
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>
> 1. Use create table instead of cube in all test cases
> 2. Remove unnecessary & duplicate  test cases





[jira] [Commented] (CARBONDATA-60) wrong result when using union all

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391867#comment-15391867
 ] 

ASF GitHub Bot commented on CARBONDATA-60:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/41


> wrong result when using union all
> -
>
> Key: CARBONDATA-60
> URL: https://issues.apache.org/jira/browse/CARBONDATA-60
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: Apache CarbonData 0.1.0-incubating
>Reporter: ray
>Assignee: Ravindra Pesala
>
> the issue can be reproduced by the following code;
> the expected result is 1 row, but the actual result is 2 rows:
> +---+---+
> | c1|_c1|
> +---+---+
> |200|  1|
> |279|  1|
> +---+---+
> import cc.implicits._
> val df=sc.parallelize(1 to 1000).map(x => (x+"", (x+100)+"")).toDF("c1", 
> "c2")
> import org.carbondata.spark._
> df.saveAsCarbonFile(Map("tableName" -> "carbon1"))
> cc.sql("""
> select c1,count(*) from(
>   select c1 as c1,c2 as c2 from carbon1
>   union all
>   select c2 as c1,c1 as c2 from carbon1
>  )t
>   where c1='200'
>   group by c1
> """).show()
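For reference, the intended semantics of the quoted query can be checked with plain Python collections. The sketch below mirrors the data and the union-all, filter, and group-by steps; it only illustrates the expected answer (a single group), not the CarbonData code path:

```python
from collections import Counter

# (c1, c2) string pairs mirroring toDF("c1", "c2") over 1..1000
rows = [(str(x), str(x + 100)) for x in range(1, 1001)]

# UNION ALL of the original rows and the rows with c1/c2 swapped
unioned = rows + [(c2, c1) for (c1, c2) in rows]

# WHERE c1 = '200' GROUP BY c1 -> count(*)
counts = Counter(c1 for (c1, _) in unioned if c1 == "200")
print(dict(counts))  # {'200': 2}: one group, matching the expected 1 row
```

The value '200' matches once in the original rows (x = 200) and once in the swapped rows (x = 100), so the correct output is one group with a count of 2, not the two groups shown in the report.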





[jira] [Updated] (CARBONDATA-49) Cannot query 3 million rows of data loaded through the local store system (not HDFS)

2016-07-25 Thread ChenLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLiang updated CARBONDATA-49:

Affects Version/s: Apache CarbonData 0.1.0-incubating
Fix Version/s: Apache CarbonData 0.1.0-incubating
  Component/s: core

> Cannot query 3 million rows of data loaded through the local store 
> system (not HDFS)
> --
>
> Key: CARBONDATA-49
> URL: https://issues.apache.org/jira/browse/CARBONDATA-49
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: Apache CarbonData 0.1.0-incubating
> Environment: spark 1.6.1
>Reporter: ChenLiang
>Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> The CSV data is stored on the local machine (not HDFS); test results are as 
> below.
> 1. If the CSV data is 1 million rows, all queries are OK.
> 2. If the CSV data is 3 million rows, the query cc.sql("select * from 
> tablename") fails with the errors below:
> ERROR 11-07 20:56:54,131 - [Executor task launch 
> worker-12][partitionID:connectdemo;queryID:33111337863067_0]
> org.carbondata.scan.executor.exception.QueryExecutionException:
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:99)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:178)
>   at 
> org.carbondata.scan.executor.impl.DetailRawRecordQueryExecutor.execute(DetailRawRecordQueryExecutor.java:20)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:174)
>   at 
> org.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.carbondata.core.carbon.datastore.exception.IndexBuilderException:
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:211)
>   at 
> org.carbondata.core.carbon.datastore.BlockIndexStore.loadAndGetBlocks(BlockIndexStore.java:191)
>   at 
> org.carbondata.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:96)





[jira] [Resolved] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan resolved CARBONDATA-103.
-
Resolution: Duplicate

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Trivial
>






[jira] [Updated] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan updated CARBONDATA-103:

Assignee: (was: Mohammad Shahid Khan)

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Priority: Trivial
>






[jira] [Created] (CARBONDATA-105) Correct precalculation of dictionary file existence

2016-07-25 Thread Ashok Kumar (JIRA)
Ashok Kumar created CARBONDATA-105:
--

 Summary: Correct precalculation of dictionary file existence
 Key: CARBONDATA-105
 URL: https://issues.apache.org/jira/browse/CARBONDATA-105
 Project: CarbonData
  Issue Type: Bug
Reporter: Ashok Kumar
Priority: Minor


In the case of concurrent data loading, the pre-calculation of dictionary 
file existence will not produce a correct result.





[jira] [Created] (CARBONDATA-104) To support varchar datatype

2016-07-25 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-104:
--

 Summary: To support varchar datatype
 Key: CARBONDATA-104
 URL: https://issues.apache.org/jira/browse/CARBONDATA-104
 Project: CarbonData
  Issue Type: New Feature
Reporter: zhangshunyu
Priority: Minor








[jira] [Created] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)
Mohammad Shahid Khan created CARBONDATA-103:
---

 Summary: Rename CreateCube to CreateTable to correct the audit log 
of create table command
 Key: CARBONDATA-103
 URL: https://issues.apache.org/jira/browse/CARBONDATA-103
 Project: CarbonData
  Issue Type: Bug
Reporter: Mohammad Shahid Khan
Priority: Trivial








[jira] [Assigned] (CARBONDATA-103) Rename CreateCube to CreateTable to correct the audit log of create table command

2016-07-25 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan reassigned CARBONDATA-103:
---

Assignee: Mohammad Shahid Khan

> Rename CreateCube to CreateTable to correct the audit log of create table 
> command
> -
>
> Key: CARBONDATA-103
> URL: https://issues.apache.org/jira/browse/CARBONDATA-103
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Trivial
>







[jira] [Commented] (CARBONDATA-102) Exclude the Spark and hadoop from CarbonData assembly jar by default and reduce the jar file size

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391395#comment-15391395
 ] 

ASF GitHub Bot commented on CARBONDATA-102:
---

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/53

[CARBONDATA-102]Reduce the size of the CarbonData jar file.

The following modifications were made in this PR:
1. Refactored and cleaned the POM to remove unnecessary dependency jar files.
2. The default CarbonData assembly jar does not include the Spark, Scala and 
Hadoop dependencies. Example:
``` 
mvn clean -DskipTests package
```
3. Users can enable the `include-all` profile to include all dependencies. 
Example:
```
mvn clean -DskipTests -Pinclude-all package
```

The default CarbonData jar size is 19 MB; with all dependencies included, it 
is 207 MB.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata localstore_bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/53.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #53


commit 3cc132a17454afa466e66da6449de71e9ec0c736
Author: ravipesala 
Date:   2016-07-24T12:01:20Z

Refactored and cleaned up POM




> Exclude the Spark and hadoop from CarbonData assembly jar by default and 
> reduce the jar file size
> -
>
> Key: CARBONDATA-102
> URL: https://issues.apache.org/jira/browse/CARBONDATA-102
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>Priority: Minor
> Fix For: Apache CarbonData 0.1.0-incubating
>
>
> Currently the CarbonData assembly jar is huge (about 200 MB) because it 
> includes the Spark, Scala and Hadoop dependency jars.
> Since the jar is deployed in a Spark cluster, we should not include the 
> Hadoop, Scala and Spark dependencies in it by default.
> If users wish to include them, the Maven build should provide an option to 
> include all dependencies.
> The default build, shown below, contains only CarbonData and its own 
> dependencies, excluding Spark, Scala and Hadoop:
> {code}
> mvn clean -DskipTests package
> {code}
> The build below includes all dependencies, such as Spark, Scala and Hadoop:
> {code}
> mvn clean -DskipTests -Pinclude-all package
> {code}


