[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1329/ ---
[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/1542

[CARBONDATA-1757] [PreAgg] Fix for wrong avg values after pre-agg table creation when a sum/count aggregation function is applied on the same column along with avg.

The plan being transformed was adding two columns for sum/count, which resulted in wrong data being inserted.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [X] Any interfaces changed? No
- [X] Any backward compatibility impacted? No
- [X] Document update required? No
- [X] Testing done: test case added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata pre_agg_avg_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1542.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1542

commit f05ba4faabc70c19574ed200ddb88909d69cf6e3
Author: kunal642
Date: 2017-11-21T07:25:25Z

fixed wrong avg count bug when a sum/count aggregation function is applied on the same column along with avg. The plan being transformed was adding two columns for sum/count, which resulted in wrong data being inserted.

---
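A minimal sketch of the pattern this PR fixes, with hypothetical table and column names (sales, amount) rather than the PR's actual test case:

```
// Hedged sketch of the failing pattern: sum and avg on the same column,
// rolled up through an aggregate datamap.
sql("create table sales (id int, name string, amount double) STORED BY 'org.apache.carbondata.format'")
sql("create datamap sales_agg on table sales using 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select name, sum(amount), avg(amount) from sales group by name")
// Before this fix, the transformed plan added the sum/count columns backing
// avg a second time, so the data rolled into the child table (and hence avg) was wrong.
sql("select name, sum(amount), avg(amount) from sales group by name").show()
```

---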
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1784/ ---
[jira] [Assigned] (CARBONDATA-1757) Carbon 1.3.0- Pre_aggregate: After creating datamap on parent table, avg is not correct.
[ https://issues.apache.org/jira/browse/CARBONDATA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reassigned CARBONDATA-1757:

Assignee: Kunal Kapoor

> Carbon 1.3.0- Pre_aggregate: After creating datamap on parent table, avg is not correct.
>
> Key: CARBONDATA-1757
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1757
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.3.0
> Reporter: Ayushi Sharma
> Assignee: Kunal Kapoor
> Labels: functional
>
> Steps:
> 1. create table cust_2 (c_custkey int, c_name string, c_address string, c_nationkey bigint, c_phone string,c_acctbal decimal, c_mktsegment string, c_comment string) STORED BY 'org.apache.carbondata.format';
> 2. load data inpath 'hdfs://hacluster/customer/customer3.csv' into table cust_2 options('DELIMITER'='|','QUOTECHAR'='"','FILEHEADER'='c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment');
>    (customer3.csv is loaded twice; the same load statement, with identical options, is then repeated once each for customer4.csv through customer14.csv)
> 3. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 4. set carbon.input.segments.default.cust_2=0,1;
> 5. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 6. CREATE DATAMAP tt1 ON TABLE cust_2 USING "org.apache.carbondata.datamap.AggregateDataMapHandler" AS SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 7. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 8. set carbon.input.segments.default.cust_2=*;
> 9. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
>
> Issue:
> After creating the datamap, avg is not correct.
> Expected Output:
> Avg should be displayed correctly.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
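For background on why avg is the fragile aggregate here: an average cannot be stored directly in a pre-aggregate table (averages of averages are wrong), so avg(c_acctbal) has to be maintained as a sum column plus a count column and recombined at query time. A minimal sketch of that recombination, using hypothetical rolled-up table/column names (cust_2_tt1, sum_acctbal, count_acctbal), not CarbonData's actual generated names:

```
// Hedged sketch: answering the avg query from the rolled-up sum/count columns.
sql("select c_custkey, c_name, sum(sum_acctbal), sum(sum_acctbal) / sum(count_acctbal) from cust_2_tt1 group by c_custkey, c_name").show()
```

If either backing column is inserted twice during datamap load, as in CARBONDATA-1757, this recombination silently returns wrong averages.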
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1537 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1783/ ---
[jira] [Created] (CARBONDATA-1787) Carbon 1.3.0- Global Sort: Global_Sort_Partitions parameter doesn't work, if specified in the Tblproperties, while creating the table.
Ayushi Sharma created CARBONDATA-1787:

Summary: Carbon 1.3.0- Global Sort: Global_Sort_Partitions parameter doesn't work, if specified in the Tblproperties, while creating the table.
Key: CARBONDATA-1787
URL: https://issues.apache.org/jira/browse/CARBONDATA-1787
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.3.0
Reporter: Ayushi Sharma
Priority: Minor

Steps:
1. create table tstcust(c_custkey int, c_name string, c_address string, c_nationkey bigint, c_phone string,c_acctbal decimal, c_mktsegment string, c_comment string) STORED BY 'org.apache.carbondata.format' tblproperties('sort_scope'='global_sort','GLOBAL_SORT_PARTITIONS'='2');

Issue:
GLOBAL_SORT_PARTITIONS does not take effect when specified in tblproperties at table creation, whereas the same property works when specified as a data-load option.

Expected:
Either specifying the property where it is not honoured should throw an error, as it does for sort_scope in the load, or the documentation should be updated accordingly.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
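Per the report the property does take effect as a load option; a hedged sketch of the working form (the CSV path is illustrative):

```
// GLOBAL_SORT_PARTITIONS is honoured as a LOAD DATA option (per the report),
// while the same key in TBLPROPERTIES at create time is silently ignored.
sql("load data inpath 'hdfs://hacluster/customer/customer3.csv' into table tstcust options('DELIMITER'='|', 'GLOBAL_SORT_PARTITIONS'='2')")
```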
[GitHub] carbondata issue #1167: [CARBONDATA-1304] [IUD BuggFix] Iud with single pass
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1167 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1327/ ---
[jira] [Created] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
Vandana Yadav created CARBONDATA-1786:

Summary: Getting null pointer exception while loading data into table and while fetching data getting NULL values
Key: CARBONDATA-1786
URL: https://issues.apache.org/jira/browse/CARBONDATA-1786
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Blocker
Attachments: 2000_UniqData.csv

Getting a null pointer exception while loading data into the table, and NULL values when fetching data.

Steps to reproduce:

1) Create table:
CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:
LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'='/' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','TIMESTAMPFORMAT'='yyyy-mm-dd hh:mm:ss');

3) Expected result: it should load data into the table successfully.

4) Actual result: it throws an error:
Error: java.lang.NullPointerException (state=,code=0)

Logs:
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.delete(AbstractDFSCarbonFile.java:142)
at org.apache.carbondata.processing.util.DeleteLoadFolders.physicalFactAndMeasureMetadataDeletion(DeleteLoadFolders.java:79)
at org.apache.carbondata.processing.util.DeleteLoadFolders.deleteLoadFoldersFromFileSystem(DeleteLoadFolders.java:134)
at org.apache.carbondata.spark.rdd.DataManagementFunc$.deleteLoadsAndUpdateMetadata(DataManagementFunc.scala:188)
at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:281)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.loadData(LoadTableCommand.scala:347)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:183)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:64)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:220)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:163)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:160)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:173)
at ja
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1540 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1782/ ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1541 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1781/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1326/ ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1541 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1325/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1536 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1780/ ---
[GitHub] carbondata pull request #1541: [CARBONDATA-1785][Build] add coveralls badge ...
GitHub user sraghunandan opened a pull request:

https://github.com/apache/carbondata/pull/1541

[CARBONDATA-1785][Build] add coveralls badge to carbondata

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done NA
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sraghunandan/carbondata-1 coverage_badge

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1541.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1541

commit b6b38a11ef4f97ef57168aa987d45dbed338b8eb
Author: sraghunandan
Date: 2017-11-21T05:28:14Z

add coveralls badge to carbondata

---
[jira] [Created] (CARBONDATA-1785) Add Coveralls codecoverage badge to carbondata
Venkata Ramana G created CARBONDATA-1785:

Summary: Add Coveralls codecoverage badge to carbondata
Key: CARBONDATA-1785
URL: https://issues.apache.org/jira/browse/CARBONDATA-1785
Project: CarbonData
Issue Type: Improvement
Reporter: Venkata Ramana G
Priority: Minor

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1324/ ---
[GitHub] carbondata pull request #1416: [WIP] [CARBONDATA-1592] Adding event listener...
Github user manishgupta88 closed the pull request at: https://github.com/apache/carbondata/pull/1416 ---
[GitHub] carbondata issue #1416: [WIP] [CARBONDATA-1592] Adding event listener interf...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/1416 Code already merged as part of PR #1473 ---
[GitHub] carbondata pull request #1539: [CARBONDATA-1780] Create configuration from S...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1539 ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1539 LGTM ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1323/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1321/ ---
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1540 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1322/ ---
[GitHub] carbondata issue #1503: [CARBONDATA-1730] Support skip.header.line.count opt...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1503 @sehriff please squash all commits to one commit ---
[GitHub] carbondata pull request #1540: [CARBONDATA-1784] clear column group code
GitHub user chenliang613 opened a pull request:

https://github.com/apache/carbondata/pull/1540

[CARBONDATA-1784] clear column group code

Clear column group code.

- [X] Any interfaces changed? NA
- [X] Any backward compatibility impacted? NA
- [X] Document update required? NA
- [X] Testing done NA
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenliang613/carbondata col_group

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1540.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1540

commit 0324cce43cde7cb753b6fe958ec01d6312acabe4
Author: chenliang613
Date: 2017-11-21T03:04:49Z

[CARBONDATA-1784] clear column group code

---
[jira] [Created] (CARBONDATA-1784) Clear column group code
Liang Chen created CARBONDATA-1784:

Summary: Clear column group code
Key: CARBONDATA-1784
URL: https://issues.apache.org/jira/browse/CARBONDATA-1784
Project: CarbonData
Issue Type: Task
Components: core
Reporter: Liang Chen
Assignee: Liang Chen
Priority: Minor

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1499 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1320/ ---
[GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1538 @bhavya411 please add a detailed description for this pull request. ---
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1538#discussion_r152168110

--- Diff: integration/presto/pom.xml ---
@@ -31,7 +31,7 @@ presto-plugin
-0.186
--- End diff --

Why was the presto version changed again?

---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1496 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1319/ ---
[GitHub] carbondata pull request #1516: [CARBONDATA-1729]Fix the compatibility issue ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1516 ---
[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all
[ https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260118#comment-16260118 ] xuchuanyin commented on CARBONDATA-1778:

[~chenerlu] Aren't the garbage segments cleaned periodically? Would it be better to leave this work to a timer?

> Support clean garbage segments for all
>
> Key: CARBONDATA-1778
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1778
> Project: CarbonData
> Issue Type: Improvement
> Reporter: chenerlu
> Assignee: chenerlu
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1777:

Assignee: kumar vishal (was: Kunal Kapoor)

> Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
>
> Key: CARBONDATA-1777
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1777
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Assignee: kumar vishal
> Priority: Minor
> Labels: DFX
> Fix For: 1.3.0
>
> Steps:
> Beeline:
> 1. Create table and load it with data
> Spark-shell:
> 1. Create a pre-aggregate table
> Beeline:
> 1. Run the aggregate query
> Expected: Pre-aggregate table should be used in the aggregate query
> Actual: Pre-aggregate table is not used
>
> 1.
> create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> 2.
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show();
> 3.
> select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus;
>
> Actual:
> 0: jdbc:hive2://10.18.98.136:23040> show tables;
> +-----------+---------------------------+--------------+
> | database  | tableName                 | isTemporary  |
> +-----------+---------------------------+--------------+
> | test_db2  | lineitem1                 | false        |
> | test_db2  | lineitem1_agr1_lineitem1  | false        |
> +-----------+---------------------------+--------------+
> 2 rows selected (0.047 seconds)
>
> Logs:
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371)
> 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589)
> 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, initialize called | org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289)
> 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77)
> 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, underlying DB is MYSQL | org.apache.hadoop.hive.metastore.MetaStoreDirectSql.
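A quick way to check whether the aggregate query is actually served by the pre-aggregate table is to inspect which table the physical plan scans, assuming the rewrite is visible in Spark's explain output:

```
// Hedged sketch: run from the beeline/spark session under test and check
// whether lineitem1 or lineitem1_agr1_lineitem1 appears in the scan.
sql("explain select l_returnflag, l_linestatus, sum(l_quantity), avg(l_quantity), count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show(false)
```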
[jira] [Updated] (CARBONDATA-1783) (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
[ https://issues.apache.org/jira/browse/CARBONDATA-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1783:

Description:

Steps:

The Spark thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the user creates the table and loads data into it as shown below.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "all_datatypes_2048"
sql(s"create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, Latest_country string, Latest_province string, Latest_city string, Latest_district string, Latest_stree
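The description above is cut off mid-schema. To illustrate the failure mode named in the title, a filter query against the streaming table would look something like the sketch below; the predicate is illustrative, not the reporter's exact query:

```
// Hedged sketch: a filter query on the streaming table, which per the report
// fails with "Failed to filter row in vector reader" on streaming segments.
sql("select imei, AMSize from all_datatypes_2048 where deviceInformationId = 100000").show()
```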
[jira] [Updated] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1782:

Description:

Steps:

The thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the streaming table is created and data is loaded into it.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "uniqdata"
sql(s"CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true')")
sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS( 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')")
val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore.lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
val port = 8006
val serverSocket = new ServerSocket(port)
val socketThread = writeSocket(serverSocket)
val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port)

From Beeline the user executes the query:

select regexp_extract(CUST_NAME,'a',1) from uniqdata where regexp_extract(CUS
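The reporter's query is truncated above; judging from the issue title (regexp_extract in a where clause with is null), the shape of the failing statement is presumably:

```
// Hedged reconstruction of the query shape from the title; the exact
// predicate in the report is cut off above.
sql("select regexp_extract(CUST_NAME, 'a', 1) from uniqdata where regexp_extract(CUST_NAME, 'a', 1) is null").show()
```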
[jira] [Created] (CARBONDATA-1783) (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
Chetan Bhat created CARBONDATA-1783:

Summary: (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
Key: CARBONDATA-1783
URL: https://issues.apache.org/jira/browse/CARBONDATA-1783
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: 3 node ant cluster
Reporter: Chetan Bhat

Steps:

The Spark thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the user creates the table and loads data into it as shown below.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "all_datatypes_2048"
sql(s"create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeData
[jira] [Created] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
Chetan Bhat created CARBONDATA-1782:

Summary: (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
Key: CARBONDATA-1782
URL: https://issues.apache.org/jira/browse/CARBONDATA-1782
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: 3 node ant cluster
Reporter: Chetan Bhat

Steps:

The thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the streaming table is created and data is loaded into it.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "uniqdata"
sql(s"CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true')")
sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS( 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')")
val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore.lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdenti
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1779/ ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1534 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1318/ ---
[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1484 ---
[jira] [Resolved] (CARBONDATA-1700) Failed to load data to existed table after spark session restarted
[ https://issues.apache.org/jira/browse/CARBONDATA-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1700.

Resolution: Fixed

> Failed to load data to existed table after spark session restarted
>
> Key: CARBONDATA-1700
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1700
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Fix For: 1.3.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> # scenario
> I encountered a failure loading data into an existing carbondata table after querying the table in a restarted spark session. I hit this failure in spark local mode (found it during a local test) and haven't tested other scenarios.
> The problem can be reproduced by the following steps:
> 0. START: start a session;
> 1. CREATE: create table `t1`;
> 2. LOAD: create a dataframe and write append to `t1`;
> 3. STOP: stop the current session;
> 4. START: start a session;
> 5. QUERY: query table `t1`; this step is essential to reproduce the problem.
> 6. LOAD: create a dataframe and write append to `t1`; this step will fail.
> The error is thrown in step 6. The error message in the console looks like:
> ```
> java.lang.NullPointerException was thrown.
> java.lang.NullPointerException
> at org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:92)
> at org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:60)
> at org.apache.spark.sql.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:141)
> at org.apache.spark.sql.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:50)
> at org.apache.spark.sql.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42)
> at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:110)
> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
> ```
> The following code can be pasted into `TestLoadDataFrame.scala` to reproduce this problem, but keep in mind you should manually run the first test and then the second in different iterations (to make sure that the sparksession is restarted).
> ```
> test("prepare") {
>   sql("drop table if exists carbon_stand_alone")
>   sql("create table if not exists carbon_stand_alone (c1 string, c2 string, c3 int)" +
>     " stored by 'carbondata'").collect()
>   sql("select * from carbon_stand_alone").show()
>   df.write
>     .format("carbondata")
>     .option("tableName", "carbon_stand_alone")
>     .option("tempCSV", "false")
>     .mode(SaveMode.Append)
>     .save()
> }
> test("test load dataframe after query") {
>   sql("select * from carbon_stand_alone").show()
>   // the following line will cause failure
>   df.write
>     .format("carbondata")
>     .option("tableName", "carbon_stand_alone")
>     .option("tempCSV", "false")
>     .mode(SaveMode.Append)
>     .save()
>   // if it works fine, it should be true
>   checkAnswer(
>     sql("select count(*) from carbon_stand_alone where c3 > 500"),
>     Row(31500 * 2)
>   )
> }
> ```
> # ANALYSE
> I went through the code and found the problem was caused by a NULL `tableProperties`: `tableMeta.carbonTable.getTableInfo.getFactTable.getTableProperties` (we will name it `propertyInTableInfo` for short) is null at line 89 in `LoadTableCommand.scala`.
> After debugging, I found that the `propertyInTableInfo` set in `CarbonTableInputFormat.setTableInfo(...)` had the correct value, but `CarbonTableInputFormat.getTableInfo(...)` returned the incorrect value. The setter serializes TableInfo, while the getter deserializes it, which means something is wrong in serialization-deserialization.
> Digging further into the code, I found that serialization and deserialization in `TableSchema`, a member of `TableInfo`, ignore the `tableProperties` member, leaving this value empty after deserialization. Since this value is not initialized in the constructor, it remains `NULL` and causes the NPE.
> # RESOLVE
> 1. Initialize `tableProperties` in `TableSchema`.
> 2. Include `tableProperties` in serialization-deserialization of `TableSchema`.
> # Notes
> Although the bug has been fixed, I still can't understand why the problem can be triggered in the above way.
> Tests need the sparksession to be restarted, which is impossible currently, so no tests will be added.

-- This message was sent by Atlassian JIRA (v6.4.
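The failure mode described in the ANALYSE section, a field skipped by custom serialization coming back as NULL, is easy to reproduce in isolation. A self-contained sketch with a hypothetical class (not CarbonData's actual TableSchema):

```
// Hedged illustration of the bug class: a custom writeObject/readObject pair
// that skips a field leaves it null after a serialize/deserialize round trip.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

class Schema(var name: String) extends Serializable {
  var tableProperties: java.util.Map[String, String] = new java.util.HashMap()
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeUTF(name) // tableProperties is (wrongly) never written
  }
  private def readObject(in: ObjectInputStream): Unit = {
    name = in.readUTF() // tableProperties is never restored; the constructor
                        // does not run during deserialization, so it stays null
  }
}

val bytes = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bytes)
oos.writeObject(new Schema("t1"))
oos.close()
val restored = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  .readObject().asInstanceOf[Schema]
println(restored.tableProperties) // null, so any later access NPEs, as in LoadTableCommand
```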
[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1484 LGTM ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1778/ ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1539 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1317/ ---
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1514 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1777/ ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1539 retest this please ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1516 LGTM ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1534 @sgururajshetty ok ---
[jira] [Updated] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when str
[ https://issues.apache.org/jira/browse/CARBONDATA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1781: Description: *Steps :* Thrift server is started using the command - bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store" Spark shell is opened using the command - bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar >From spark shell the below code is executed - import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") carbonSession.sparkContext.setLogLevel("INFO") def sql(sql: String) = carbonSession.sql(sql) def writeSocket(serverSocket: ServerSocket): Thread = { val thread = new Thread() { override def run(): Unit = { // wait for client to connection request and accept val clientSocket = serverSocket.accept() val socketWriter = new PrintWriter(clientSocket.getOutputStream()) var index = 0 for (_ <- 1 to 1000) { // write 5 records per iteration for (_ <- 0 to 100) { index = index + 1 socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index) } socketWriter.flush() Thread.sleep(2000) } socketWriter.close() System.out.println("Socket closed") } } thread.start() thread } def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = { val thread = new Thread() { override def run(): Unit = { var qry: StreamingQuery = null try { val readSocketDF = spark.readStream .format("socket") .option("host", "10.18.98.34") .option("port", port) .load() qry = readSocketDF.writeStream .format("carbondata") .trigger(ProcessingTime("5 seconds")) .option("checkpointLocation", tablePath.getStreamingCheckpointDir) .option("tablePath", tablePath.getPath).option("tableName", tableName) .start() qry.awaitTermination() } catch { case ex: Throwable => ex.printStackTrace() println("Done reading and writing streaming data") } finally { qry.stop() } } } thread.start() thread } val streamTableName = "brinjal" sql(s"drop table brinjal").show sql(s"create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true','table_blocksize'='1')") sql(s"LOAD DATA INPATH 
'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge')") val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore. lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier) val port = 8002 val serverSocket = new ServerSocket(port) val socketThread = writeSocket(serverSocket) val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port) >From other terminal user deletes the .streaming file - >BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs -rm >-r /user/hive/ware
[jira] [Updated] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when str
[ https://issues.apache.org/jira/browse/CARBONDATA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1781: Summary: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when streaming in progress (was: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS) > (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) > is success when .streaming file is removed from HDFS or thrift server is > killed when streaming in progress > --- > > Key: CARBONDATA-1781 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1781 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: DFX > > *Steps :* > Thrift server is started using the command - bin/spark-submit --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/hive/warehouse/carbon.store" > Spark shell is opened using the command - bin/spark-shell --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --jars > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > From spark shell the below code is executed - > import java.io.{File, PrintWriter} > import java.net.ServerSocket > import org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). 
> > getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") > > carbonSession.sparkContext.setLogLevel("INFO") > def sql(sql: String) = carbonSession.sql(sql) > def writeSocket(serverSocket: ServerSocket): Thread = { > val thread = new Thread() { > override def run(): Unit = { > // wait for client to connection request and accept > val clientSocket = serverSocket.accept() > val socketWriter = new PrintWriter(clientSocket.getOutputStream()) > var index = 0 > for (_ <- 1 to 1000) { > // write 5 records per iteration > for (_ <- 0 to 100) { > index = index + 1 > socketWriter.println(index.toString + ",name_" + index >+ ",city_" + index + "," + (index * > 1.00).toString + >",school_" + index + ":school_" + index + > index + "$" + index) > } > socketWriter.flush() > Thread.sleep(2000) > } > socketWriter.close() > System.out.println("Socket closed") > } > } > thread.start() > thread > } > > def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, > tableName: String, port: Int): Thread = { > val thread = new Thread() { > override def run(): Unit = { > var qry: StreamingQuery = null > try { > val readSocketDF = spark.readStream > .format("socket") > .option("host", "10.18.98.34") > .option("port", port) > .load() > qry = readSocketDF.writeStream > .format("carbondata") > .trigger(ProcessingTime("5 seconds")) > .option("checkpointLocation", tablePath.getStreamingCheckpointDir) > .option("tablePath", tablePath.getPath).option("tableName", > tableName) > .start() > qry.awaitTermination() > } catch { > case ex: Throwable => > ex.printStackTrace() > println("Done reading and writing streaming data") > } finally { > qry.stop() > } > } > } > thread.start() > thread > } > val streamTableName = "brinjal" > sql(s"drop table brinjal").show > sql(s"create table brinjal (imei string,AMSi
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1514 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1776/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1775/ ---
[GitHub] carbondata pull request #1514: [CARBONDATA-1746] Count star optimization
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1514 ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1534 @chenliang613 kindly find my comments. The following descriptions can be added so the user knows what each feature does: a description of Minor & Major compaction, and a description of Partition and its types. ---
[jira] [Created] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS
Chetan Bhat created CARBONDATA-1781: --- Summary: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS Key: CARBONDATA-1781 URL: https://issues.apache.org/jira/browse/CARBONDATA-1781 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.3.0 Environment: 3 node ant cluster Reporter: Chetan Bhat *Steps :* Thrift server is started using the command - bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store" Spark shell is opened using the command - bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar >From spark shell the below code is executed - import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") carbonSession.sparkContext.setLogLevel("INFO") def sql(sql: String) = carbonSession.sql(sql) def writeSocket(serverSocket: ServerSocket): Thread = { val thread = new Thread() { override def run(): Unit = { // wait for client to connection request and accept val clientSocket = serverSocket.accept() val socketWriter = new PrintWriter(clientSocket.getOutputStream()) var index = 0 for (_ <- 1 to 1000) { // write 5 records per iteration for (_ <- 0 to 100) { index = index + 1 socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index) } socketWriter.flush() Thread.sleep(2000) } socketWriter.close() System.out.println("Socket closed") } } thread.start() thread } def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = { val thread = new Thread() { override def run(): Unit = { var qry: StreamingQuery = null try { val readSocketDF = spark.readStream .format("socket") .option("host", "10.18.98.34") .option("port", port) .load() qry = readSocketDF.writeStream .format("carbondata") .trigger(ProcessingTime("5 seconds")) .option("checkpointLocation", tablePath.getStreamingCheckpointDir) .option("tablePath", tablePath.getPath).option("tableName", tableName) .start() qry.awaitTermination() } catch { case ex: Throwable => ex.printStackTrace() println("Done reading and writing streaming data") } finally { qry.stop() } } } thread.start() thread } val streamTableName = "brinjal" sql(s"drop table brinjal").show sql(s"create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId 
double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true','table_blocksize'='1')") sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge')") val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore. lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier) val port = 8002 val serverSocke
[jira] [Resolved] (CARBONDATA-1771) While segment_index compaction, .carbonindex files of invalid segments are also getting merged
[ https://issues.apache.org/jira/browse/CARBONDATA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1771. - Resolution: Fixed Fix Version/s: 1.3.0 > While segment_index compaction, .carbonindex files of invalid segments are > also getting merged > -- > > Key: CARBONDATA-1771 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1771 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
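The essence of the fix can be sketched as a filtering step before the merge (names here are illustrative, not CarbonData's actual API): only segments in a valid state contribute their .carbonindex files.
```
// Hypothetical segment metadata; real status values live in the table status file.
case class SegmentStatus(id: String, status: String)

// Skip deleted or already-compacted segments when collecting index files to merge.
def segmentsToMerge(allSegments: Seq[SegmentStatus]): Seq[String] = {
  val validStates = Set("SUCCESS", "PARTIAL_SUCCESS", "MARKED_FOR_UPDATE")
  allSegments.filter(s => validStates.contains(s.status)).map(_.id)
}
```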
[GitHub] carbondata pull request #1535: [CARBONDATA-1771] While segment_index compact...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1535 ---
[GitHub] carbondata issue #1535: [CARBONDATA-1771] While segment_index compaction, .c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1535 LGTM ---
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1514 LGTM ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1539 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1316/ ---
[GitHub] carbondata issue #1535: [CARBONDATA-1771] While segment_index compaction, .c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1535 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1774/ ---
[GitHub] carbondata pull request #1539: [CARBONDATA-1780] Create configuration from S...
GitHub user QiangCai opened a pull request: https://github.com/apache/carbondata/pull/1539 [CARBONDATA-1780] Create configuration from SparkSession for data loading Create configuration from SparkSession for data loading, so that we can set configuration into SparkSession during dataloading. - [x] Any interfaces changed? - [x] Any backward compatibility impacted? - [x] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/carbondata configuration Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1539 commit 13f71680c1fe55e670935b05572ec11b3632057b Author: QiangCai Date: 2017-11-20T10:38:20Z create configuration from sparksession for Data Loading ---
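A rough sketch of the idea (the helper below is illustrative, not the PR's actual code): derive the Hadoop configuration used by data loading from the SparkSession, so settings made on the session context become visible to the load, and copy it so per-load mutations do not leak back.
```
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

object LoadConfiguration {
  // Copy the session's Hadoop configuration for one data-loading job.
  def fromSparkSession(spark: SparkSession): Configuration =
    new Configuration(spark.sparkContext.hadoopConfiguration)
}

// Usage: a value set on the session context is picked up at load time.
// spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://hacluster")
// val conf = LoadConfiguration.fromSparkSession(spark)
```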
[jira] [Commented] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259086#comment-16259086 ] Ramakrishna S commented on CARBONDATA-1777: --- [~kumarvishal], this happens when the pre-aggregate table is created in a different session (spark-shell), but select * on the aggregate table works fine. > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | I
[GitHub] carbondata pull request #1536: [CARBONDATA-1776] Fix some possible test erro...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1536 ---
[jira] [Updated] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna S updated CARBONDATA-1777: -- Priority: Minor (was: Major) > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, > underlying DB is MYSQL | > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql
[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1469 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1314/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1536 LGTM ---
[jira] [Created] (CARBONDATA-1780) Create configuration from SparkSession for data loading
QiangCai created CARBONDATA-1780: Summary: Create configuration from SparkSession for data loading Key: CARBONDATA-1780 URL: https://issues.apache.org/jira/browse/CARBONDATA-1780 Project: CarbonData Issue Type: Improvement Reporter: QiangCai Create configuration from SparkSession for data loading -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1525: [CARBONDATA-1751] Make the type of exception ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1525 ---
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1525 LGTM ---
[jira] [Updated] (CARBONDATA-1711) Carbon1.3.0-DataMap - Show datamap on table does not work
[ https://issues.apache.org/jira/browse/CARBONDATA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna S updated CARBONDATA-1711: -- Summary: Carbon1.3.0-DataMap - Show datamap on table does not work (was: Carbon1.3.0-Pre-AggregateTable - Show datamap on table does not work) > Carbon1.3.0-DataMap - Show datamap on table does not work > -- > > Key: CARBONDATA-1711 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1711 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.3.0 > Environment: Test >Reporter: Ramakrishna S >Priority: Minor > Labels: Functional > Fix For: 1.3.0 > > > 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE > lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as > select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from > lineitem group by L_RETURNFLAG, L_LINESTATUS; > Error: java.lang.RuntimeException: Table [lineitem_agr_lineitem] already > exists under database [default] (state=,code=0) > 0: jdbc:hive2://10.18.98.34:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | default | flow_carbon_test4 | false| > | default | jl_r3 | false| > | default | lineitem | false| > | default | lineitem_agr_lineitem | false| > | default | sensor_reading_blockblank_false | false| > | default | sensor_reading_blockblank_false1 | false| > | default | sensor_reading_blockblank_false2 | false| > | default | sensor_reading_false | false| > | default | sensor_reading_true | false| > | default | t1| false| > | default | t1_agg_t1 | false| > | default | tc4 | false| > | default | uniqdata | false| > +---+---+--+--+ > 13 rows selected (0.04 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show datamap on table lineitem; > Error: java.lang.RuntimeException: > BaseSqlParser > missing 'FUNCTIONS' at 'on'(line 1, pos 13) > == SQL == > show datamap on table lineitem > -^^^ > CarbonSqlParser [1.6] failure: identifier matching regex (?i)SEGMENTS > expected > show datamap on table lineitem -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1773/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1313/ ---
[GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1538 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1312/ ---
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1772/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1311/ ---
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
GitHub user bhavya411 opened a pull request: https://github.com/apache/carbondata/pull/1538 [CARBONDATA-1779] GenericVectorizedReader Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - No interfaces changed? - No backward compatibility impacted? - No Document update required? - [ Yes] Testing done - All Unit test cases are passing, no new unit test cases were needed as this PR implements a Generic Vectorized Reader. - Manual Testing completed for the same . You can merge this pull request into a Git repository by running: $ git pull https://github.com/bhavya411/incubator-carbondata CARBONDATA-1779 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1538.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1538 commit ef28391c656cc2d20082e52dd4ab729b0992cfb3 Author: Bhavya Date: 2017-11-14T10:05:44Z Added Generic vectorized Reader ---
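Since the PR body gives no internals, here is only a hedged sketch of what a Spark-independent vectorized reader contract might look like (all names illustrative, not the PR's actual classes): batches are filled into engine-neutral column vectors instead of Spark's ColumnarBatch, which is what lets Presto consume the reader.
```
// Illustrative interfaces only; the actual PR may define different ones.
trait CarbonColumnVector {
  def putInt(rowId: Int, value: Int): Unit
  def putDouble(rowId: Int, value: Double): Unit
}

trait GenericVectorizedReader {
  // Fill the supplied vectors with up to batchSize rows; return the number read.
  def nextBatch(vectors: Array[CarbonColumnVector], batchSize: Int): Int
  def close(): Unit
}
```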
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1771/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1770/ ---
[jira] [Updated] (CARBONDATA-1779) GeneriVectorizedReader for Presto
[ https://issues.apache.org/jira/browse/CARBONDATA-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavya Aggarwal updated CARBONDATA-1779: Summary: GeneriVectorizedReader for Presto (was: GeneriVectorizedReade for Presto) > GeneriVectorizedReader for Presto > - > > Key: CARBONDATA-1779 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1779 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration >Affects Versions: 1.3.0 >Reporter: Bhavya Aggarwal >Assignee: Bhavya Aggarwal >Priority: Minor > > Write a Generic Vectorized Reader for Presto to remove the dependencies on > spark -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1779) GeneriVectorizedReade for Presto
Bhavya Aggarwal created CARBONDATA-1779: --- Summary: GeneriVectorizedReade for Presto Key: CARBONDATA-1779 URL: https://issues.apache.org/jira/browse/CARBONDATA-1779 Project: CarbonData Issue Type: Improvement Components: presto-integration Affects Versions: 1.3.0 Reporter: Bhavya Aggarwal Assignee: Bhavya Aggarwal Priority: Minor Write a Generic Vectorized Reader for Presto to remove the dependencies on spark -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1537: [CARBONDATA-1778] Support clean data for all
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1537 [CARBONDATA-1778] Support clean data for all Modification reasons: Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata cleanfile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1537 commit d5e9b19809b75f3cb8af27ff059c24b25e552309 Author: chenerlu Date: 2017-11-20T09:01:42Z Support clean data for all ---
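One way to picture "clean for all" is as a loop over the existing per-table command (a hedged sketch; the PR's actual syntax and implementation may differ, and error handling for non-Carbon tables is omitted):
```
import org.apache.spark.sql.SparkSession

def cleanAllTables(spark: SparkSession): Unit = {
  spark.sql("SHOW DATABASES").collect().foreach { dbRow =>
    val db = dbRow.getString(0)
    spark.sql(s"SHOW TABLES IN $db").collect().foreach { tblRow =>
      val table = tblRow.getString(1) // columns: database, tableName, isTemporary
      // Reuse the existing per-table command for every table found.
      spark.sql(s"CLEAN FILES FOR TABLE $db.$table")
    }
  }
}
```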
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1536 Please review it @jackylk ---
[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all
[ https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258972#comment-16258972 ] chenerlu commented on CARBONDATA-1778: -- Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. > Support clean garbage segments for all > -- > > Key: CARBONDATA-1778 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 > Project: CarbonData > Issue Type: Improvement >Reporter: chenerlu >Assignee: chenerlu >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1778) Support clean garbage segments for all
chenerlu created CARBONDATA-1778: Summary: Support clean garbage segments for all Key: CARBONDATA-1778 URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 Project: CarbonData Issue Type: Improvement Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ] kumar vishal commented on CARBONDATA-1777: -- [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using di
[jira] [Comment Edited] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ] kumar vishal edited comment on CARBONDATA-1777 at 11/20/17 8:59 AM: [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: And you can check the query plan to see which table it hits when executing the query. was (Author: kumarvishal09): [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,
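As the comment above suggests, the plan itself shows which table the query hits; assuming the carbonSession and tables from the reported steps, something like the following prints the plan, where a used datamap appears as a scan of lineitem1_agr1_lineitem1 rather than lineitem1:
```
// Print the logical and physical plans for the aggregate query.
carbonSession.sql(
  "select l_returnflag, l_linestatus, sum(l_quantity) " +
  "from lineitem1 group by l_returnflag, l_linestatus"
).explain(true)
```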
[GitHub] carbondata issue #1503: [CARBONDATA-1730] Support skip.header.line.count opt...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1503 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1310/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1536 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1309/ ---
[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1460 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1769/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1308/ ---
[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1460 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1768/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1536 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1307/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1516 @jackylk @chenliang613 @QiangCai According to Jacky's suggestion, this now just uses Java reflection for FileSystem.truncate in FileFactory.java. Please review, thanks. ---
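For context, the reflection approach mentioned here can be sketched as follows (a simplified illustration, not the actual FileFactory code): FileSystem.truncate(Path, long) only exists from Hadoop 2.7 onward, so looking the method up reflectively keeps the code working against older Hadoop releases.
```
import org.apache.hadoop.fs.{FileSystem, Path}

// Invoke FileSystem.truncate via reflection so the code still runs on Hadoop
// versions that lack the method.
def truncateIfSupported(fs: FileSystem, path: Path, newLength: Long): Boolean = {
  try {
    val m = classOf[FileSystem].getMethod("truncate", classOf[Path], classOf[Long])
    m.invoke(fs, path, java.lang.Long.valueOf(newLength)).asInstanceOf[Boolean]
  } catch {
    case _: NoSuchMethodException =>
      false // older Hadoop: caller must fall back, e.g. by rewriting the file
  }
}
```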