[GitHub] carbondata issue #1386: [CARBONDATA-1513] bad-record for complex data type s...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1386
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/806/



---


[GitHub] carbondata issue #1386: [CARBONDATA-1513] bad-record for complex data type s...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1386
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1439/



---


[GitHub] carbondata issue #1412: [CARBONDATA-1510] UDF test case added

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1412
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/807/



---


[GitHub] carbondata pull request #1386: [CARBONDATA-1513] bad-record for complex data...

2017-11-05 Thread rahulforallp
Github user rahulforallp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1386#discussion_r148955640
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/ArrayDataType.java
 ---
@@ -131,6 +131,10 @@ public void 
getAllPrimitiveChildren(List primitiveChild) {
 }
   }
 
+  public GenericDataType getChildren() {
--- End diff --

@jackylk getChildren() added to GenericDataType and overridden.
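The change described above can be sketched as follows. This is a hypothetical, simplified stand-in for the CarbonData types (the real GenericDataType interface has many more methods), shown only to illustrate the declare-on-interface, override-in-implementation pattern:

```java
// Simplified sketch: a getChildren() accessor declared on the interface
// and overridden by a complex-type implementation such as ArrayDataType.
// These are stand-in definitions, not the project's actual code.
interface GenericDataType<T> {
    GenericDataType<T> getChildren();
}

class ArrayDataType implements GenericDataType<Object> {
    private final GenericDataType<Object> children;

    ArrayDataType(GenericDataType<Object> children) {
        this.children = children;
    }

    @Override
    public GenericDataType<Object> getChildren() {
        // an array type has exactly one child element type
        return children;
    }
}
```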


---


[jira] [Created] (CARBONDATA-1666) Clean up redundant code

2017-11-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1666:


 Summary: Clean up redundant code
 Key: CARBONDATA-1666
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1666
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jacky Li
 Fix For: 1.3.0


There are some removed features in the carbon project; it is better to remove 
the redundant code for better readability. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1667) Remove DirectLoad feature

2017-11-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1667:


 Summary: Remove DirectLoad feature
 Key: CARBONDATA-1667
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1667
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


Currently carbon always uses DirectLoad from CSV, so this option can be 
removed from the code.





[jira] [Assigned] (CARBONDATA-1666) Clean up redundant code

2017-11-05 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li reassigned CARBONDATA-1666:


Assignee: Jacky Li

> Clean up redundant code
> ---
>
> Key: CARBONDATA-1666
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1666
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 1.3.0
>
>
> There are some removed features in the carbon project; it is better to remove 
> the redundant code for better readability. 





[GitHub] carbondata pull request #1465: [CARBONDATA-1667] Remove direct load related ...

2017-11-05 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1465

[CARBONDATA-1667] Remove direct load related code

Currently carbon always uses DirectLoad from CSV, so the related code can 
be removed.

 - [X] Any interfaces changed?
 No

 - [X] Any backward compatibility impacted?
 No

 - [X] Document update required?
No

 - [X] Testing done
No new testcase required.

 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata direct_load

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1465


commit 79ac155f2e8d1a87908f313aacd799a6fed49885
Author: Jacky Li 
Date:   2017-11-04T10:53:46Z

remove direct load




---


[jira] [Created] (CARBONDATA-1668) Remove isTableSplitPartition while loading

2017-11-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1668:


 Summary: Remove isTableSplitPartition while loading
 Key: CARBONDATA-1668
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1668
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li
Assignee: Jacky Li
 Fix For: 1.3.0


This option is always false, so the related code can be removed.





[GitHub] carbondata pull request #1466: [CARBONDATA-1668] remove isTableSplitPartitio...

2017-11-05 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1466

[CARBONDATA-1668] remove isTableSplitPartition in data loading

This option is always false, so the related code can be removed.

 - [X] Any interfaces changed?
 No

 - [X] Any backward compatibility impacted?
 No

 - [X] Document update required?
No

 - [X] Testing done
No new testcase is required

 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata 
remove_isTableSplitPartition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1466


commit a35b8a2a81c54df100738879a1ec66004efc8ca7
Author: Jacky Li 
Date:   2017-11-05T12:35:09Z

remove isTableSplitPartition




---


[GitHub] carbondata issue #1429: [CARBONDATA-1662] Make ArrayType and StructType cont...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/808/



---


[GitHub] carbondata issue #1412: [CARBONDATA-1510] UDF test case added

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1412
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1440/



---


[GitHub] carbondata issue #1465: [CARBONDATA-1667] Remove direct load related code

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1465
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/809/



---


[GitHub] carbondata pull request #1435: [CARBONDATA-1626]add data size and index size...

2017-11-05 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1435#discussion_r148957017
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1376,6 +1376,32 @@
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  /**
+   * The total size of carbon data
+   */
+  public static final String CARBON_TOTAL_DATA_SIZE = "datasize";
+
+  /**
+   * The total size of carbon index
+   */
+  public static final String CARBON_TOTAL_INDEX_SIZE = "indexsize";
+
+  /**
+   * ENABLE_CALCULATE_DATA_INDEX_SIZE
+   */
+  @CarbonProperty public static final String ENABLE_CALCULATE_SIZE = 
"carbon.enable.calculate.size";
+
+  /**
+   * DEFAULT_ENABLE_CALCULATE_DATA_INDEX_SIZE
+   */
+  @CarbonProperty public static final String DEFAULT_ENABLE_CALCULATE_SIZE 
= "true";
--- End diff --

ok


---


[GitHub] carbondata issue #1466: [CARBONDATA-1668] remove isTableSplitPartition in da...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1466
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/810/



---


[GitHub] carbondata issue #1429: [CARBONDATA-1662] Make ArrayType and StructType cont...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1441/



---


[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...

2017-11-05 Thread LiShuMing
Github user LiShuMing commented on the issue:

https://github.com/apache/carbondata/pull/1460
  
@chenliang613 According to 
https://github.com/apache/carbondata/blob/master/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala#171
the `PARTITION_NUM` keyword should be `NUM_PARTITIONS`. 

I think it's a spelling mistake?


---


[GitHub] carbondata issue #1465: [CARBONDATA-1667] Remove direct load related code

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1465
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1442/



---


[GitHub] carbondata issue #1466: [CARBONDATA-1668] remove isTableSplitPartition in da...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1466
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1443/



---


[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1437#discussion_r148961091
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java
 ---
@@ -79,6 +79,9 @@
  // this identifier is a lazy field which will be created when it is used 
first time
   private AbsoluteTableIdentifier identifier;
 
+  // table comment
+  private String tableComment;
--- End diff --

No need to add this attribute, as it is already stored in the properties.


---


[jira] [Created] (CARBONDATA-1669) Clean up code in CarbonDataRDDFactory

2017-11-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1669:


 Summary: Clean up code in CarbonDataRDDFactory
 Key: CARBONDATA-1669
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1669
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li
Assignee: Jacky Li
 Fix For: 1.3.0


Inside CarbonDataRDDFactory.loadCarbonData, there are many functions defined 
inside other functions, which makes the loading logic very hard to read.





[GitHub] carbondata pull request #1467: [CARBONDATA-1669] Clean up code in CarbonData...

2017-11-05 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1467

[CARBONDATA-1669] Clean up code in CarbonDataRDDFactory

Inside CarbonDataRDDFactory.loadCarbonData, there are many functions defined 
inside other functions, which makes the loading logic very hard to read.
This PR improves its readability.

 - [X] Any interfaces changed?
 No

 - [X] Any backward compatibility impacted?
 No

 - [X] Document update required?
No

 - [X] Testing done
No new testcase is required

 - [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata refactor_loaddata

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1467


commit a35b8a2a81c54df100738879a1ec66004efc8ca7
Author: Jacky Li 
Date:   2017-11-05T12:35:09Z

remove isTableSplitPartition

commit df65121987660b455201ab99abbe063c93e4eb8c
Author: Jacky Li 
Date:   2017-11-05T16:41:29Z

refactor load data




---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/811/



---


[GitHub] carbondata pull request #1455: [CARBONDATA-1624]Set the default value of 'ca...

2017-11-05 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1455#discussion_r148963119
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/LoadTableCommand.scala
 ---
@@ -84,6 +84,10 @@ case class LoadTableCommand(
 
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
+carbonProperty.addProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+carbonProperty.getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+
Math.min(sparkSession.sparkContext.conf.getInt("spark.executor.cores", 1),
+CarbonCommonConstants.NUM_CORES_MAX_VAL).toString()))
--- End diff --

We can't modify *NUM_CORES_DEFAULT_VAL* to 32 directly; there are some places 
that use *NUM_CORES_DEFAULT_VAL*, for example 
in org.apache.carbondata.core.datastore.BlockIndexStore.getAll:

    try {
      numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
          .getProperty(CarbonCommonConstants.NUM_CORES,
              CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
    } catch (NumberFormatException e) {
      numberOfCores = Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
    }

32 is too big. 
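The parse-with-fallback pattern quoted above can be sketched as a small standalone helper. The class and the default value here are assumed stand-ins, not CarbonData's actual constants:

```java
// Sketch of the parse-with-fallback pattern quoted above.
// NUM_CORES_DEFAULT_VAL is an assumed stand-in value for illustration.
class NumCores {
    static final String NUM_CORES_DEFAULT_VAL = "2";

    // Returns the configured core count, falling back to the default when
    // the configured value is missing or not a valid integer.
    static int parseNumCores(String configured) {
        try {
            return Integer.parseInt(
                configured != null ? configured : NUM_CORES_DEFAULT_VAL);
        } catch (NumberFormatException e) {
            return Integer.parseInt(NUM_CORES_DEFAULT_VAL);
        }
    }
}
```

This is why silently raising the default matters: every call site that falls back to the default inherits the new value.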




---


[GitHub] carbondata issue #1465: [CARBONDATA-1667] Remove direct load related code

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1465
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/812/



---


[GitHub] carbondata issue #1458: [CARBONDATA-1663] Decouple spark and core modules

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1458
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/813/



---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1444/



---


[GitHub] carbondata issue #1465: [CARBONDATA-1667] Remove direct load related code

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1465
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1445/



---


[GitHub] carbondata issue #1458: [CARBONDATA-1663] Decouple spark and core modules

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1458
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1446/



---


[GitHub] carbondata issue #1447: [CARBONDATA-1611][Streaming] Reject Update and Delet...

2017-11-05 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1447
  
retest this please


---


[GitHub] carbondata issue #1448: [CARBONDATA-1656][Streaming] Reject alter table comm...

2017-11-05 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1448
  
retest this please


---


[GitHub] carbondata issue #1448: [CARBONDATA-1656][Streaming] Reject alter table comm...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1448
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/814/



---


[GitHub] carbondata issue #1447: [CARBONDATA-1611][Streaming] Reject Update and Delet...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1447
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/815/



---


[GitHub] carbondata issue #1448: [CARBONDATA-1656][Streaming] Reject alter table comm...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1448
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/816/



---


[GitHub] carbondata issue #1350: [CARBONDATA-1475] fix default maven dependencies for...

2017-11-05 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1350
  
close this pr. not required


---


[GitHub] carbondata pull request #1350: [CARBONDATA-1475] fix default maven dependenc...

2017-11-05 Thread QiangCai
Github user QiangCai closed the pull request at:

https://github.com/apache/carbondata/pull/1350


---


[GitHub] carbondata issue #1440: [WIP][CARBONDATA-1581][CARBONDATA-1582][Streaming] I...

2017-11-05 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1440
  
close this pr
I will raise new pr.


---


[GitHub] carbondata pull request #1440: [WIP][CARBONDATA-1581][CARBONDATA-1582][Strea...

2017-11-05 Thread QiangCai
Github user QiangCai closed the pull request at:

https://github.com/apache/carbondata/pull/1440


---


[GitHub] carbondata issue #1447: [CARBONDATA-1611][Streaming] Reject Update and Delet...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1447
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1447/



---


[GitHub] carbondata issue #1465: [CARBONDATA-1667] Remove direct load related code

2017-11-05 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1465
  
LGTM


---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/818/



---


[GitHub] carbondata issue #1455: [CARBONDATA-1624]Set the default value of 'carbon.nu...

2017-11-05 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/1455
  
@jackylk please review.


---


[GitHub] carbondata pull request #1459: [CARBONDATA-1661] Fixed bug related to displa...

2017-11-05 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1459#discussion_r148986553
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java
 ---
@@ -258,6 +258,8 @@ else if (type instanceof DecimalType) {
 return new BigDecimal(new BigInteger(String.valueOf(rawdata)),
 ((DecimalType) type).getScale());
   }
+} else if (type.equals(TimestampType.TIMESTAMP)) {
--- End diff --

You can use `type == DataTypes.TIMESTAMP` instead


---


[GitHub] carbondata issue #1448: [CARBONDATA-1656][Streaming] Reject alter table comm...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1448
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1448/



---


[GitHub] carbondata pull request #1455: [CARBONDATA-1624]Set the default value of 'ca...

2017-11-05 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1455#discussion_r148987672
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/LoadTableCommand.scala
 ---
@@ -84,6 +84,32 @@ case class LoadTableCommand(
 
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
+
+val numCoresLoading =
+  try {
+Integer.parseInt(CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+CarbonCommonConstants.NUM_CORES_MAX_VAL.toString()))
+  } catch {
+case exc: NumberFormatException =>
+  LOGGER.error("Configured value for property " + 
CarbonCommonConstants.NUM_CORES_LOADING
+  + " is wrong. ")
+  CarbonCommonConstants.NUM_CORES_MAX_VAL
+  }
+// Get the minimum value of 'spark.executor.cores' and 
NUM_CORES_LOADING,
+// If user set the NUM_CORES_LOADING, it can't exceed the value of 
'spark.executor.cores';
+// If user doesn't set the NUM_CORES_LOADING, it will use the value of 
'spark.executor.cores',
+// but the value can't exceed the value of NUM_CORES_MAX_VAL,
+// NUM_CORES_LOADING's default value is NUM_CORES_MAX_VAL;
+val newNumCoresLoading =
+  Math.min(
+  sparkSession.sparkContext.conf.getInt("spark.executor.cores", 1),
+  numCoresLoading
+  )
+// update the property with new value
+carbonProperty.addProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+newNumCoresLoading.toString())
+
--- End diff --

I think you can set the `spark.task.cpus` here so that spark will know 
carbon is using more cores for one task.
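The capping logic under review in this diff can be sketched as a pure helper: the configured loading-core count (defaulting to the maximum) is clamped by `spark.executor.cores`. The names and the value 32 for the cap are assumptions taken from the surrounding discussion, not verified constants:

```java
// Sketch of the core-capping logic discussed above. NUM_CORES_MAX_VAL = 32
// is an assumed value; the real constant lives in CarbonCommonConstants.
class CoreCap {
    static final int NUM_CORES_MAX_VAL = 32;

    // configured == null models "user did not set carbon.number.of.cores.while.loading";
    // in that case the cap defaults to NUM_CORES_MAX_VAL, and the executor's
    // core count is never exceeded either way.
    static int effectiveLoadingCores(int executorCores, Integer configured) {
        int requested = (configured != null) ? configured : NUM_CORES_MAX_VAL;
        return Math.min(executorCores, requested);
    }
}
```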


---


[GitHub] carbondata issue #1455: [CARBONDATA-1624]Set the default value of 'carbon.nu...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1455
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/819/



---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
retest this please


---


[GitHub] carbondata pull request #1223: [WIP] Support cleaning garbage segment in all...

2017-11-05 Thread jackylk
Github user jackylk closed the pull request at:

https://github.com/apache/carbondata/pull/1223


---


[GitHub] carbondata pull request #1067: [CARBONDATA-1199] support dynamically enablin...

2017-11-05 Thread jackylk
Github user jackylk closed the pull request at:

https://github.com/apache/carbondata/pull/1067


---


[GitHub] carbondata issue #1447: [CARBONDATA-1611][Streaming] Reject Update and Delet...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1447
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1449/



---


[GitHub] carbondata issue #1429: [CARBONDATA-1662] Make ArrayType and StructType cont...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/820/



---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/821/



---


[GitHub] carbondata pull request #1465: [CARBONDATA-1667] Remove direct load related ...

2017-11-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1465


---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1450/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148991769
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
 ---
@@ -124,6 +124,8 @@
*/
   private int numberOfNoDictSortColumns;
 
+  private boolean hasPreAggDataMap;
--- End diff --

Better name it as 'hasChildDataMap'


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148992553
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -67,6 +92,83 @@ public void setRelationIdentifier(RelationIdentifier 
relationIdentifier) {
 
   public void setChildSchema(TableSchema childSchema) {
 this.childSchema = childSchema;
+List<ColumnSchema> listOfColumns = this.childSchema.getListOfColumns();
+fillNonAggFunctionColumns(listOfColumns);
+fillAggFunctionColumns(listOfColumns);
+fillParentNameToAggregationMapping(listOfColumns);
+  }
+
+  /**
+   * Method to prepare mapping of parent to list of aggregation function 
applied on that column
+   * @param listOfColumns
+   *child column schema list
+   */
+  private void fillParentNameToAggregationMapping(List<ColumnSchema> listOfColumns) {
+parentColumnToAggregationsMapping = new HashMap<>();
+for (ColumnSchema column : listOfColumns) {
+  if (null != column.getAggFunction() && 
!column.getAggFunction().isEmpty()) {
+List<ParentColumnTableRelation> parentColumnTableRelations =
+column.getParentColumnTableRelations();
+if (null != parentColumnTableRelations) {
--- End diff --

Please check the size of this list as well, or iterate the list instead of 
always getting element 0.
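The defensive iteration the reviewer asks for can be sketched like this. The types are simplified stand-ins (strings instead of CarbonData's relation objects), shown only to contrast with an unconditional `get(0)`:

```java
import java.util.List;

// Sketch of the reviewer's suggestion: check for null/empty and iterate,
// rather than unconditionally reading element 0 of the relations list.
class ParentLookup {
    static String firstMatching(List<String> parentColumnNames, String wanted) {
        if (parentColumnNames == null || parentColumnNames.isEmpty()) {
            return null; // get(0) here would throw on an empty or null list
        }
        for (String name : parentColumnNames) {
            if (name.equals(wanted)) {
                return name;
            }
        }
        return null;
    }
}
```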


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148992649
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -67,6 +92,83 @@ public void setRelationIdentifier(RelationIdentifier 
relationIdentifier) {
 
   public void setChildSchema(TableSchema childSchema) {
 this.childSchema = childSchema;
+List<ColumnSchema> listOfColumns = this.childSchema.getListOfColumns();
+fillNonAggFunctionColumns(listOfColumns);
+fillAggFunctionColumns(listOfColumns);
+fillParentNameToAggregationMapping(listOfColumns);
--- End diff --

I feel all the above 3 functions are doing almost the same job; why don't you 
combine them?


---


[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1451/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148995479
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -41,6 +48,24 @@
*/
  private Map<String, String> properties;
 
+  /**
+   * map of parent column name to set of child columns without
+   * aggregation function
+   */
+  private Map<String, Set<ColumnSchema>> parentToNonAggChildMapping;
+
+  /**
+   * map of parent column name to set of child columns with
+   * aggregation function
+   */
+  private Map<String, Set<ColumnSchema>> parentToAggChildMapping;
+
+  /**
+   * map of parent column name to set of aggregation functions applied
+   * on that parent column
+   */
+  private Map<String, Set<String>> parentColumnToAggregationsMapping;
+
   public DataMapSchema(String className) {
--- End diff --

Create a factory, extend this datamap schema, and implement as per the class 
name. All agg-datamap-related logic should go to the AggregationDataMapSchema class. 
DataMapSchema should be the generic class and should only contain the 
generic attributes.
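The factory approach suggested above can be sketched as follows. Everything here is a hypothetical, minimal stand-in: the fields, constructors, and the discriminating class-name string are invented for illustration and are not CarbonData's actual definitions:

```java
// Sketch of the suggested factory: DataMapSchema stays generic, and
// aggregation-specific state lives in a subclass selected by class name.
// The class-name string below is hypothetical.
class DataMapSchema {
    final String className;
    DataMapSchema(String className) { this.className = className; }
}

class AggregationDataMapSchema extends DataMapSchema {
    // agg-specific maps (parentToAggChildMapping, etc.) would live here
    AggregationDataMapSchema(String className) { super(className); }
}

class DataMapSchemaFactory {
    static DataMapSchema newSchema(String className) {
        if ("example.AggDataMapSchema".equals(className)) {
            return new AggregationDataMapSchema(className);
        }
        return new DataMapSchema(className);
    }
}
```

Callers that need the aggregation mappings then downcast (or pattern-match) only after the factory has chosen the subclass, keeping the base class free of agg-specific attributes.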


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148995547
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -116,6 +218,83 @@ public void setProperties(Map<String, String> properties) {
   String value = in.readUTF();
   this.properties.put(key, value);
 }
+  }
 
+  /**
+   * Below method will be used to get the columns on which aggregate 
function is not applied
+   * @param columnName
+   *parent column name
+   * @return child column schema
+   */
+  public ColumnSchema getChildColBasedByParentForNonAggF(String 
columnName) {
--- End diff --

Better name as `getNonAggChildColumnByParentColName`


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148995569
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -116,6 +218,83 @@ public void setProperties(Map<String, String> properties) {
   String value = in.readUTF();
   this.properties.put(key, value);
 }
+  }
 
+  /**
+   * Below method will be used to get the columns on which aggregate 
function is not applied
+   * @param columnName
+   *parent column name
+   * @return child column schema
+   */
+  public ColumnSchema getChildColBasedByParentForNonAggF(String 
columnName) {
+Set<ColumnSchema> columnSchemas = parentToNonAggChildMapping.get(columnName);
+if (null != columnSchemas) {
+  Iterator<ColumnSchema> iterator = columnSchemas.iterator();
+  while (iterator.hasNext()) {
+ColumnSchema next = iterator.next();
+if (null == next.getAggFunction() || 
next.getAggFunction().isEmpty()) {
+  return next;
+}
+  }
+}
+return null;
+  }
+
+  /**
+   * Below method will be used to get the column schema based on parent 
column name
+   * @param columName
+   *parent column name
+   * @return child column schema
+   */
+  public ColumnSchema getChildColumnByParentName(String columName) {
--- End diff --

Better name as `getChildColumnByParentColName`


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148995599
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ---
@@ -116,6 +218,83 @@ public void setProperties(Map<String, String> properties) {
   String value = in.readUTF();
   this.properties.put(key, value);
 }
+  }
 
+  /**
+   * Below method will be used to get the columns on which aggregate 
function is not applied
+   * @param columnName
+   *parent column name
+   * @return child column schema
+   */
+  public ColumnSchema getChildColBasedByParentForNonAggF(String 
columnName) {
+Set columnSchemas = 
parentToNonAggChildMapping.get(columnName);
+if (null != columnSchemas) {
+  Iterator<ColumnSchema> iterator = columnSchemas.iterator();
+  while (iterator.hasNext()) {
+ColumnSchema next = iterator.next();
+if (null == next.getAggFunction() || 
next.getAggFunction().isEmpty()) {
+  return next;
+}
+  }
+}
+return null;
+  }
+
+  /**
+   * Below method will be used to get the column schema based on parent 
column name
+   * @param columName
+   *parent column name
+   * @return child column schmea
+   */
+  public ColumnSchema getChildColumnByParentName(String columName) {
+List<ColumnSchema> listOfColumns = childSchema.getListOfColumns();
+for (ColumnSchema columnSchema : listOfColumns) {
+  List<ParentColumnTableRelation> parentColumnTableRelations =
+  columnSchema.getParentColumnTableRelations();
+  if 
(parentColumnTableRelations.get(0).getColumnName().equals(columName)) {
+return columnSchema;
+  }
+}
+return null;
+  }
+
+  /**
+   * Below method will be used to get the child column schema based on 
parent name and aggregate
+   * function applied on column
+   * @param columnName
+   *  parent column name
+   * @param aggFunction
+   *  aggregate function applied
+   * @return child column schema
+   */
+  public ColumnSchema getChildColByParentWithAggFun(String columnName,
--- End diff --

Better name as `getAggChildColumnByParentColName`


---


[GitHub] carbondata issue #1455: [CARBONDATA-1624]Set the default value of 'carbon.nu...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1455
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1452/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148996693
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/preagg/QueryPlan.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.preagg;
+
+import java.util.List;
+
+/**
+ * class to maintain the query plan to select the data map tables
+ */
+public class QueryPlan {
+
+  /**
+   * List of projection columns
+   */
+  private List<QueryColumn> projectionColumn;
+
+  /**
+   * list of aggregation columns
+   */
+  private List<QueryColumn> aggregationColumns;
--- End diff --

I think it is not required to separate out `aggregationColumns`; all of them 
should be part of `projectionColumn`. Just add one method `hasAggFunc` to 
`QueryColumn`.


---
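The suggestion above can be sketched as follows. This is a hypothetical sketch of the proposed shape, not CarbonData's actual `QueryColumn`/`QueryPlan` API: a single projection list in which each entry optionally carries an aggregate function, with `hasAggFunc` replacing the separate `aggregationColumns` list.

```scala
// Hypothetical sketch of the reviewer's suggestion (not the real CarbonData API):
// keep one combined column list and mark aggregate columns via hasAggFunc.
case class QueryColumn(columnName: String, aggFunction: Option[String] = None) {
  // true when an aggregate function (sum, count, avg, ...) is applied on the column
  def hasAggFunc: Boolean = aggFunction.isDefined
}

case class QueryPlan(projectionColumns: Seq[QueryColumn]) {
  // the two views that were previously kept as separate lists
  def plainColumns: Seq[QueryColumn] = projectionColumns.filterNot(_.hasAggFunc)
  def aggColumns: Seq[QueryColumn] = projectionColumns.filter(_.hasAggFunc)
}

val plan = QueryPlan(Seq(
  QueryColumn("name"),
  QueryColumn("salary", Some("sum")),
  QueryColumn("salary", Some("count"))))

println(plan.plainColumns.map(_.columnName).mkString(","))  // name
println(plan.aggColumns.size)                               // 2
```

With this shape the aggregate table selector can still recover both views when needed, while callers only have to build one list.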


[GitHub] carbondata pull request #1455: [CARBONDATA-1624]Set the default value of 'ca...

2017-11-05 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1455#discussion_r148997065
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/LoadTableCommand.scala
 ---
@@ -84,6 +84,32 @@ case class LoadTableCommand(
 
 val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
 carbonProperty.addProperty("zookeeper.enable.lock", "false")
+
+val numCoresLoading =
+  try {
+Integer.parseInt(CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+CarbonCommonConstants.NUM_CORES_MAX_VAL.toString()))
+  } catch {
+case exc: NumberFormatException =>
+  LOGGER.error("Configured value for property " + 
CarbonCommonConstants.NUM_CORES_LOADING
+  + " is wrong. ")
+  CarbonCommonConstants.NUM_CORES_MAX_VAL
+  }
+// Get the minimum value of 'spark.executor.cores' and 
NUM_CORES_LOADING,
+// If user set the NUM_CORES_LOADING, it can't exceed the value of 
'spark.executor.cores';
+// If user doesn't set the NUM_CORES_LOADING, it will use the value of 
'spark.executor.cores',
+// but the value can't exceed the value of NUM_CORES_MAX_VAL,
+// NUM_CORES_LOADING's default value is NUM_CORES_MAX_VAL;
+val newNumCoresLoading =
+  Math.min(
+  sparkSession.sparkContext.conf.getInt("spark.executor.cores", 1),
+  numCoresLoading
+  )
+// update the property with new value
+carbonProperty.addProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+newNumCoresLoading.toString())
+
--- End diff --

I think it is unnecessary. If we do so, it will affect other jobs and reduce 
the task parallelism, right?


---
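As a standalone illustration of the clamping rule in the quoted diff (a sketch only; `NUM_CORES_MAX_VAL = 32` is an assumption here, the real constant lives in `CarbonCommonConstants`): a malformed setting falls back to the maximum, and the result is then capped by `spark.executor.cores`.

```scala
// Sketch of the rule discussed above; NUM_CORES_MAX_VAL = 32 is an assumed value.
val NUM_CORES_MAX_VAL = 32

def effectiveLoadingCores(configured: String, executorCores: Int): Int = {
  val requested =
    try configured.toInt
    catch { case _: NumberFormatException => NUM_CORES_MAX_VAL } // bad value -> default max
  // never exceed what an executor actually has
  math.min(executorCores, requested)
}

println(effectiveLoadingCores("6", 4))    // user asked for 6, capped to 4 executor cores
println(effectiveLoadingCores("2", 4))    // user asked for 2, kept as 2
println(effectiveLoadingCores("oops", 8)) // malformed -> max (32), then capped to 8
```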


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148997142
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -51,6 +51,7 @@ class CarbonEnv {
 
   def init(sparkSession: SparkSession): Unit = {
 sparkSession.udf.register("getTupleId", () => "")
+sparkSession.udf.register("preAgg", () => "")
--- End diff --

Please add a comment explaining its usage.


---
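The `preAgg` registration above is a marker: the dummy UDF exists only so that its name shows up as an alias in the logical plan, letting the pre-aggregate rule skip transformation for plans built by the create-pre-aggregate flow. A minimal model of the trick, using toy expression classes rather than Spark's:

```scala
// Toy expression tree standing in for Spark's (Alias/Column here are NOT Spark classes).
sealed trait Expr
case class Column(name: String) extends Expr
case class Alias(child: Expr, name: String) extends Expr

// mirror of the rule's check: an alias named "preAgg" means the plan came from
// the create-pre-aggregate command, so it must be left untouched
def needsTransform(projections: Seq[Expr]): Boolean =
  !projections.exists {
    case Alias(_, name) => name == "preAgg"
    case _ => false
  }

println(needsTransform(Seq(Column("a"))))                  // true: normal query
println(needsTransform(Seq(Alias(Column("a"), "preAgg")))) // false: marker present
```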


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148997655
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/CreatePreAggregateTableCommand.scala
 ---
@@ -61,7 +66,7 @@ case class CreatePreAggregateTableCommand(
 val dbName = cm.databaseName
 LOGGER.audit(s"Creating Table with Database name [$dbName] and Table 
name [$tbName]")
 // getting the parent table
-val parentTable = 
PreAggregateUtil.getParentCarbonTable(dataFrame.logicalPlan)
+val parentTable = PreAggregateUtil.getParentCarbonTable(logicalPlan)
--- End diff --

Most of the content of this class is the same as `CreateTableCommand`, so it is 
better to call that command from here.


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148997835
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala
 ---
@@ -49,70 +50,109 @@ object PreAggregateUtil {
 
   def getParentCarbonTable(plan: LogicalPlan): CarbonTable = {
 plan match {
-  case Aggregate(_, aExp, SubqueryAlias(_, l: LogicalRelation, _))
-if l.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
-
l.relation.asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, SubqueryAlias(_, logicalRelation: 
LogicalRelation, _))
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, logicalRelation: LogicalRelation)
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
   case _ => throw new MalformedCarbonCommandException("table does not 
exist")
--- End diff --

It is not actually that the table does not exist; it is that the plan doesn't match.


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r148998207
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala
 ---
@@ -49,70 +50,109 @@ object PreAggregateUtil {
 
   def getParentCarbonTable(plan: LogicalPlan): CarbonTable = {
 plan match {
-  case Aggregate(_, aExp, SubqueryAlias(_, l: LogicalRelation, _))
-if l.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
-
l.relation.asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, SubqueryAlias(_, logicalRelation: 
LogicalRelation, _))
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, logicalRelation: LogicalRelation)
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
   case _ => throw new MalformedCarbonCommandException("table does not 
exist")
 }
   }
 
   /**
* Below method will be used to validate the select plan
* and get the required fields from select plan
-   * Currently only aggregate query is support any other type of query will
-   * fail
+   * Currently only aggregate query is support any other type of query 
will fail
+   *
* @param plan
* @param selectStmt
* @return list of fields
*/
   def validateActualSelectPlanAndGetAttrubites(plan: LogicalPlan,
--- End diff --

typo `Attributes`


---


[GitHub] carbondata pull request #1459: [CARBONDATA-1661] Fixed bug related to displa...

2017-11-05 Thread geetikagupta16
Github user geetikagupta16 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1459#discussion_r148998311
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java
 ---
@@ -258,6 +258,8 @@ else if (type instanceof DecimalType) {
 return new BigDecimal(new BigInteger(String.valueOf(rawdata)),
 ((DecimalType) type).getScale());
   }
+} else if (type.equals(TimestampType.TIMESTAMP)) {
--- End diff --

Here I am comparing the type with Presto's data types; that's why I have used 
`TimestampType.TIMESTAMP`.


---


[GitHub] carbondata issue #1429: [CARBONDATA-1662] Make ArrayType and StructType cont...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1453/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149001761
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala
 ---
@@ -49,70 +50,109 @@ object PreAggregateUtil {
 
   def getParentCarbonTable(plan: LogicalPlan): CarbonTable = {
 plan match {
-  case Aggregate(_, aExp, SubqueryAlias(_, l: LogicalRelation, _))
-if l.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
-
l.relation.asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, SubqueryAlias(_, logicalRelation: 
LogicalRelation, _))
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
+  case Aggregate(_, _, logicalRelation: LogicalRelation)
+if 
logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
+
logicalRelation.relation.asInstanceOf[CarbonDatasourceHadoopRelation].
+  carbonRelation.metaData.carbonTable
   case _ => throw new MalformedCarbonCommandException("table does not 
exist")
 }
   }
 
   /**
* Below method will be used to validate the select plan
* and get the required fields from select plan
-   * Currently only aggregate query is support any other type of query will
-   * fail
+   * Currently only aggregate query is support any other type of query 
will fail
+   *
* @param plan
* @param selectStmt
* @return list of fields
*/
   def validateActualSelectPlanAndGetAttrubites(plan: LogicalPlan,
   selectStmt: String): scala.collection.mutable.LinkedHashMap[Field, 
DataMapField] = {
-val fieldToDataMapFieldMap = 
scala.collection.mutable.LinkedHashMap.empty[Field, DataMapField]
 plan match {
-  case Aggregate(_, aExp, SubqueryAlias(_, l: LogicalRelation, _))
-if l.relation.isInstanceOf[CarbonDatasourceHadoopRelation] =>
-val carbonTable = 
l.relation.asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation
-  .metaData.carbonTable
-val parentTableName = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
-  .getTableName
-val parentDatabaseName = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
-  .getDatabaseName
-val parentTableId = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
-  .getTableId
-if 
(!carbonTable.getTableInfo.getParentRelationIdentifiers.isEmpty) {
+  case Aggregate(groupByExp, aggExp, SubqueryAlias(_, logicalRelation: 
LogicalRelation, _)) =>
+getFieldsFromPlan(groupByExp, aggExp, logicalRelation, selectStmt)
+  case Aggregate(groupByExp, aggExp, logicalRelation: LogicalRelation) 
=>
+getFieldsFromPlan(groupByExp, aggExp, logicalRelation, selectStmt)
+  case _ =>
+throw new MalformedCarbonCommandException(s"Unsupported Select 
Statement:${ selectStmt } ")
+}
+  }
+
+  /**
+   * Below method will be used to get the fields from expressions
+   * @param groupByExp
+   *  grouping expression
+   * @param aggExp
+   *   aggregate expression
+   * @param logicalRelation
+   *logical relation
+   * @param selectStmt
+   *   select statement
+   * @return fields from expressions
+   */
+  def getFieldsFromPlan(groupByExp: Seq[Expression],
+  aggExp: Seq[NamedExpression], logicalRelation: LogicalRelation, 
selectStmt: String):
+  scala.collection.mutable.LinkedHashMap[Field, DataMapField] = {
+val fieldToDataMapFieldMap = 
scala.collection.mutable.LinkedHashMap.empty[Field, DataMapField]
+if 
(!logicalRelation.relation.isInstanceOf[CarbonDatasourceHadoopRelation]) {
+  throw new MalformedCarbonCommandException("Un-supported table")
+}
+val carbonTable = logicalRelation.relation.
+  asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation
+  .metaData.carbonTable
+val parentTableName = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
+  .getTableName
+val parentDatabaseName = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
+  .getDatabaseName
+val parentTableId = 
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
+  .getTableId
+if (!carbonTable.getTableInfo.getParentRelationIdentifiers.isEmpty) {
+  throw new MalformedCarbonCommandException(
+"Pre Aggregation i

[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149002531
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala
 ---
@@ -16,17 +16,18 @@
  */
 package org.apache.spark.sql.execution.command.preaaggregate
 
-import scala.collection.mutable.ListBuffer
+import scala.collection.mutable.{ArrayBuffer, ListBuffer}
--- End diff --

What is the use of the method `prepareSchemaJson`?


---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149003521
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateQueryRules.scala
 ---
@@ -0,0 +1,756 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{AnalysisException, 
CarbonDatasourceHadoopRelation, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Alias, 
AttributeReference, Cast, Divide,
+Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, 
LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.metadata.schema.table.{CarbonTable, 
DataMapSchema}
+import org.apache.carbondata.core.preagg.{AggregateTableSelector, 
QueryColumn, QueryPlan}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+/**
+ * Class for applying Pre Aggregate rules
+ * Responsibility.
+ * 1. Check plan is valid plan for updating the parent table plan with 
child table
+ * 2. Updated the plan based on child schema
+ *
+ * Rules for Upadating the plan
+ * 1. Grouping expression rules
+ *1.1 Change the parent attribute reference for of group expression
+ * to child attribute reference
+ *
+ * 2. Aggregate expression rules
+ *2.1 Change the parent attribute reference for of group expression to
+ * child attribute reference
+ *2.2 Change the count AggregateExpression to Sum as count
+ * is already calculated so in case of aggregate table
+ * we need to apply sum to get the count
+ *2.2 In case of average aggregate function select 2 columns from 
aggregate table with
+ * aggregation
+ * sum and count. Then add divide(sum(column with sum), sum(column with 
count)).
+ * Note: During aggregate table creation for average table will be created 
with two columns
+ * one for sum(column) and count(column) to support rollup
+ *
+ * 3. Filter Expression rules.
+ *3.1 Updated filter expression attributes with child table attributes
+ * 4. Update the Parent Logical relation with child Logical relation
+ *
+ * @param sparkSession
+ * spark session
+ */
+case class CarbonPreAggregateQueryRules(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
+
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+var needAnalysis = true
+plan.transformExpressions {
+  // first check if any preAgg scala function is applied it is present 
is in plan
+  // then call is from create preaggregate table class so no need to 
transform the query plan
+  case al@Alias(_, name) if name.equals("preAgg") =>
+needAnalysis = false
+al
+  // in case of query if any unresolve alias is present then wait for 
plan to be resolved
+  // return the same plan as we can tranform the plan only when 
everything is resolved
+  case unresolveAlias@UnresolvedAlias(_, _) =>
+needAnalysis = false
+unresolveAlias
+}
+// if plan is not valid for transformation then return same plan
+if (!needAnalysis) {
+  plan
+} else {
+  // create buffer to collect all the column and its metadata 
information
+  val list = scala.collection.mutable.ListBuffer.empty[QueryColumn]
+  var isValidPlan = true
+  val carbonTable = plan match {
+// matching the plan based on supported plan
+// if plan is matches with any case it will validate and get all
+// info

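Rule 2.2 above (average rollup) can be checked with a small worked example. This illustrates only the arithmetic, not CarbonData code: averaging per-segment averages is wrong, but keeping separate sum and count columns lets `divide(sum(column with sum), sum(column with count))` reproduce the exact average.

```scala
// Two pre-aggregated segments of the same measure column.
case class Segment(sum: Double, count: Long)
val segments = Seq(Segment(10.0, 4), Segment(2.0, 1))

// correct rollup: divide(sum(column with sum), sum(column with count))
val rolledUpAvg = segments.map(_.sum).sum / segments.map(_.count).sum

// naive average of per-segment averages disagrees with the true average
val avgOfAvgs = segments.map(s => s.sum / s.count).sum / segments.size

println(rolledUpAvg)  // 2.4  (12.0 / 5)
println(avgOfAvgs)    // 2.25 ((2.5 + 2.0) / 2)
```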
[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149003605
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateQueryRules.scala
 ---

[GitHub] carbondata issue #1467: [CARBONDATA-1669] Clean up code in CarbonDataRDDFact...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1467
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1454/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149004253
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateQueryRules.scala
 ---

[GitHub] carbondata issue #1455: [CARBONDATA-1624]Set the default value of 'carbon.nu...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1455
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/822/



---


[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149004422
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateQueryRules.scala
 ---
@@ -0,0 +1,756 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{AnalysisException, 
CarbonDatasourceHadoopRelation, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Alias, 
AttributeReference, Cast, Divide,
+Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, 
LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.metadata.schema.table.{CarbonTable, 
DataMapSchema}
+import org.apache.carbondata.core.preagg.{AggregateTableSelector, 
QueryColumn, QueryPlan}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+/**
+ * Class for applying Pre Aggregate rules
+ * Responsibility.
+ * 1. Check the plan is a valid plan for updating the parent table plan
+ * with the child table
+ * 2. Update the plan based on the child schema
+ *
+ * Rules for updating the plan
+ * 1. Grouping expression rules
+ *1.1 Change the parent attribute reference of the group expression
+ * to the child attribute reference
+ *
+ * 2. Aggregate expression rules
+ *2.1 Change the parent attribute reference of the group expression to
+ * the child attribute reference
+ *2.2 Change the count AggregateExpression to Sum, as the count
+ * is already calculated, so in case of an aggregate table
+ * we need to apply sum to get the count
+ *2.3 In case of the average aggregate function, select 2 columns from
+ * the aggregate table with aggregations
+ * sum and count, then add divide(sum(column with sum), sum(column with
+ * count)).
+ * Note: During aggregate table creation for average, the table will be
+ * created with two columns,
+ * one for sum(column) and one for count(column), to support rollup
+ *
+ * 3. Filter Expression rules.
+ *3.1 Update filter expression attributes with child table attributes
+ * 4. Update the Parent Logical relation with the child Logical relation
+ *
+ * @param sparkSession
+ * spark session
+ */
+case class CarbonPreAggregateQueryRules(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
+
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+var needAnalysis = true
+plan.transformExpressions {
+  // first check if any preAgg scala function is applied; if it is
+  // present in the plan,
+  // then the call is from the create preaggregate table class, so no need
+  // to transform the query plan
+  case al@Alias(_, name) if name.equals("preAgg") =>
+needAnalysis = false
+al
+  // in case of a query, if any unresolved alias is present then wait for
+  // the plan to be resolved;
+  // return the same plan, as we can transform the plan only when
+  // everything is resolved
+  case unresolveAlias@UnresolvedAlias(_, _) =>
+needAnalysis = false
+unresolveAlias
+}
+// if plan is not valid for transformation then return same plan
+if (!needAnalysis) {
+  plan
+} else {
+  // create a buffer to collect all the columns and their metadata
+  // information
+  val list = scala.collection.mutable.ListBuffer.empty[QueryColumn]
+  var isValidPlan = true
+  val carbonTable = plan match {
+// matching the plan against the supported plans
+// if the plan matches any case it will validate and get all
+// info
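
The count and average rewrite rules described in the quoted comment above can be sketched outside Carbon's classes. This is a hypothetical illustration only (names like `AggRollup` and the `(col_sum, col_count)` row shape are assumptions, not Carbon's actual API): count(col) on the parent table is answered as sum over the aggregate table's count column, and avg(col) as divide(sum(col_sum), sum(col_count)).

```scala
// Hypothetical sketch of the pre-aggregate rollup rules; illustrative names,
// not Carbon's actual classes. The aggregate table is assumed to store one
// (col_sum, col_count) pair per pre-aggregated group.
object AggRollup {
  // count(col) on the parent table = sum of the stored per-group counts
  def countFromAggTable(rows: Seq[(Double, Long)]): Long =
    rows.map(_._2).sum

  // avg(col) on the parent table = sum(col_sum) / sum(col_count),
  // which is why the aggregate table keeps both columns (rule for rollup)
  def averageFromAggTable(rows: Seq[(Double, Long)]): Double =
    rows.map(_._1).sum / countFromAggTable(rows)
}
```

For example, two pre-aggregated groups `(10.0, 2)` and `(20.0, 3)` roll up to count 5 and average 6.0, the same result as running count/avg on the five original rows.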

[GitHub] carbondata pull request #1464: [WIP][CARBONDATA-1523]Pre Aggregate table sel...

2017-11-05 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1464#discussion_r149005119
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateQueryRules.scala
 ---
@@ -0,0 +1,756 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{AnalysisException, 
CarbonDatasourceHadoopRelation, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Alias, 
AttributeReference, Cast, Divide,
+Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, 
LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.metadata.schema.table.{CarbonTable, 
DataMapSchema}
+import org.apache.carbondata.core.preagg.{AggregateTableSelector, 
QueryColumn, QueryPlan}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+/**
+ * Class for applying Pre Aggregate rules
+ * Responsibility.
+ * 1. Check the plan is a valid plan for updating the parent table plan
+ * with the child table
+ * 2. Update the plan based on the child schema
+ *
+ * Rules for updating the plan
+ * 1. Grouping expression rules
+ *1.1 Change the parent attribute reference of the group expression
+ * to the child attribute reference
+ *
+ * 2. Aggregate expression rules
+ *2.1 Change the parent attribute reference of the group expression to
+ * the child attribute reference
+ *2.2 Change the count AggregateExpression to Sum, as the count
+ * is already calculated, so in case of an aggregate table
+ * we need to apply sum to get the count
+ *2.3 In case of the average aggregate function, select 2 columns from
+ * the aggregate table with aggregations
+ * sum and count, then add divide(sum(column with sum), sum(column with
+ * count)).
+ * Note: During aggregate table creation for average, the table will be
+ * created with two columns,
+ * one for sum(column) and one for count(column), to support rollup
+ *
+ * 3. Filter Expression rules.
+ *3.1 Update filter expression attributes with child table attributes
+ * 4. Update the Parent Logical relation with the child Logical relation
+ *
+ * @param sparkSession
+ * spark session
+ */
+case class CarbonPreAggregateQueryRules(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
+
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+var needAnalysis = true
+plan.transformExpressions {
+  // first check if any preAgg scala function is applied; if it is
+  // present in the plan,
+  // then the call is from the create preaggregate table class, so no need
+  // to transform the query plan
+  case al@Alias(_, name) if name.equals("preAgg") =>
+needAnalysis = false
+al
+  // in case of a query, if any unresolved alias is present then wait for
+  // the plan to be resolved;
+  // return the same plan, as we can transform the plan only when
+  // everything is resolved
+  case unresolveAlias@UnresolvedAlias(_, _) =>
+needAnalysis = false
+unresolveAlias
+}
+// if plan is not valid for transformation then return same plan
+if (!needAnalysis) {
+  plan
+} else {
+  // create a buffer to collect all the columns and their metadata
+  // information
+  val list = scala.collection.mutable.ListBuffer.empty[QueryColumn]
+  var isValidPlan = true
+  val carbonTable = plan match {
+// matching the plan against the supported plans
+// if the plan matches any case it will validate and get all
+// info

[GitHub] carbondata pull request #1468: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread sounakr
GitHub user sounakr opened a pull request:

https://github.com/apache/carbondata/pull/1468

[WIP] Spark-2.2 Carbon Integration - Phase 1

Spark-2.2 Carbon Integration.
Phase 1 - Compilation ready for Spark-2.2.
Phase 2 - Merge the changes of Spark-2.2 and Spark-2.1 to Spark-2 folder.  

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata 
spark-2.2-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1468


commit 5936e7fb25c9d5125a5863aa7648cffe5a1d8f7c
Author: QiangCai 
Date:   2017-10-27T08:06:06Z

[CARBONDATA-1628][Streaming] Refactor LoadTableCommand to reuse code for 
streaming ingest in the future

Refactor LoadTableCommand to reuse code for streaming ingest in the future

This closes #1439

commit 8e6b0a7ce9ec9b5c2e9dc8e288f148d567224e28
Author: chenliang613 
Date:   2017-10-29T19:30:56Z

[CARBONDATA-1599] Update pull request checklist

Remove title and description items from pull request checklist, add them to 
How to contribute document

This closes #1451

commit 9955bed24447034f04f291dbdb2e1446e51ad8f1
Author: Zhang Zhichao <441586...@qq.com>
Date:   2017-10-31T06:09:30Z

[CARBONDATA-1659] Remove spark 1.x info

This closes #1456

commit b49160935a7c3c8fe1899e3e1c49ba7022cb0938
Author: kumarvishal 
Date:   2017-10-30T15:22:19Z

[CARBONDATA-1658] Fixed Thread leak issue in no sort

Problem: In the no-sort case the executor service is not shut down in the 
writer step, which causes a thread leak. In a long run it will throw an OOM 
error.
Solution: The executor service needs to be shut down in all cases, success 
and failure.

This closes #1454
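
The fix pattern described in this commit message can be sketched as follows. This is a minimal hypothetical illustration (the `WriterStep` object and `runAndShutdown` helper are assumptions for demonstration, not Carbon's actual writer-step code): the shutdown call sits in a finally block so it runs on both the success and the failure path.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical sketch: shut the executor service down in a finally block so
// the threads are released on success AND on failure, avoiding a thread leak.
object WriterStep {
  def runAndShutdown[T](task: () => T): T = {
    val service = Executors.newSingleThreadExecutor()
    try {
      // get() rethrows any failure from the task as an ExecutionException
      service.submit(new Callable[T] { override def call(): T = task() }).get()
    } finally {
      // reached in all cases, success and failure
      service.shutdown()
    }
  }
}
```

Without the finally block, a task that throws would leave the pool's non-daemon thread alive; over a long run those leaked threads accumulate until the JVM runs out of memory.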

commit 0586146a8bd953db63e1d99608ba8a77a9f5a899
Author: ravipesala 
Date:   2017-10-25T05:43:22Z

[CARBONDATA-1617] Merging carbonindex files within segment

Merge the carbonindex files after data load, so that we can reduce the IO 
calls to the namenode and improve the read performance of the first query

This closes #1436

commit e9454499dcb89ed69b1e18f79b3003ea8e5d8d25
Author: wyp 
Date:   2017-10-30T04:49:53Z

[CARBONDATA-1593] Add partition to table cause NoSuchTableException

AlterTableSplitCarbonPartition's processSchema method doesn't provide db 
info to sparkSession.catalog.refreshTable, which causes a NoSuchTableException 
when we add partitions to a carbondata table.

This closes #1452

commit f209e8ee315a272f1f60a7a037d6c15fc08b6add
Author: Jacky Li 
Date:   2017-10-31T18:48:47Z

[CARBONDATA-1594] Add precision and scale to DecimalType

Refactor DecimalType to include precision and scale parameters.
Precision and scale parameters are required for the Decimal data type. In the 
earlier code, they were stored in the following classes:

ColumnSpec
ColumnPageEncoderMeta
PrimitivePageStatsCollector
ColumnSchema
Since we have now changed DataType from an enum to a class, precision and 
scale should be stored in the DecimalType object only. This PR makes that change.

No new test case is added in this PR since no functionality change.

This closes #1417
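
The idea behind this refactoring can be sketched as below. This is a hypothetical simplification (the trait and member names are illustrative, not Carbon's actual `DataType` hierarchy): once DataType is a class rather than an enum, a decimal type can carry its own precision and scale instead of those values being duplicated across ColumnSpec, ColumnSchema and the stats collectors.

```scala
// Hypothetical sketch: precision and scale live on the DecimalType object
// itself, so callers no longer need a parallel (precision, scale) pair.
sealed trait CarbonDataType { def getName: String }

case object IntType extends CarbonDataType { val getName = "int" }

case class DecimalType(precision: Int, scale: Int) extends CarbonDataType {
  // scale may not exceed precision, mirroring the usual decimal constraint
  require(scale >= 0 && scale <= precision, "scale must be in [0, precision]")
  val getName = s"decimal($precision,$scale)"
}
```

A class like `ColumnSpec` can then hold a single `CarbonDataType` field, and code that needs precision/scale pattern-matches on `DecimalType(p, s)` rather than reading two extra fields.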

commit f812e41c5333ad54449d3b514becd0fddb9c5024
Author: lishuming 
Date:   2017-11-02T15:47:05Z

[Tests] Fix BTreeBlockFinderTest variable mistake

There are some obvious spelling mistakes leaving some variables unused, 
which may change the unit tests' meaning

This closes #1462

commit 6f6897191819994da2066584721c462f03184cc6
Author: Jacky Li 
Date:   2017-11-04T10:53:46Z

[CARBONDATA-1667] Remove direct load related code

This closes #1465

commit d1cf47753a765e3f46c3a32362a844329cc1173c
Author: sounakr 
Date:   2017-11-06T07:21:17Z

Spark-2.2 Carbon Integration




---


[GitHub] carbondata issue #1468: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1468
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1456/



---


[GitHub] carbondata issue #1455: [CARBONDATA-1624]Set the default value of 'carbon.nu...

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1455
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1455/



---


[GitHub] carbondata issue #1468: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1468
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/823/



---


[GitHub] carbondata issue #1468: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1468
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1457/



---


[GitHub] carbondata pull request #1468: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread sounakr
Github user sounakr closed the pull request at:

https://github.com/apache/carbondata/pull/1468


---


[GitHub] carbondata pull request #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread sounakr
GitHub user sounakr opened a pull request:

https://github.com/apache/carbondata/pull/1469

[WIP] Spark-2.2 Carbon Integration - Phase 1

Spark-2.2 Carbon Integration.
Phase 1 - Compilation ready for Spark-2.2.
Phase 2 - Merge the changes of Spark-2.2 and Spark-2.1 to Spark-2 folder.

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata Carbon-Spark-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1469.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1469


commit d98e9a10bf2bf7e5164ba154230af40de2c1e796
Author: sounakr 
Date:   2017-11-06T07:21:17Z

Spark-2.2 Carbon Integration




---


[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1469
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1458/



---


[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1469
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/824/



---


[GitHub] carbondata issue #1400: [CARBONDATA-1537] Added back Adaptive delta encoding...

2017-11-05 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1400
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/825/



---


[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1

2017-11-05 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1469
  
@sounakr I did not expect a new module with 21 KLOC of code. I hope all 
of it should be under the same package


---