[GitHub] carbondata issue #2302: [CARBONDATA-2475] Support Modular Core for Materiali...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2302
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5853/



---


[GitHub] carbondata issue #2302: [CARBONDATA-2475] Support Modular Core for Materiali...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2302
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4897/



---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789644
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -208,7 +208,7 @@ public DataMapCatalog getDataMapCatalog(DataMapProvider 
dataMapProvider, String
   }
 
   /**
-   * Initialize by reading all datamaps from store and re register it
+   * Intialize by reading all datamaps from store and re register it
--- End diff --

ok


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789632
  
--- Diff: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
 ---
@@ -0,0 +1,676 @@
+package org.apache.carbondata.mv.rewrite
--- End diff --

ok


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789637
  
--- Diff: 
datamap/mv/plan/src/test/scala/org/apache/carbondata/mv/plans/Tpcds_1_4_BenchmarkSuite.scala
 ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.plans
+
+import scala.util.{Failure, Success, Try}
+
+import org.apache.spark.sql.SparkSession
+import org.scalatest.BeforeAndAfter
+
+import org.apache.carbondata.mv.dsl._
+import org.apache.carbondata.mv.testutil.ModularPlanTest
+
+// scalastyle:off println
+class Tpcds_1_4_BenchmarkSuite extends ModularPlanTest with BeforeAndAfter 
{
--- End diff --

ok


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789640
  
--- Diff: 
integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonAnalyzer.scala
 ---
@@ -20,15 +20,33 @@ import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.analysis.Analyzer
 import org.apache.spark.sql.catalyst.catalog.SessionCatalog
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.CarbonReflectionUtils
 
 class CarbonAnalyzer(catalog: SessionCatalog,
 conf: SQLConf,
 sparkSession: SparkSession,
 analyzer: Analyzer) extends Analyzer(catalog, conf) {
+
+  val mvPlan = try {
+CarbonReflectionUtils.createObject(
+  "org.apache.carbondata.mv.datamap.MVAnalyzerRule",
+  sparkSession)._1.asInstanceOf[Rule[LogicalPlan]]
+  } catch {
+case e: Exception =>
+  null
+  }
+
   override def execute(plan: LogicalPlan): LogicalPlan = {
 var logicalPlan = analyzer.execute(plan)
 logicalPlan = 
CarbonPreAggregateDataLoadingRules(sparkSession).apply(logicalPlan)
-CarbonPreAggregateQueryRules(sparkSession).apply(logicalPlan)
+logicalPlan = 
CarbonPreAggregateQueryRules(sparkSession).apply(logicalPlan)
+// TODO Get the analyzer rules from registered datamap class and apply 
here.
--- End diff --

ok


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789629
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/status/DataMapStatusManager.java
 ---
@@ -53,6 +53,22 @@ private DataMapStatusManager() {
 return storageProvider.getDataMapStatusDetails();
   }
 
+  /**
+   * Get enabled datamap status details
+   * @return
+   * @throws IOException
+   */
+  public static DataMapStatusDetail[] getEnabledDataMapStatusDetails() 
throws IOException {
--- End diff --

It is used in `SummaryDatasetCatalog`
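
For reference, a minimal sketch of what the enabled-only lookup could look like inside DataMapStatusManager, built on the existing storageProvider call (the getStatus()/DataMapStatus.ENABLED accessors are assumptions, not necessarily the exact API):

    // Sketch only: filters the stored status details down to enabled datamaps.
    // getStatus() and DataMapStatus.ENABLED are assumed names.
    public static DataMapStatusDetail[] getEnabledDataMapStatusDetails() throws IOException {
      DataMapStatusDetail[] allDetails = storageProvider.getDataMapStatusDetails();
      java.util.List<DataMapStatusDetail> enabled = new java.util.ArrayList<>();
      for (DataMapStatusDetail detail : allDetails) {
        if (detail.getStatus() == DataMapStatus.ENABLED) {
          enabled.add(detail);
        }
      }
      return enabled.toArray(new DataMapStatusDetail[enabled.size()]);
    }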


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187789635
  
--- Diff: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/Tpcds_1_4_Suite.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.test.util.PlanTest
+import org.scalatest.BeforeAndAfter
+
+import org.apache.carbondata.mv.testutil.Tpcds_1_4_Tables._
+
+class Tpcds_1_4_Suite extends PlanTest with BeforeAndAfter {
--- End diff --

ok


---


[GitHub] carbondata issue #2302: [CARBONDATA-2475] Support Modular Core for Materiali...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2302
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4696/



---


[GitHub] carbondata issue #2302: [CARBONDATA-2475] Support Modular Core for Materiali...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2302
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5852/



---


[GitHub] carbondata issue #2302: [CARBONDATA-2475] Support Modular Core for Materiali...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2302
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4896/



---


[GitHub] carbondata pull request #2292: [CARBONDATA-2467] sdk writer log shouldnot pr...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2292#discussion_r187779656
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -413,8 +413,8 @@ private CarbonTable buildCarbonTable() {
   tableName = "_tempTable";
   dbName = "_tempDB";
 } else {
-  dbName = null;
-  tableName = null;
+  dbName = "";
+  tableName = String.valueOf(UUID);
--- End diff --

I think it is better to put `"_tempTable_" + String.valueOf(UUID)`
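
For illustration, the else branch would then read roughly like this (UUID stays the builder's existing unique-id value from the surrounding code):

    } else {
      dbName = "";
      // the prefix makes temporary tables easy to recognise; UUID keeps the name unique
      tableName = "_tempTable_" + String.valueOf(UUID);
    }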


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779478
  
--- Diff: 
integration/spark2/src/main/spark2.2/org/apache/spark/sql/hive/CarbonAnalyzer.scala
 ---
@@ -20,15 +20,33 @@ import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.analysis.Analyzer
 import org.apache.spark.sql.catalyst.catalog.SessionCatalog
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.CarbonReflectionUtils
 
 class CarbonAnalyzer(catalog: SessionCatalog,
 conf: SQLConf,
 sparkSession: SparkSession,
 analyzer: Analyzer) extends Analyzer(catalog, conf) {
+
+  val mvPlan = try {
+CarbonReflectionUtils.createObject(
+  "org.apache.carbondata.mv.datamap.MVAnalyzerRule",
+  sparkSession)._1.asInstanceOf[Rule[LogicalPlan]]
+  } catch {
+case e: Exception =>
+  null
+  }
+
   override def execute(plan: LogicalPlan): LogicalPlan = {
 var logicalPlan = analyzer.execute(plan)
 logicalPlan = 
CarbonPreAggregateDataLoadingRules(sparkSession).apply(logicalPlan)
-CarbonPreAggregateQueryRules(sparkSession).apply(logicalPlan)
+logicalPlan = 
CarbonPreAggregateQueryRules(sparkSession).apply(logicalPlan)
+// TODO Get the analyzer rules from registered datamap class and apply 
here.
--- End diff --

This TODO can be removed?


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779448
  
--- Diff: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/Tpcds_1_4_Suite.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.test.util.PlanTest
+import org.scalatest.BeforeAndAfter
+
+import org.apache.carbondata.mv.testutil.Tpcds_1_4_Tables._
+
+class Tpcds_1_4_Suite extends PlanTest with BeforeAndAfter {
--- End diff --

Please remove this file as it is all commented out


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779458
  
--- Diff: 
datamap/mv/plan/src/test/scala/org/apache/carbondata/mv/plans/Tpcds_1_4_BenchmarkSuite.scala
 ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.plans
+
+import scala.util.{Failure, Success, Try}
+
+import org.apache.spark.sql.SparkSession
+import org.scalatest.BeforeAndAfter
+
+import org.apache.carbondata.mv.dsl._
+import org.apache.carbondata.mv.testutil.ModularPlanTest
+
+// scalastyle:off println
+class Tpcds_1_4_BenchmarkSuite extends ModularPlanTest with BeforeAndAfter 
{
--- End diff --

Please remove this file


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779425
  
--- Diff: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
 ---
@@ -0,0 +1,676 @@
+package org.apache.carbondata.mv.rewrite
--- End diff --

Please add the license header.
The license is missing in all test case files.


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779404
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/status/DataMapStatusManager.java
 ---
@@ -53,6 +53,22 @@ private DataMapStatusManager() {
 return storageProvider.getDataMapStatusDetails();
   }
 
+  /**
+   * Get enabled datamap status details
+   * @return
+   * @throws IOException
+   */
+  public static DataMapStatusDetail[] getEnabledDataMapStatusDetails() 
throws IOException {
--- End diff --

I think this is an unused function; no need to add it.


---


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2302#discussion_r187779386
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -208,7 +208,7 @@ public DataMapCatalog getDataMapCatalog(DataMapProvider 
dataMapProvider, String
   }
 
   /**
-   * Initialize by reading all datamaps from store and re register it
+   * Intialize by reading all datamaps from store and re register it
--- End diff --

typo


---


[GitHub] carbondata pull request #2300: [CARBONDATA-2459][DataMap] Add cache for bloo...

2018-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2300


---


[jira] [Resolved] (CARBONDATA-2459) Support cache for bloom datamap

2018-05-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2459.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

> Support cache for bloom datamap
> ---
>
> Key: CARBONDATA-2459
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2459
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-query
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
> Fix For: 1.4.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently query using bloom filter datamap is slow. The root cause is that 
> loading bloom filter index costs too much time. We can implement a driver 
> side cache to accelerate the loading index procedure.
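
As a rough sketch of the idea, a driver-side map keyed by the index shard path would make each bloom index load happen at most once per driver (the names below are illustrative stand-ins, not the actual CarbonData classes):

    // Illustrative sketch only: a driver-side cache keyed by index shard path.
    class DriverSideIndexCache<V> {
      interface Loader<T> {
        T load(String shardPath) throws java.io.IOException;
      }

      private final java.util.concurrent.ConcurrentHashMap<String, V> cache =
          new java.util.concurrent.ConcurrentHashMap<>();
      private final Loader<V> loader;

      DriverSideIndexCache(Loader<V> loader) {
        this.loader = loader;
      }

      V getOrLoad(String shardPath) throws java.io.IOException {
        V value = cache.get(shardPath);
        if (value == null) {
          value = loader.load(shardPath);  // the expensive disk read this cache avoids
          cache.put(shardPath, value);
        }
        return value;
      }
    }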



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2302: [CARBONDATA-2475] Support Modular Core for Ma...

2018-05-12 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/carbondata/pull/2302

[CARBONDATA-2475] Support Modular Core for Materialized View DataMap for 
query matching and rewriting

Currently carbon supports the preaggregate datamap, which only supports 
preaggregation on a single table. To improve it, we can add join capability by 
implementing Materialized Views.

In carbon's Materialized View, the Modular Core provides the plan matcher and 
rewrites the query against the MV query. 

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata mv-core

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2302


commit fb74f53b4b585a4b92cfcd57c4d7a8da745c870f
Author: ravipesala 
Date:   2018-05-12T17:19:19Z

 Support Modular Core for Materialized View DataMap

commit de8ee65f2fc21159aabb636afbe68495adf96df6
Author: ravipesala 
Date:   2018-05-12T05:11:01Z

Integrate MV DataMap to Carbon




---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
LGTM


---


[jira] [Resolved] (CARBONDATA-2474) Support Modular Plan

2018-05-12 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-2474.
-
Resolution: Fixed

> Support Modular Plan
> 
>
> Key: CARBONDATA-2474
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2474
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Modular Plan is the basic structure for query plan in Materialized View. 
> Carbon should support converting Spark Logical Plan to Modular Plan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-2474) Support Modular Plan

2018-05-12 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala reassigned CARBONDATA-2474:
---

Assignee: Jacky Li

> Support Modular Plan
> 
>
> Key: CARBONDATA-2474
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2474
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Modular Plan is the basic structure for query plan in Materialized View. 
> Carbon should support converting Spark Logical Plan to Modular Plan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5851/



---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4695/



---


[GitHub] carbondata pull request #2301: [CARBONDATA-2474] Support Modular Plan for Ma...

2018-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2301


---


[GitHub] carbondata issue #2301: [CARBONDATA-2474] Support Modular Plan for Materiali...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2301
  
LGTM


---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
retest this please


---


[GitHub] carbondata issue #2269: [CARBONDATA-2433][LUCENE]close the lucene index read...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2269
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4895/



---


[GitHub] carbondata issue #2269: [CARBONDATA-2433][LUCENE]close the lucene index read...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2269
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4694/



---


[GitHub] carbondata issue #2269: [CARBONDATA-2433][LUCENE]close the lucene index read...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2269
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5850/



---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4894/



---


[GitHub] carbondata issue #2301: [CARBONDATA-2474][WIP] Support Modular Plan

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2301
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5848/



---


[GitHub] carbondata issue #2301: [CARBONDATA-2474][WIP] Support Modular Plan

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2301
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4692/



---


[GitHub] carbondata issue #2301: [CARBONDATA-2474][WIP] Support Modular Plan

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2301
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4893/



---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4693/



---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769082
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -114,108 +118,36 @@ public void initialize() throws IOException {
 indexWriter = new IndexWriter(indexDir, new 
IndexWriterConfig(analyzer));
   }
 
-  private IndexWriter createPageIndexWriter() throws IOException {
-// save index data into ram, write into disk after one page finished
-RAMDirectory ramDir = new RAMDirectory();
-return new IndexWriter(ramDir, new IndexWriterConfig(analyzer));
-  }
-
-  private void addPageIndex(IndexWriter pageIndexWriter) throws 
IOException {
-
-Directory directory = pageIndexWriter.getDirectory();
-
-// close ram writer
-pageIndexWriter.close();
-
-// add ram index data into disk
-indexWriter.addIndexes(directory);
-
-// delete this ram data
-directory.close();
-  }
-
-  @Override
-  public void addRow(int blockletId, int pageId, int rowId, Object[] 
values) throws IOException {
-if (rowId == 0) {
-  if (pageIndexWriter != null) {
-addPageIndex(pageIndexWriter);
-  }
-  pageIndexWriter = createPageIndexWriter();
-}
-
-// create a new document
-Document doc = new Document();
-
-// add blocklet Id
-doc.add(new IntPoint(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-doc.add(new StoredField(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-
-// add page id
-doc.add(new IntPoint(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-doc.add(new StoredField(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-
-// add row id
-doc.add(new IntPoint(LuceneDataMapWriter.ROWID_NAME, rowId));
-doc.add(new StoredField(LuceneDataMapWriter.ROWID_NAME, rowId));
+  @Override public void addRow(int blockletId, int pageId, int rowId, 
Object[] values)
+  throws IOException {
 
 // add other fields
+LuceneDataMapWriter.LuceneColumnKeys columns =
+new LuceneDataMapWriter.LuceneColumnKeys(columnsCount);
 for (int colIdx = 0; colIdx < columnsCount; colIdx++) {
-  CarbonColumn column = indexColumns.get(colIdx);
-  addField(doc, column.getColName(), column.getDataType(), 
values[colIdx]);
+  columns.getColValues()[colIdx] = values[colIdx];
+}
+if (writeCacheSize > 0) {
+  addToCache(columns, rowId, pageId, blockletId, cache, intBuffer, 
storeBlockletWise);
+  flushCacheIfCan();
+} else {
+  addData(columns, rowId, pageId, blockletId, intBuffer, indexWriter, 
indexColumns,
+  storeBlockletWise);
 }
 
-pageIndexWriter.addDocument(doc);
   }
 
-  private boolean addField(Document doc, String fieldName, DataType type, 
Object value) {
-if (type == DataTypes.STRING) {
-  doc.add(new TextField(fieldName, (String) value, Field.Store.NO));
-} else if (type == DataTypes.BYTE) {
-  // byte type , use int range to deal with byte, lucene has no byte 
type
-  IntRangeField field =
-  new IntRangeField(fieldName, new int[] { Byte.MIN_VALUE }, new 
int[] { Byte.MAX_VALUE });
-  field.setIntValue((int) value);
-  doc.add(field);
-} else if (type == DataTypes.SHORT) {
-  // short type , use int range to deal with short type, lucene has no 
short type
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 
Short.MIN_VALUE },
-  new int[] { Short.MAX_VALUE });
-  field.setShortValue((short) value);
-  doc.add(field);
-} else if (type == DataTypes.INT) {
-  // int type , use int point to deal with int type
-  doc.add(new IntPoint(fieldName, (int) value));
-} else if (type == DataTypes.LONG) {
-  // long type , use long point to deal with long type
-  doc.add(new LongPoint(fieldName, (long) value));
-} else if (type == DataTypes.FLOAT) {
-  doc.add(new FloatPoint(fieldName, (float) value));
-} else if (type == DataTypes.DOUBLE) {
-  doc.add(new DoublePoint(fieldName, (double) value));
-} else if (type == DataTypes.DATE) {
-  // TODO: how to get data value
-} else if (type == DataTypes.TIMESTAMP) {
-  // TODO: how to get
-} else if (type == DataTypes.BOOLEAN) {
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue((boolean) value ? 1 : 0);
-  doc.add(field);
-} else {
-  LOGGER.error("unsu

[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769148
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapWriter.java
 ---
@@ -175,52 +205,38 @@ public void onBlockletEnd(int blockletId) throws 
IOException {
*/
   public void onPageAdded(int blockletId, int pageId, int pageSize, 
ColumnPage[] pages)
   throws IOException {
+// save index data into ram, write into disk after one page finished
+int columnsCount = pages.length;
+if (columnsCount <= 0) {
+  LOGGER.warn("empty data");
--- End diff --

This log message is too brief for users to diagnose problems in a real environment.
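
For example, something along these lines, using the identifiers already available in onPageAdded, would make the problem easier to locate:

    // possible wording; blockletId, pageId and pageSize are the method parameters
    LOGGER.warn("No column pages to index for blocklet " + blockletId
        + ", page " + pageId + " (pageSize=" + pageSize + "), skipping lucene index write");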


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769052
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -66,20 +62,28 @@
 
   private IndexWriter indexWriter = null;
 
-  private IndexWriter pageIndexWriter = null;
-
   private Analyzer analyzer = null;
 
-  LuceneDataMapRefresher(String tablePath, String dataMapName,
-  Segment segment, String shardName, List indexColumns) {
-this.dataMapPath = CarbonTablePath.getDataMapStorePathOnShardName(
-tablePath, segment.getSegmentNo(), dataMapName, shardName);
+  private int writeCacheSize;
+
+  private Map> cache =
+  new HashMap<>();
+
+  private ByteBuffer intBuffer = ByteBuffer.allocate(4);
+
+  private boolean storeBlockletWise;
+
+  LuceneDataMapRefresher(String tablePath, String dataMapName, Segment 
segment, String shardName,
+  List indexColumns, int writeCacheSize, boolean 
storeBlockletWise) {
+this.dataMapPath = CarbonTablePath
+.getDataMapStorePathOnShardName(tablePath, segment.getSegmentNo(), 
dataMapName, shardName);
 this.indexColumns = indexColumns;
 this.columnsCount = indexColumns.size();
+this.writeCacheSize = writeCacheSize;
+this.storeBlockletWise = storeBlockletWise;
   }
 
-  @Override
-  public void initialize() throws IOException {
+  @Override public void initialize() throws IOException {
--- End diff --

Move override to the previous line
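
i.e. the conventional layout:

    @Override
    public void initialize() throws IOException {
      indexWriter = new IndexWriter(indexDir, new IndexWriterConfig(analyzer));
    }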


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769190
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapWriter.java
 ---
@@ -242,74 +258,216 @@ private boolean addField(Document doc, Object data, 
CarbonColumn column, Field.S
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, (int) value));
   }
-} else if (type == DataTypes.INT) {
+} else if (key instanceof Integer) {
   // int type , use int point to deal with int type
-  int value = (int) data;
-  doc.add(new IntPoint(fieldName, value));
+  int value = (Integer) key;
+  doc.add(new IntPoint(fieldName, new int[] { value }));
 
   // if need store it , add StoredField
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, value));
   }
-} else if (type == DataTypes.LONG) {
+} else if (key instanceof Long) {
   // long type , use long point to deal with long type
-  long value = (long) data;
-  doc.add(new LongPoint(fieldName, value));
+  long value = (Long) key;
+  doc.add(new LongPoint(fieldName, new long[] { value }));
 
   // if need store it , add StoredField
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, value));
   }
-} else if (type == DataTypes.FLOAT) {
-  float value = (float) data;
-  doc.add(new FloatPoint(fieldName, value));
+} else if (key instanceof Float) {
+  float value = (Float) key;
+  doc.add(new FloatPoint(fieldName, new float[] { value }));
   if (store == Field.Store.YES) {
 doc.add(new FloatPoint(fieldName, value));
   }
-} else if (type == DataTypes.DOUBLE) {
-  double value = (double) data;
-  doc.add(new DoublePoint(fieldName, value));
+} else if (key instanceof Double) {
+  double value = (Double) key;
+  doc.add(new DoublePoint(fieldName, new double[] { value }));
   if (store == Field.Store.YES) {
 doc.add(new DoublePoint(fieldName, value));
   }
+} else if (key instanceof String) {
+  String strValue = (String) key;
+  doc.add(new TextField(fieldName, strValue, store));
+} else if (key instanceof Boolean) {
+  boolean value = (Boolean) key;
+  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
+  field.setIntValue(value ? 1 : 0);
+  doc.add(field);
+  if (store == Field.Store.YES) {
+doc.add(new StoredField(fieldName, value ? 1 : 0));
+  }
+}
+return true;
+  }
+
+  private Object getValue(ColumnPage page, int rowId) {
+
+//get field type
+DataType type = page.getColumnSpec().getSchemaDataType();
+Object value = null;
+if (type == DataTypes.BYTE) {
+  // byte type , use int range to deal with byte, lucene has no byte 
type
+  value = page.getByte(rowId);
+} else if (type == DataTypes.SHORT) {
+  // short type , use int range to deal with short type, lucene has no 
short type
+  value = page.getShort(rowId);
+} else if (type == DataTypes.INT) {
+  // int type , use int point to deal with int type
+  value = page.getInt(rowId);
+} else if (type == DataTypes.LONG) {
+  // long type , use long point to deal with long type
+  value = page.getLong(rowId);
+} else if (type == DataTypes.FLOAT) {
+  value = page.getFloat(rowId);
+} else if (type == DataTypes.DOUBLE) {
+  value = page.getDouble(rowId);
 } else if (type == DataTypes.STRING) {
-  byte[] value = (byte[]) data;
+  byte[] bytes = page.getBytes(rowId);
   // TODO: how to get string value
-  String strValue = null;
   try {
-strValue = new String(value, 2, value.length - 2, "UTF-8");
+value = new String(bytes, 2, bytes.length - 2, "UTF-8");
   } catch (UnsupportedEncodingException e) {
 throw new RuntimeException(e);
   }
-  doc.add(new TextField(fieldName, strValue, store));
 } else if (type == DataTypes.DATE) {
   throw new RuntimeException("unsupported data type " + type);
 } else if (type == DataTypes.TIMESTAMP) {
   throw new RuntimeException("unsupported data type " + type);
 } else if (type == DataTypes.BOOLEAN) {
-  boolean value = (boolean) data;
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue(value ? 1 : 0);
-  doc.add(field);
-  if (store == Field.Store.YE

[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769080
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -114,108 +118,36 @@ public void initialize() throws IOException {
 indexWriter = new IndexWriter(indexDir, new 
IndexWriterConfig(analyzer));
   }
 
-  private IndexWriter createPageIndexWriter() throws IOException {
-// save index data into ram, write into disk after one page finished
-RAMDirectory ramDir = new RAMDirectory();
-return new IndexWriter(ramDir, new IndexWriterConfig(analyzer));
-  }
-
-  private void addPageIndex(IndexWriter pageIndexWriter) throws 
IOException {
-
-Directory directory = pageIndexWriter.getDirectory();
-
-// close ram writer
-pageIndexWriter.close();
-
-// add ram index data into disk
-indexWriter.addIndexes(directory);
-
-// delete this ram data
-directory.close();
-  }
-
-  @Override
-  public void addRow(int blockletId, int pageId, int rowId, Object[] 
values) throws IOException {
-if (rowId == 0) {
-  if (pageIndexWriter != null) {
-addPageIndex(pageIndexWriter);
-  }
-  pageIndexWriter = createPageIndexWriter();
-}
-
-// create a new document
-Document doc = new Document();
-
-// add blocklet Id
-doc.add(new IntPoint(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-doc.add(new StoredField(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-
-// add page id
-doc.add(new IntPoint(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-doc.add(new StoredField(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-
-// add row id
-doc.add(new IntPoint(LuceneDataMapWriter.ROWID_NAME, rowId));
-doc.add(new StoredField(LuceneDataMapWriter.ROWID_NAME, rowId));
+  @Override public void addRow(int blockletId, int pageId, int rowId, 
Object[] values)
+  throws IOException {
 
 // add other fields
+LuceneDataMapWriter.LuceneColumnKeys columns =
+new LuceneDataMapWriter.LuceneColumnKeys(columnsCount);
 for (int colIdx = 0; colIdx < columnsCount; colIdx++) {
-  CarbonColumn column = indexColumns.get(colIdx);
-  addField(doc, column.getColName(), column.getDataType(), 
values[colIdx]);
+  columns.getColValues()[colIdx] = values[colIdx];
+}
+if (writeCacheSize > 0) {
+  addToCache(columns, rowId, pageId, blockletId, cache, intBuffer, 
storeBlockletWise);
+  flushCacheIfCan();
+} else {
+  addData(columns, rowId, pageId, blockletId, intBuffer, indexWriter, 
indexColumns,
+  storeBlockletWise);
 }
 
-pageIndexWriter.addDocument(doc);
   }
 
-  private boolean addField(Document doc, String fieldName, DataType type, 
Object value) {
-if (type == DataTypes.STRING) {
-  doc.add(new TextField(fieldName, (String) value, Field.Store.NO));
-} else if (type == DataTypes.BYTE) {
-  // byte type , use int range to deal with byte, lucene has no byte 
type
-  IntRangeField field =
-  new IntRangeField(fieldName, new int[] { Byte.MIN_VALUE }, new 
int[] { Byte.MAX_VALUE });
-  field.setIntValue((int) value);
-  doc.add(field);
-} else if (type == DataTypes.SHORT) {
-  // short type , use int range to deal with short type, lucene has no 
short type
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 
Short.MIN_VALUE },
-  new int[] { Short.MAX_VALUE });
-  field.setShortValue((short) value);
-  doc.add(field);
-} else if (type == DataTypes.INT) {
-  // int type , use int point to deal with int type
-  doc.add(new IntPoint(fieldName, (int) value));
-} else if (type == DataTypes.LONG) {
-  // long type , use long point to deal with long type
-  doc.add(new LongPoint(fieldName, (long) value));
-} else if (type == DataTypes.FLOAT) {
-  doc.add(new FloatPoint(fieldName, (float) value));
-} else if (type == DataTypes.DOUBLE) {
-  doc.add(new DoublePoint(fieldName, (double) value));
-} else if (type == DataTypes.DATE) {
-  // TODO: how to get data value
-} else if (type == DataTypes.TIMESTAMP) {
-  // TODO: how to get
-} else if (type == DataTypes.BOOLEAN) {
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue((boolean) value ? 1 : 0);
-  doc.add(field);
-} else {
-  LOGGER.error("unsu

[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769012
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java
 ---
@@ -61,6 +61,9 @@
 @InterfaceAudience.Internal
 abstract class LuceneDataMapFactoryBase extends 
DataMapFactory {
 
+  static final String FLUSH_CACHE = "flush_cache";
--- End diff --

Can you explain the intention of these two properties, just like you did in 
the PR description?
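
From how writeCacheSize and storeBlockletWise are used elsewhere in this PR, my reading is roughly the following; a short javadoc next to each constant would capture it (the wording and the "split_blocklet" literal are my assumptions, not the author's text):

    /** Size of the in-memory cache of index rows built up before flushing to the lucene
     *  IndexWriter; -1 (the default) disables the cache. The unit is presumably a
     *  row/entry count. */
    static final String FLUSH_CACHE = "flush_cache";

    /** Whether the index data is organised blocklet-wise (maps to storeBlockletWise in
     *  the writer). Defaults to false. */
    static final String SPLIT_BLOCKLET = "split_blocklet";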


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769000
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java
 ---
@@ -107,13 +118,39 @@ public LuceneDataMapFactoryBase(CarbonTable 
carbonTable, DataMapSchema dataMapSc
 // optimizedOperations.add(ExpressionType.LESSTHAN_EQUALTO);
 // optimizedOperations.add(ExpressionType.NOT);
 optimizedOperations.add(ExpressionType.TEXT_MATCH);
-this.dataMapMeta = new DataMapMeta(indexedColumns, 
optimizedOperations);
-
+this.dataMapMeta = new DataMapMeta(indexedCarbonColumns, 
optimizedOperations);
 // get analyzer
 // TODO: how to get analyzer ?
 analyzer = new StandardAnalyzer();
   }
 
+  public static int validateAndGetWriteCacheSize(DataMapSchema schema) {
+String cacheStr = schema.getProperties().get(FLUSH_CACHE);
+if (cacheStr == null) {
+  cacheStr = "-1";
+}
+int cacheSize;
+try {
+  cacheSize = Integer.parseInt(cacheStr);
+} catch (NumberFormatException e) {
+  cacheSize = -1;
+}
+return cacheSize;
+  }
+
+  public static boolean validateAndGetStoreBlockletWise(DataMapSchema 
schema) {
+String splitBlockletStr = schema.getProperties().get(SPLIT_BLOCKLET);
+if (splitBlockletStr == null) {
+  splitBlockletStr = "false";
--- End diff --

Same as above


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769085
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -114,108 +118,36 @@ public void initialize() throws IOException {
 indexWriter = new IndexWriter(indexDir, new 
IndexWriterConfig(analyzer));
   }
 
-  private IndexWriter createPageIndexWriter() throws IOException {
-// save index data into ram, write into disk after one page finished
-RAMDirectory ramDir = new RAMDirectory();
-return new IndexWriter(ramDir, new IndexWriterConfig(analyzer));
-  }
-
-  private void addPageIndex(IndexWriter pageIndexWriter) throws 
IOException {
-
-Directory directory = pageIndexWriter.getDirectory();
-
-// close ram writer
-pageIndexWriter.close();
-
-// add ram index data into disk
-indexWriter.addIndexes(directory);
-
-// delete this ram data
-directory.close();
-  }
-
-  @Override
-  public void addRow(int blockletId, int pageId, int rowId, Object[] 
values) throws IOException {
-if (rowId == 0) {
-  if (pageIndexWriter != null) {
-addPageIndex(pageIndexWriter);
-  }
-  pageIndexWriter = createPageIndexWriter();
-}
-
-// create a new document
-Document doc = new Document();
-
-// add blocklet Id
-doc.add(new IntPoint(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-doc.add(new StoredField(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-
-// add page id
-doc.add(new IntPoint(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-doc.add(new StoredField(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-
-// add row id
-doc.add(new IntPoint(LuceneDataMapWriter.ROWID_NAME, rowId));
-doc.add(new StoredField(LuceneDataMapWriter.ROWID_NAME, rowId));
+  @Override public void addRow(int blockletId, int pageId, int rowId, 
Object[] values)
+  throws IOException {
 
 // add other fields
+LuceneDataMapWriter.LuceneColumnKeys columns =
+new LuceneDataMapWriter.LuceneColumnKeys(columnsCount);
 for (int colIdx = 0; colIdx < columnsCount; colIdx++) {
-  CarbonColumn column = indexColumns.get(colIdx);
-  addField(doc, column.getColName(), column.getDataType(), 
values[colIdx]);
+  columns.getColValues()[colIdx] = values[colIdx];
+}
+if (writeCacheSize > 0) {
+  addToCache(columns, rowId, pageId, blockletId, cache, intBuffer, 
storeBlockletWise);
+  flushCacheIfCan();
+} else {
+  addData(columns, rowId, pageId, blockletId, intBuffer, indexWriter, 
indexColumns,
+  storeBlockletWise);
 }
 
-pageIndexWriter.addDocument(doc);
   }
 
-  private boolean addField(Document doc, String fieldName, DataType type, 
Object value) {
-if (type == DataTypes.STRING) {
-  doc.add(new TextField(fieldName, (String) value, Field.Store.NO));
-} else if (type == DataTypes.BYTE) {
-  // byte type , use int range to deal with byte, lucene has no byte 
type
-  IntRangeField field =
-  new IntRangeField(fieldName, new int[] { Byte.MIN_VALUE }, new 
int[] { Byte.MAX_VALUE });
-  field.setIntValue((int) value);
-  doc.add(field);
-} else if (type == DataTypes.SHORT) {
-  // short type , use int range to deal with short type, lucene has no 
short type
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 
Short.MIN_VALUE },
-  new int[] { Short.MAX_VALUE });
-  field.setShortValue((short) value);
-  doc.add(field);
-} else if (type == DataTypes.INT) {
-  // int type , use int point to deal with int type
-  doc.add(new IntPoint(fieldName, (int) value));
-} else if (type == DataTypes.LONG) {
-  // long type , use long point to deal with long type
-  doc.add(new LongPoint(fieldName, (long) value));
-} else if (type == DataTypes.FLOAT) {
-  doc.add(new FloatPoint(fieldName, (float) value));
-} else if (type == DataTypes.DOUBLE) {
-  doc.add(new DoublePoint(fieldName, (double) value));
-} else if (type == DataTypes.DATE) {
-  // TODO: how to get data value
-} else if (type == DataTypes.TIMESTAMP) {
-  // TODO: how to get
-} else if (type == DataTypes.BOOLEAN) {
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue((boolean) value ? 1 : 0);
-  doc.add(field);
-} else {
-  LOGGER.error("unsu

[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769065
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -114,108 +118,36 @@ public void initialize() throws IOException {
 indexWriter = new IndexWriter(indexDir, new 
IndexWriterConfig(analyzer));
   }
 
-  private IndexWriter createPageIndexWriter() throws IOException {
-// save index data into ram, write into disk after one page finished
-RAMDirectory ramDir = new RAMDirectory();
-return new IndexWriter(ramDir, new IndexWriterConfig(analyzer));
-  }
-
-  private void addPageIndex(IndexWriter pageIndexWriter) throws 
IOException {
-
-Directory directory = pageIndexWriter.getDirectory();
-
-// close ram writer
-pageIndexWriter.close();
-
-// add ram index data into disk
-indexWriter.addIndexes(directory);
-
-// delete this ram data
-directory.close();
-  }
-
-  @Override
-  public void addRow(int blockletId, int pageId, int rowId, Object[] 
values) throws IOException {
-if (rowId == 0) {
-  if (pageIndexWriter != null) {
-addPageIndex(pageIndexWriter);
-  }
-  pageIndexWriter = createPageIndexWriter();
-}
-
-// create a new document
-Document doc = new Document();
-
-// add blocklet Id
-doc.add(new IntPoint(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-doc.add(new StoredField(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-
-// add page id
-doc.add(new IntPoint(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-doc.add(new StoredField(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-
-// add row id
-doc.add(new IntPoint(LuceneDataMapWriter.ROWID_NAME, rowId));
-doc.add(new StoredField(LuceneDataMapWriter.ROWID_NAME, rowId));
+  @Override public void addRow(int blockletId, int pageId, int rowId, 
Object[] values)
--- End diff --

Move `@Override` to the previous line.


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769271
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java
 ---
@@ -61,6 +61,9 @@
 @InterfaceAudience.Internal
 abstract class LuceneDataMapFactoryBase extends 
DataMapFactory {
 
+  static final String FLUSH_CACHE = "flush_cache";
--- End diff --

Besides, what is the unit of this value? Better to add a description for it.


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187768992
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java
 ---
@@ -107,13 +118,39 @@ public LuceneDataMapFactoryBase(CarbonTable 
carbonTable, DataMapSchema dataMapSc
 // optimizedOperations.add(ExpressionType.LESSTHAN_EQUALTO);
 // optimizedOperations.add(ExpressionType.NOT);
 optimizedOperations.add(ExpressionType.TEXT_MATCH);
-this.dataMapMeta = new DataMapMeta(indexedColumns, 
optimizedOperations);
-
+this.dataMapMeta = new DataMapMeta(indexedCarbonColumns, 
optimizedOperations);
 // get analyzer
 // TODO: how to get analyzer ?
 analyzer = new StandardAnalyzer();
   }
 
+  public static int validateAndGetWriteCacheSize(DataMapSchema schema) {
+String cacheStr = schema.getProperties().get(FLUSH_CACHE);
+if (cacheStr == null) {
+  cacheStr = "-1";
--- End diff --

Better to provide the property default value right below the property name.
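
For instance (defaults taken from the fallback values already used in this method; the "split_blocklet" literal is an assumption since its declaration is not shown in this diff):

    static final String FLUSH_CACHE = "flush_cache";
    static final String FLUSH_CACHE_DEFAULT = "-1";

    static final String SPLIT_BLOCKLET = "split_blocklet";
    static final String SPLIT_BLOCKLET_DEFAULT = "false";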


---


[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769074
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapRefresher.java
 ---
@@ -114,108 +118,36 @@ public void initialize() throws IOException {
 indexWriter = new IndexWriter(indexDir, new 
IndexWriterConfig(analyzer));
   }
 
-  private IndexWriter createPageIndexWriter() throws IOException {
-// save index data into ram, write into disk after one page finished
-RAMDirectory ramDir = new RAMDirectory();
-return new IndexWriter(ramDir, new IndexWriterConfig(analyzer));
-  }
-
-  private void addPageIndex(IndexWriter pageIndexWriter) throws 
IOException {
-
-Directory directory = pageIndexWriter.getDirectory();
-
-// close ram writer
-pageIndexWriter.close();
-
-// add ram index data into disk
-indexWriter.addIndexes(directory);
-
-// delete this ram data
-directory.close();
-  }
-
-  @Override
-  public void addRow(int blockletId, int pageId, int rowId, Object[] 
values) throws IOException {
-if (rowId == 0) {
-  if (pageIndexWriter != null) {
-addPageIndex(pageIndexWriter);
-  }
-  pageIndexWriter = createPageIndexWriter();
-}
-
-// create a new document
-Document doc = new Document();
-
-// add blocklet Id
-doc.add(new IntPoint(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-doc.add(new StoredField(LuceneDataMapWriter.BLOCKLETID_NAME, (int) 
values[columnsCount]));
-
-// add page id
-doc.add(new IntPoint(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-doc.add(new StoredField(LuceneDataMapWriter.PAGEID_NAME, (int) 
values[columnsCount + 1]));
-
-// add row id
-doc.add(new IntPoint(LuceneDataMapWriter.ROWID_NAME, rowId));
-doc.add(new StoredField(LuceneDataMapWriter.ROWID_NAME, rowId));
+  @Override public void addRow(int blockletId, int pageId, int rowId, 
Object[] values)
+  throws IOException {
 
 // add other fields
+LuceneDataMapWriter.LuceneColumnKeys columns =
+new LuceneDataMapWriter.LuceneColumnKeys(columnsCount);
 for (int colIdx = 0; colIdx < columnsCount; colIdx++) {
-  CarbonColumn column = indexColumns.get(colIdx);
-  addField(doc, column.getColName(), column.getDataType(), 
values[colIdx]);
+  columns.getColValues()[colIdx] = values[colIdx];
+}
+if (writeCacheSize > 0) {
+  addToCache(columns, rowId, pageId, blockletId, cache, intBuffer, 
storeBlockletWise);
+  flushCacheIfCan();
+} else {
+  addData(columns, rowId, pageId, blockletId, intBuffer, indexWriter, 
indexColumns,
+  storeBlockletWise);
 }
 
-pageIndexWriter.addDocument(doc);
   }
 
-  private boolean addField(Document doc, String fieldName, DataType type, 
Object value) {
-if (type == DataTypes.STRING) {
-  doc.add(new TextField(fieldName, (String) value, Field.Store.NO));
-} else if (type == DataTypes.BYTE) {
-  // byte type , use int range to deal with byte, lucene has no byte 
type
-  IntRangeField field =
-  new IntRangeField(fieldName, new int[] { Byte.MIN_VALUE }, new 
int[] { Byte.MAX_VALUE });
-  field.setIntValue((int) value);
-  doc.add(field);
-} else if (type == DataTypes.SHORT) {
-  // short type , use int range to deal with short type, lucene has no 
short type
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 
Short.MIN_VALUE },
-  new int[] { Short.MAX_VALUE });
-  field.setShortValue((short) value);
-  doc.add(field);
-} else if (type == DataTypes.INT) {
-  // int type , use int point to deal with int type
-  doc.add(new IntPoint(fieldName, (int) value));
-} else if (type == DataTypes.LONG) {
-  // long type , use long point to deal with long type
-  doc.add(new LongPoint(fieldName, (long) value));
-} else if (type == DataTypes.FLOAT) {
-  doc.add(new FloatPoint(fieldName, (float) value));
-} else if (type == DataTypes.DOUBLE) {
-  doc.add(new DoublePoint(fieldName, (double) value));
-} else if (type == DataTypes.DATE) {
-  // TODO: how to get data value
-} else if (type == DataTypes.TIMESTAMP) {
-  // TODO: how to get
-} else if (type == DataTypes.BOOLEAN) {
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue((boolean) value ? 1 : 0);
-  doc.add(field);
-} else {
-  LOGGER.error("unsu

[GitHub] carbondata pull request #2275: [WIP] Fix lucene datasize and performance

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2275#discussion_r187769203
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapWriter.java
 ---
@@ -242,74 +258,216 @@ private boolean addField(Document doc, Object data, 
CarbonColumn column, Field.S
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, (int) value));
   }
-} else if (type == DataTypes.INT) {
+} else if (key instanceof Integer) {
   // int type , use int point to deal with int type
-  int value = (int) data;
-  doc.add(new IntPoint(fieldName, value));
+  int value = (Integer) key;
+  doc.add(new IntPoint(fieldName, new int[] { value }));
 
   // if need store it , add StoredField
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, value));
   }
-} else if (type == DataTypes.LONG) {
+} else if (key instanceof Long) {
   // long type , use long point to deal with long type
-  long value = (long) data;
-  doc.add(new LongPoint(fieldName, value));
+  long value = (Long) key;
+  doc.add(new LongPoint(fieldName, new long[] { value }));
 
   // if need store it , add StoredField
   if (store == Field.Store.YES) {
 doc.add(new StoredField(fieldName, value));
   }
-} else if (type == DataTypes.FLOAT) {
-  float value = (float) data;
-  doc.add(new FloatPoint(fieldName, value));
+} else if (key instanceof Float) {
+  float value = (Float) key;
+  doc.add(new FloatPoint(fieldName, new float[] { value }));
   if (store == Field.Store.YES) {
 doc.add(new FloatPoint(fieldName, value));
   }
-} else if (type == DataTypes.DOUBLE) {
-  double value = (double) data;
-  doc.add(new DoublePoint(fieldName, value));
+} else if (key instanceof Double) {
+  double value = (Double) key;
+  doc.add(new DoublePoint(fieldName, new double[] { value }));
   if (store == Field.Store.YES) {
 doc.add(new DoublePoint(fieldName, value));
   }
+} else if (key instanceof String) {
+  String strValue = (String) key;
+  doc.add(new TextField(fieldName, strValue, store));
+} else if (key instanceof Boolean) {
+  boolean value = (Boolean) key;
+  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
+  field.setIntValue(value ? 1 : 0);
+  doc.add(field);
+  if (store == Field.Store.YES) {
+doc.add(new StoredField(fieldName, value ? 1 : 0));
+  }
+}
+return true;
+  }
+
+  private Object getValue(ColumnPage page, int rowId) {
+
+//get field type
+DataType type = page.getColumnSpec().getSchemaDataType();
+Object value = null;
+if (type == DataTypes.BYTE) {
+  // byte type , use int range to deal with byte, lucene has no byte 
type
+  value = page.getByte(rowId);
+} else if (type == DataTypes.SHORT) {
+  // short type , use int range to deal with short type, lucene has no 
short type
+  value = page.getShort(rowId);
+} else if (type == DataTypes.INT) {
+  // int type , use int point to deal with int type
+  value = page.getInt(rowId);
+} else if (type == DataTypes.LONG) {
+  // long type , use long point to deal with long type
+  value = page.getLong(rowId);
+} else if (type == DataTypes.FLOAT) {
+  value = page.getFloat(rowId);
+} else if (type == DataTypes.DOUBLE) {
+  value = page.getDouble(rowId);
 } else if (type == DataTypes.STRING) {
-  byte[] value = (byte[]) data;
+  byte[] bytes = page.getBytes(rowId);
   // TODO: how to get string value
-  String strValue = null;
   try {
-strValue = new String(value, 2, value.length - 2, "UTF-8");
+value = new String(bytes, 2, bytes.length - 2, "UTF-8");
   } catch (UnsupportedEncodingException e) {
 throw new RuntimeException(e);
   }
-  doc.add(new TextField(fieldName, strValue, store));
 } else if (type == DataTypes.DATE) {
   throw new RuntimeException("unsupported data type " + type);
 } else if (type == DataTypes.TIMESTAMP) {
   throw new RuntimeException("unsupported data type " + type);
 } else if (type == DataTypes.BOOLEAN) {
-  boolean value = (boolean) data;
-  IntRangeField field = new IntRangeField(fieldName, new int[] { 0 }, 
new int[] { 1 });
-  field.setIntValue(value ? 1 : 0);
-  doc.add(field);
-  if (store == Field.Store.YE
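
For context, stripped of the CarbonData plumbing, the pattern being reviewed above is standard Lucene usage: each numeric value is indexed with the matching *Point field, and a parallel StoredField is added when the value must also be retrievable from the index. Below is a minimal, self-contained sketch of that dispatch; the field names are made up and the byte/short/boolean handling from the diff is omitted:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;

// Minimal sketch of the per-type dispatch: index with a *Point field,
// optionally add a StoredField so the value can be read back at query time.
public class LucenePointFieldExample {

  static void addValue(Document doc, String fieldName, Object value, Field.Store store) {
    if (value instanceof Integer) {
      doc.add(new IntPoint(fieldName, (Integer) value));
      if (store == Field.Store.YES) {
        doc.add(new StoredField(fieldName, (Integer) value));
      }
    } else if (value instanceof Long) {
      doc.add(new LongPoint(fieldName, (Long) value));
      if (store == Field.Store.YES) {
        doc.add(new StoredField(fieldName, (Long) value));
      }
    } else if (value instanceof Double) {
      doc.add(new DoublePoint(fieldName, (Double) value));
      if (store == Field.Store.YES) {
        doc.add(new StoredField(fieldName, (Double) value));
      }
    } else if (value instanceof String) {
      // TextField is analyzed and honours the store flag itself.
      doc.add(new TextField(fieldName, (String) value, store));
    } else {
      throw new IllegalArgumentException("unsupported value type: " + value.getClass());
    }
  }

  public static void main(String[] args) {
    Document doc = new Document();
    addValue(doc, "age", 42, Field.Store.YES);
    addValue(doc, "name", "carbon", Field.Store.NO);
    System.out.println(doc.getFields().size()); // 3: IntPoint + StoredField + TextField
  }
}
```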

[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5849/



---


[jira] [Created] (CARBONDATA-2475) Support Materialized View query rewrite

2018-05-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-2475:


 Summary: Support Materialized View query rewrite
 Key: CARBONDATA-2475
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2475
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2301: [CARBONDATA-2473][WIP] Support Modular Plan

2018-05-12 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/2301

[CARBONDATA-2473][WIP] Support Modular Plan

WIP

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata mv-plan

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2301


commit 0f0b1416559c00b0b61b0eda99f60783b9dd01ec
Author: Jacky Li 
Date:   2018-05-12T09:23:20Z

support Modular Plan




---


[GitHub] carbondata pull request #2287: [CARBONDATA-2418] [Presto] [S3] Fixed Presto ...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2287#discussion_r187768917
  
--- Diff: integration/presto/README.md ---
@@ -82,6 +82,29 @@ Please follow the below steps to query carbondata in 
presto
   For example, if you have a schema named 'default' stored in 
hdfs://namenode:9000/test/carbondata/,
   Then set carbondata-store=hdfs://namenode:9000/test/carbondata
   
+ Connecting to carbondata store on s3
+ * In case you want to query carbonstore on S3 using S3A api put following 
additional properties inside $PRESTO_HOME$/etc/catalog/carbondata.properties 
+   ```
+fs.s3a.access.key={value}
+fs.s3a.secret.key={value}
+Optional: fs.s3a.endpoint={value}
--- End diff --

It would be better to describe the properties as below, so that users can 
conveniently copy and paste them.
```
# Required properties
A=B
# Optional properties
C=D
```
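
Fleshed out, the suggested layout for $PRESTO_HOME$/etc/catalog/carbondata.properties might look like the following; note that `connector.name` and the s3a-style `carbondata-store` value are assumptions added here for completeness, only the `fs.s3a.*` keys appear in the diff above:

```
# Required properties
connector.name=carbondata
carbondata-store=s3a://<bucket>/<path-to-store>
fs.s3a.access.key={value}
fs.s3a.secret.key={value}

# Optional properties
fs.s3a.endpoint={value}
```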


---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
retest this please


---


[jira] [Created] (CARBONDATA-2474) Support Modular Plan

2018-05-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-2474:


 Summary: Support Modular Plan
 Key: CARBONDATA-2474
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2474
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li
 Fix For: 1.4.0


Modular Plan is the basic structure for query plans in Materialized View. Carbon 
should support converting a Spark Logical Plan to a Modular Plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2473) Support Materialized View as enhanced Preaggregate DataMap

2018-05-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-2473:
-
Attachment: Query Rewrite using Materialized Views.pdf

> Support Materialized View as enhanced Preaggregate DataMap
> --
>
> Key: CARBONDATA-2473
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2473
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
> Attachments: Query Rewrite using Materialized Views.pdf
>
>
> Carbon DataMap is a framework to accelerate certain types of analysis 
> workloads. In the OLAP domain, there is traditionally a technique called 
> Materialized View to accelerate OLAP queries. 
> Currently Carbon supports the preaggregate datamap; since preaggregate works 
> only on a single table, Materialized View enhances it by adding join capability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2473) Support Materialized View as enhanced Preaggregate DataMap

2018-05-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-2473:
-
Attachment: (was: Query Rewrite using Materialized Views.pdf)

> Support Materialized View as enhanced Preaggregate DataMap
> --
>
> Key: CARBONDATA-2473
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2473
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
>
> Carbon DataMap is a framework to accelerate certain types of analysis 
> workloads. In the OLAP domain, there is traditionally a technique called 
> Materialized View to accelerate OLAP queries. 
> Currently Carbon supports the preaggregate datamap; since preaggregate works 
> only on a single table, Materialized View enhances it by adding join capability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2473) Support Materialized View as enhanced Preaggregate DataMap

2018-05-12 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-2473:
-
Attachment: Query Rewrite using Materialized Views.pdf

> Support Materialized View as enhanced Preaggregate DataMap
> --
>
> Key: CARBONDATA-2473
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2473
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
> Attachments: Query Rewrite using Materialized Views.pdf
>
>
> Carbon DataMap is a framework to accelerate certain types of analysis 
> workloads. In the OLAP domain, there is traditionally a technique called 
> Materialized View to accelerate OLAP queries. 
> Currently Carbon supports the preaggregate datamap; since preaggregate works 
> only on a single table, Materialized View enhances it by adding join capability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2290: [WIP][CARBONDATA-2389] Search mode support lucene da...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2290
  
Hi @xubo245, will you continue working on making search mode use the other 
types of datamap?


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768741
  
--- Diff: store/search/src/main/scala/org/apache/spark/rpc/Master.scala ---
@@ -234,7 +232,7 @@ class Master(sparkConf: SparkConf) {
   // if we have enough data already, we do not need to collect more 
result
   if (rowCount < globalLimit) {
 // wait for worker for 10s
--- End diff --

Please update the comment accordingly as well.


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768562
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapper.java
 ---
@@ -40,6 +41,18 @@
   List<ExtendedBlocklet> prune(List<Segment> segments, List<PartitionSpec> partitionsToPrune)
      throws IOException;
 
+  /**
+   * prune blocklet according distributable
+   *
+   * @param distributable distributable
+   * @param partitionsToPrune partitions to prune
+   * @return the pruned ExtendedBlocklet list
+   * @throws IOException
+   */
+  List<ExtendedBlocklet> prune(DataMapDistributable distributable,
--- End diff --

There are other indentation problems like this in this PR; it would be better 
to fix them all.


---


[jira] [Created] (CARBONDATA-2473) Support Materialized View as enhanced Preaggregate DataMap

2018-05-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-2473:


 Summary: Support Materialized View as enhanced Preaggregate DataMap
 Key: CARBONDATA-2473
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2473
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li
 Fix For: 1.4.0


Carbon DataMap is a framework to accelerate certain types of analysis workloads. 
In the OLAP domain, there is traditionally a technique called Materialized View 
to accelerate OLAP queries. 

Currently Carbon supports the preaggregate datamap; since preaggregate works only 
on a single table, Materialized View enhances it by adding join capability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768616
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java
 ---
@@ -40,9 +40,26 @@
 public class LatestFilesReadCommittedScope implements ReadCommittedScope {
 
   private String carbonFilePath;
+  private String segmentId;
   private ReadCommittedIndexFileSnapShot readCommittedIndexFileSnapShot;
   private LoadMetadataDetails[] loadMetadataDetails;
 
+  /**
+   * a new constructor of this class, which supports obtain lucene index 
in search mode
+   *
+   * @param path  carbon file path
+   * @param segmentId segment id
+   */
+  public LatestFilesReadCommittedScope(String path, String segmentId) {
--- End diff --

There is too much duplicated code here. It can internally call `this(path)` to 
reduce it.
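
The suggestion is plain Java constructor chaining. A minimal hypothetical sketch (not the real LatestFilesReadCommittedScope) of keeping the shared initialisation in one constructor and delegating from the other:

```java
// Hypothetical sketch of removing the duplication via constructor chaining:
// the single-argument constructor delegates instead of repeating the body.
public class ReadScopeExample {
  private final String carbonFilePath;
  private final String segmentId; // null means "consider all segments"

  public ReadScopeExample(String carbonFilePath) {
    this(carbonFilePath, null); // delegate, no duplicated body
  }

  public ReadScopeExample(String carbonFilePath, String segmentId) {
    this.carbonFilePath = carbonFilePath;
    this.segmentId = segmentId;
    // shared snapshot-loading logic would live here
  }

  public static void main(String[] args) {
    ReadScopeExample all = new ReadScopeExample("/tmp/store");
    ReadScopeExample one = new ReadScopeExample("/tmp/store", "0");
    System.out.println(all.segmentId + " / " + one.segmentId); // null / 0
  }
}
```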


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768465
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -90,13 +90,18 @@ private DataMapStoreManager() {
   String dbName = carbonTable.getDatabaseName();
   String tableName = carbonTable.getTableName();
   String dmName = dataMap.getDataMapSchema().getDataMapName();
-  boolean isDmVisible = sessionInfo.getSessionParams().getProperty(
-  String.format("%s%s.%s.%s", 
CarbonCommonConstants.CARBON_DATAMAP_VISIBLE,
-  dbName, tableName, dmName), 
"true").trim().equalsIgnoreCase("true");
-  if (!isDmVisible) {
-LOGGER.warn(String.format("Ignore invisible datamap %s on table 
%s.%s",
-dmName, dbName, tableName));
-dataMapIterator.remove();
+  if (sessionInfo != null) {
--- End diff --

When will sessionInfo be null?


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768650
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java
 ---
@@ -169,6 +169,10 @@ private BlockletScannedResult 
executeFilter(RawBlockletColumnChunks rawBlockletC
 // apply filter on actual data, for each page
 BitSetGroup bitSetGroup = 
this.filterExecuter.applyFilter(rawBlockletColumnChunks,
 useBitSetPipeLine);
+// if bitSetGroup is nul, then new BitSetGroup object, which can avoid 
NPE
--- End diff --

Does `Lucene` introduce this problem as you described, or is it a bug caused by 
another scenario?


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768680
  
--- Diff: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java
 ---
@@ -168,7 +169,8 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 getAllIndexDirs(tableIdentifier.getTablePath(), 
segment.getSegmentNo());
 for (CarbonFile indexDir : indexDirs) {
   // Filter out the tasks which are filtered through CG datamap.
-  if 
(!segment.getFilteredIndexShardNames().contains(indexDir.getName())) {
+  if (getDataMapLevel() != DataMapLevel.FG &&
--- End diff --

What is this for?
If it is only for the CG datamap, you can check it outside this loop.
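
The point of the comment is to evaluate the loop-invariant datamap-level check once instead of on every iteration. A self-contained illustration with made-up names (`Level`, `filteredShards`, `indexDirs` stand in for the real CarbonData types):

```java
import java.util.Arrays;
import java.util.List;

// Illustration of hoisting a loop-invariant condition out of the loop.
public class HoistInvariantExample {
  enum Level { CG, FG }

  public static void main(String[] args) {
    Level level = Level.FG; // stands in for getDataMapLevel()
    List<String> filteredShards = Arrays.asList("shard_0"); // shards pruned by the CG datamap
    List<String> indexDirs = Arrays.asList("shard_0", "shard_1");

    // Decide once, outside the loop, whether the CG-based filtering applies at all.
    boolean skipCgFilter = (level == Level.FG);
    for (String dir : indexDirs) {
      if (skipCgFilter || !filteredShards.contains(dir)) {
        System.out.println("processing " + dir);
      }
    }
  }
}
```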


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768530
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/AndDataMapExprWrapper.java
 ---
@@ -59,6 +60,21 @@ public AndDataMapExprWrapper(DataMapExprWrapper left, 
DataMapExprWrapper right,
 return andBlocklets;
   }
 
+  @Override
+  public List<ExtendedBlocklet> prune(DataMapDistributable distributable,
+      List<PartitionSpec> partitionsToPrune)
+      throws IOException {
+    List<ExtendedBlocklet> leftPrune = left.prune(distributable, partitionsToPrune);
+    List<ExtendedBlocklet> rightPrune = right.prune(distributable, partitionsToPrune);
+    List<ExtendedBlocklet> andBlocklets = new ArrayList<>();
+for (ExtendedBlocklet blocklet : leftPrune) {
+  if (rightPrune.contains(blocklet)) {
--- End diff --

Have you validated the correctness of this? The `equals` and `hashCode` methods 
in `ExtendedBlocklet` only make use of the member `segmentId`; I'm not sure 
whether that is correct here.
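
The concern is that `List.contains` relies entirely on `equals`. A small self-contained illustration (with a deliberately coarse `equals`, not the real ExtendedBlocklet) of how segment-only equality can make the intersection too permissive:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical blocklet whose equals/hashCode use only the segment id,
// mimicking the situation described in the review comment above.
public class IntersectionExample {
  static class Blocklet {
    final String segmentId;
    final String blockletId;
    Blocklet(String segmentId, String blockletId) {
      this.segmentId = segmentId;
      this.blockletId = blockletId;
    }
    @Override public boolean equals(Object o) {
      return o instanceof Blocklet && Objects.equals(segmentId, ((Blocklet) o).segmentId);
    }
    @Override public int hashCode() {
      return Objects.hashCode(segmentId);
    }
    @Override public String toString() {
      return segmentId + "/" + blockletId;
    }
  }

  public static void main(String[] args) {
    List<Blocklet> left = Arrays.asList(new Blocklet("0", "a"));
    List<Blocklet> right = Arrays.asList(new Blocklet("0", "b")); // different blocklet, same segment

    List<Blocklet> and = new ArrayList<>();
    for (Blocklet b : left) {
      if (right.contains(b)) { // true, because equals only compares segmentId
        and.add(b);
      }
    }
    System.out.println(and); // prints [0/a] even though 0/a is not in the right-hand list
  }
}
```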


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768472
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/AndDataMapExprWrapper.java
 ---
@@ -59,6 +60,21 @@ public AndDataMapExprWrapper(DataMapExprWrapper left, 
DataMapExprWrapper right,
 return andBlocklets;
   }
 
+  @Override
+  public List<ExtendedBlocklet> prune(DataMapDistributable distributable,
--- End diff --

Is the indentation correct?


---


[GitHub] carbondata pull request #2290: [WIP][CARBONDATA-2389] Search mode support lu...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2290#discussion_r187768550
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/expr/DataMapExprWrapper.java
 ---
@@ -40,6 +41,18 @@
   List<ExtendedBlocklet> prune(List<Segment> segments, List<PartitionSpec> partitionsToPrune)
      throws IOException;
 
+  /**
+   * prune blocklet according distributable
+   *
+   * @param distributable distributable
+   * @param partitionsToPrune partitions to prune
+   * @return the pruned ExtendedBlocklet list
+   * @throws IOException
+   */
+  List<ExtendedBlocklet> prune(DataMapDistributable distributable,
--- End diff --

The same indentation problem here?


---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4690/



---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5846/



---


[GitHub] carbondata issue #2300: [CARBONDATA-2459][DataMap] Add cache for bloom filte...

2018-05-12 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2300
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/4892/



---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768276
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
     segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+  Path indexFile = new Path(indexFileEntry.getKey());
--- End diff --

It seems unnecessary to create a new Path object here.


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768333
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
     segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+  Path indexFile = new Path(indexFileEntry.getKey());
+  org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
+  indexFile.toString(), carbonTable.getTableName());
+  TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
+  tableInfo, identifier.getDatabaseName(),
+  identifier.getTableName(),
+  identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+LOG.error("Schema of " + indexFile.getName()
++ " doesn't match with the table's schema");
+throw new IOException("All the files doesn't have same schema. "
++ "Unsupported operation on nonTransactional table. Check 
logs.");
+  }
+}
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
--- End diff --

Please improve the method name, for example `isColumnSchemaSame`.
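
What the quoted method boils down to is a size check plus an element-wise `equals`. A self-contained sketch of that comparison, with a placeholder column type standing in for CarbonData's ColumnSchema, including the kind of log message the reviewers ask for:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.logging.Logger;

// Hypothetical sketch of comparing two column lists: reject on size mismatch,
// then compare element by element. ColumnInfo is a stand-in, not ColumnSchema.
public class SchemaCompareExample {
  private static final Logger LOG = Logger.getLogger(SchemaCompareExample.class.getName());

  static class ColumnInfo {
    final String name;
    final String dataType;
    ColumnInfo(String name, String dataType) { this.name = name; this.dataType = dataType; }
    @Override public boolean equals(Object o) {
      if (!(o instanceof ColumnInfo)) return false;
      ColumnInfo other = (ColumnInfo) o;
      return name.equals(other.name) && dataType.equals(other.dataType);
    }
    @Override public int hashCode() { return Objects.hash(name, dataType); }
  }

  static boolean isColumnSchemaSame(List<ColumnInfo> indexFileColumns, List<ColumnInfo> tableColumns) {
    if (indexFileColumns.size() != tableColumns.size()) {
      LOG.warning("Column count mismatch: index file has " + indexFileColumns.size()
          + " columns, table has " + tableColumns.size());
      return false;
    }
    for (int i = 0; i < indexFileColumns.size(); i++) {
      if (!indexFileColumns.get(i).equals(tableColumns.get(i))) {
        LOG.warning("Column " + i + " differs between index file and table schema");
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<ColumnInfo> a = Arrays.asList(new ColumnInfo("id", "INT"), new ColumnInfo("name", "STRING"));
    List<ColumnInfo> b = Arrays.asList(new ColumnInfo("id", "INT"), new ColumnInfo("name", "STRING"));
    System.out.println(isColumnSchemaSame(a, b)); // true
  }
}
```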


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768121
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
 dataMapSchema, dataMapFactory, blockletDetailsFetcher, 
segmentPropertiesFetcher);
 
 tableIndices.add(dataMap);
-allDataMaps.put(tableUniqueName, tableIndices);
+allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

The name is already lowercased here, so there is no need to convert it again. If 
it is not, there may be other bugs causing that.



---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768312
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
     segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+    Map<String, String> indexFiles = segment.getCommittedIndexFile();
+    for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
+  Path indexFile = new Path(indexFileEntry.getKey());
+  org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
+  indexFile.toString(), carbonTable.getTableName());
+  TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
+  tableInfo, identifier.getDatabaseName(),
+  identifier.getTableName(),
+  identifier.getTablePath());
+      List<ColumnSchema> indexFileColumnList =
+          wrapperTableInfo.getFactTable().getListOfColumns();
+      List<ColumnSchema> tableColumnList =
+          carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+LOG.error("Schema of " + indexFile.getName()
++ " doesn't match with the table's schema");
+throw new IOException("All the files doesn't have same schema. "
++ "Unsupported operation on nonTransactional table. Check 
logs.");
+  }
+}
+  }
+
+  private boolean compareColumnSchemaList(List<ColumnSchema> indexFileColumnList,
+      List<ColumnSchema> tableColumnList) {
+if (indexFileColumnList.size() != tableColumnList.size()) {
+  return false;
--- End diff --

Can you add a log here and at line 170 to explain the reason?


---