carbondata git commit: [CARBONDATA-2098] Add datamap managment description

ravipesala Sat, 03 Mar 2018 02:25:21 -0800

Repository: carbondata
Updated Branches:
  refs/heads/master c125f0caa -> d0c2ab2dc



[CARBONDATA-2098] Add datamap managment description

Enhance document for datamap

This closes #2026


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d0c2ab2d
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d0c2ab2d
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d0c2ab2d

Branch: refs/heads/master
Commit: d0c2ab2dc5abf16084354848dbcf6f5c45b3cae5
Parents: c125f0c
Author: Jacky Li <jacky.li...@qq.com>
Authored: Sat Mar 3 13:40:59 2018 +0800
Committer: ravipesala <ravi.pes...@gmail.com>
Committed: Sat Mar 3 15:43:14 2018 +0530

----------------------------------------------------------------------
 docs/datamap/preaggregate-datamap-guide.md      | 51 +++++++++++++++++---
 docs/datamap/timeseries-datamap-guide.md        | 23 ++++++---
 .../examples/PreAggregateTableExample.scala     |  2 +
 3 files changed, 64 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/docs/datamap/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/preaggregate-datamap-guide.md b/docs/datamap/preaggregate-datamap-guide.md
index fabfd7d..199f674 100644
--- a/docs/datamap/preaggregate-datamap-guide.md
+++ b/docs/datamap/preaggregate-datamap-guide.md
@@ -1,5 +1,13 @@
 # CarbonData Pre-aggregate DataMap
   
+* [Quick Example](#quick-example)
+* [DataMap Management](#datamap-management)
+* [Pre-aggregate Table](#preaggregate-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Compaction](#compacting-pre-aggregate-tables)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
 ## Quick example
 Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME
 
@@ -85,7 +93,35 @@ Start spark-shell in new terminal, type :paste, then copy and run the following
   spark.stop
 ```
 
-##PRE-AGGREGATE DataMap  
+## DataMap Management
+A DataMap can be created using the following DDL:
+  ```
+  CREATE DATAMAP [IF NOT EXISTS] datamap_name
+  ON TABLE main_table
+  USING "datamap_provider"
+  DMPROPERTIES ('key'='value', ...)
+  AS
+    SELECT statement
+  ```
+The string following USING is called the DataMap Provider. In this version, CarbonData supports two 
+kinds of DataMap: 
+1. preaggregate, for pre-aggregate tables. No DMPROPERTIES are required for this DataMap.
+2. timeseries, for timeseries roll-up tables. Please refer to [Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)
+
+A DataMap can be dropped using the following DDL:
+  ```
+  DROP DATAMAP [IF EXISTS] datamap_name
+  ON TABLE main_table
+  ```
+To show all DataMaps created, use:
+  ```
+  SHOW DATAMAP 
+  ON TABLE main_table
+  ```
+It will show all DataMaps created on the main table.
+
+
+## Preaggregate DataMap Introduction
   Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData. 
   Users can create as many pre-aggregate datamaps as required to improve query performance, 
   provided the storage requirements and loading speeds are acceptable.
@@ -163,7 +199,7 @@ SELECT country, max(price) from sales GROUP BY country
 will query against main table **sales** only, because it does not satisfy pre-aggregate table 
 selection logic. 
 
-#### Loading data to pre-aggregate tables
+## Loading data
 For an existing table with loaded data, the data load to the pre-aggregate table is triggered by the 
 CREATE DATAMAP statement when the user creates the pre-aggregate table. For incremental loads after 
 aggregate tables are created, loading data to the main table triggers the load to pre-aggregate tables 
@@ -174,7 +210,7 @@ meaning that data on main table and pre-aggregate tables are only visible to the
 tables are loaded successfully. If one of these loads fails, the new data is not visible in any table, 
 as if the load operation had not happened.   
 
-#### Querying data from pre-aggregate tables
+## Querying data
 As a technique for query acceleration, pre-aggregate tables cannot be queried directly. 
 Queries are to be made on the main table. During query planning, CarbonData internally checks 
 the pre-aggregate tables associated with the main table and transforms the query plan accordingly. 
@@ -183,7 +219,8 @@ User can verify whether a query can leverage pre-aggregate table or not by execu
 command, which will show the transformed logical plan, and thus user can check whether pre-aggregate
 table is selected.
 
-#### Compacting pre-aggregate tables
+
+## Compacting pre-aggregate tables
 Running the Compaction command (`ALTER TABLE COMPACT`) on the main table will **not automatically** 
 compact the pre-aggregate tables created on the main table. Users need to run the Compaction command 
 separately on each pre-aggregate table to compact them.
@@ -193,8 +230,10 @@ main table but not performed on pre-aggregate table, all queries still can benef
 pre-aggregate tables. To further improve the query performance, compaction on pre-aggregate tables 
 can be triggered to merge the segments and files in the pre-aggregate tables. 
 
-#### Data Management on pre-aggregate tables
-Once there is pre-aggregate table created on the main table, following command on the main table
+## Data Management with pre-aggregate tables
+In the current implementation, data consistency needs to be maintained for both the main table and pre-aggregate
+tables. Once a pre-aggregate table is created on the main table, the following commands on the main 
+table
 are not supported:
 1. Data management command: `UPDATE/DELETE/DELETE SEGMENT`. 
 2. Schema management command: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE DATATYPE`, 
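
The DDL templates in the DataMap Management section of the diff above can be made concrete with a small sketch. The `DataMapDdl` object below is a hypothetical helper, not part of CarbonData: it only assembles the CREATE / DROP / SHOW statement strings, using the `preagg_count` and `maintable` names from PreAggregateTableExample.scala. It assumes the single-quoted provider form used in that example and that DMPROPERTIES may be omitted when no properties are needed, as the guide states for `preaggregate`.

```scala
// Hypothetical helper (an assumption for illustration, not a CarbonData API):
// it only builds the DDL strings described in the DataMap Management section.
object DataMapDdl {

  /** Builds a CREATE DATAMAP statement following the documented template. */
  def create(
      name: String,
      table: String,
      provider: String,
      select: String,
      properties: Map[String, String] = Map.empty,
      ifNotExists: Boolean = false): String = {
    val guard = if (ifNotExists) "IF NOT EXISTS " else ""
    // DMPROPERTIES is left out entirely when no properties are supplied.
    val props =
      if (properties.isEmpty) ""
      else properties
        .map { case (k, v) => s"'$k'='$v'" }
        .mkString("DMPROPERTIES (", ", ", ") ")
    s"CREATE DATAMAP $guard$name ON TABLE $table USING '$provider' ${props}AS $select"
  }

  /** Builds a DROP DATAMAP statement; IF EXISTS is included by default. */
  def drop(name: String, table: String, ifExists: Boolean = true): String = {
    val guard = if (ifExists) "IF EXISTS " else ""
    s"DROP DATAMAP $guard$name ON TABLE $table"
  }

  /** Builds the statement that lists all DataMaps on a main table. */
  def show(table: String): String = s"SHOW DATAMAP ON TABLE $table"
}

object DataMapDdlDemo extends App {
  val query = "select name, count(*) from maintable group by name"
  println(DataMapDdl.create("preagg_count", "maintable", "preaggregate", query))
  println(DataMapDdl.show("maintable"))
  println(DataMapDdl.drop("preagg_count", "maintable"))
}
```

With a CarbonData-enabled SparkSession available as `spark`, each string could be passed to `spark.sql(...)`, in the same way the example file calls `spark.sql("show datamap on table maintable").show`.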

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/docs/datamap/timeseries-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/timeseries-datamap-guide.md b/docs/datamap/timeseries-datamap-guide.md
index ecd7234..886c161 100644
--- a/docs/datamap/timeseries-datamap-guide.md
+++ b/docs/datamap/timeseries-datamap-guide.md
@@ -1,14 +1,25 @@
 # CarbonData Timeseries DataMap
 
-## Supporting timeseries data (Alpha feature in 1.3.0)
+* [Timeseries DataMap](#timeseries-datamap-introduction-(alpha-feature-in-1.3.0))
+* [Compaction](#compacting-pre-aggregate-tables)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
+## Timeseries DataMap Introduction (Alpha feature in 1.3.0)
 Timeseries DataMap is a pre-aggregate table implementation based on the 'preaggregate' DataMap. 
 The difference is that Timeseries DataMap has a built-in understanding of the time hierarchy and 
 levels: year, month, day, hour, minute, so that it supports automatic roll-up in the time dimension 
 for queries.
+
+The data loading, querying, and compaction commands and their behavior are the same as for the preaggregate DataMap.
+Please refer to [Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
+for more information.
   
-For instance, user can create multiple timeseries datamap on the main table which has a *event_time*
-column, one datamap for one time granularity. Then Carbondata can do automatic roll-up for queries 
-on the main table.
+To use this datamap, the user can create multiple timeseries datamaps on a main table which has 
+an *event_time* column, one datamap for each time granularity. Then CarbonData can do automatic 
+roll-up for queries on the main table.
+
+For example, the statements below effectively create multiple pre-aggregate tables on a main table called 
+**timeseries**
 
 ```
 CREATE DATAMAP agg_year
@@ -126,10 +137,10 @@ the future CarbonData release.
 * timeseries datamaps created for each level needs to be dropped separately 
       
 
-#### Compacting timeseries datamp
+## Compacting timeseries datamap
 Refer to Compaction section in [preaggregation datamap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).
 
 Same applies to timeseries datamap.
 
-#### Data Management on timeseries datamap
+## Data Management on timeseries datamap
 Refer to Data Management section in [preaggregation datamap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).
 Same applies to timeseries datamap.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
index ace3dcc..64ed525 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
@@ -99,6 +99,8 @@ object PreAggregateTableExample {
       s"""create datamap preagg_count on table maintable using 'preaggregate' as
          | select name, count(*) from maintable group by name""".stripMargin)
 
+    spark.sql("show datamap on table maintable").show
+
     spark.sql(
       s"""
          | SELECT id,max(age)
