tajo git commit: TAJO-1740: Update Partition Table document. (jaehwa)

blrunner Sun, 17 Jan 2016 23:12:51 -0800

Repository: tajo
Updated Branches:
  refs/heads/branch-0.11.1 c57956e4c -> 4c34c53c6



TAJO-1740: Update Partition Table document. (jaehwa)


Project: http://git-wip-us.apache.org/repos/asf/tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/tajo/commit/4c34c53c
Tree: http://git-wip-us.apache.org/repos/asf/tajo/tree/4c34c53c
Diff: http://git-wip-us.apache.org/repos/asf/tajo/diff/4c34c53c

Branch: refs/heads/branch-0.11.1
Commit: 4c34c53c61701497b7eabc6e819a8f20d3526c5a
Parents: c57956e
Author: JaeHwa Jung <[email protected]>
Authored: Mon Jan 18 16:11:59 2016 +0900
Committer: JaeHwa Jung <[email protected]>
Committed: Mon Jan 18 16:11:59 2016 +0900

----------------------------------------------------------------------
 CHANGES                                         |   2 +
 .../sphinx/partitioning/column_partitioning.rst | 213 +++++++++++++++++--
 2 files changed, 200 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tajo/blob/4c34c53c/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index 92a5a09..b695055 100644
--- a/CHANGES
+++ b/CHANGES
@@ -7,6 +7,8 @@ Release 0.11.1 - unreleased
 
   IMPROVEMENT
 
+    TAJO-1740: Update Partition Table document. (jaehwa)
+
     TAJO-2053: Update description for HBase configuration.
     (Dongkyu Hwangbo via jaehwa)
 

http://git-wip-us.apache.org/repos/asf/tajo/blob/4c34c53c/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst b/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
index 4b8d6bf..5fd44ed 100644
--- a/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
+++ b/tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
@@ -11,29 +11,61 @@ How to Create a Column Partitioned Table
 You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
 the ``PARTITION BY COLUMN`` clause with partition keys.
 
-For example, assume there is a table ``orders`` composed of the following schema. ::
+For example, assume a table with the following schema.
 
-  id          INT,
-  item_name   TEXT,
-  price       FLOAT
+.. code-block:: sql
+
+  id        INT,
+  name      TEXT,
+  gender    char(1),
+  grade     TEXT,
+  country   TEXT,
+  city      TEXT,
+  phone     TEXT
+  );
 
-Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
-Then, you should create a table as follows:
+If you want to use country as the partition column, your Tajo definition would be:
 
 .. code-block:: sql
 
-  CREATE TABLE orders (
-    id INT,
-    item_name TEXT,
-    price
-  ) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT);
+  CREATE TABLE student (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    city      TEXT,
+    phone     TEXT
+  ) PARTITION BY COLUMN (country TEXT);
+
+Let us assume you want to use more partition columns and parquet file format. Here's an example statement to create a table:
+
+.. code-block:: sql
+
+  CREATE TABLE student (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    phone     TEXT
+  ) USING PARQUET
+  PARTITION BY COLUMN (country TEXT, city TEXT);
+
+The statement above creates the student table with id, name, grade, etc. The table is also partitioned and data is stored in parquet files.
+
+You might have noticed that while the partitioning key columns are a part of the table DDL, they're only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data will be automatically split out into different folders, e.g. country=USA/city=NEWYORK. During a read operation, Tajo will use the folder structure to quickly locate the right partitions and also return the partitioning columns as columns in the result set.
+
 
 ==================================================
-Partition Pruning on Column Partitioned Tables
+Querying Partitioned Tables
 ==================================================
 
-The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions without processing
-during query planning phase.
+If a table is created using the ``PARTITION BY`` clause, a query can do partition pruning and scan only the fraction of the table relevant to the partitions specified by the query. Tajo currently does partition pruning if the partition predicates are specified in the ``WHERE`` clause. For example, if table student is partitioned on column country and column city, the following query retrieves rows in the ``country=KOREA/city=SEOUL`` directory.
+
+.. code-block:: sql
+
+  SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL';
+
+The following predicates in the ``WHERE`` clause can be used to prune column partitions during query planning phase.
 
 * ``=``
 * ``<>``
@@ -44,9 +76,160 @@ during query planning phase.
 * LIKE predicates with a leading wild-card character
 * IN list predicates
 
+
+==================================================
+Add data to Partition Table
+==================================================
+
+Tajo provides a very useful feature of dynamic partitioning. You don't need any special syntax with either ``INSERT INTO ... SELECT`` or ``Create Table As Select (CTAS)`` statements for dynamic partitioning. Tajo will automatically filter the data, create directories, move the filtered data to the appropriate directory, and create a partition over it.
+
+For example, assume there are both ``student_source`` and ``student`` tables composed of the following schema.
+
+.. code-block:: sql
+
+  CREATE TABLE student_source (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    country   TEXT,
+    city      TEXT,
+    phone     TEXT
+  );
+
+  CREATE TABLE student (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    phone     TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT);
+
+
+How to INSERT dynamically to partition table
+--------------------------------------------------------
+
+If you want to load an entire country or an entire city in one fell swoop:
+
+.. code-block:: sql
+
+  INSERT OVERWRITE INTO student
+  SELECT id, name, gender, grade, phone, country, city
+  FROM   student_source;
+
+
+How to CTAS dynamically to partition table
+--------------------------------------------------------
+
+When a partition table is created:
+
+.. code-block:: sql
+
+  DROP TABLE if exists student;
+
+  CREATE TABLE student (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    phone     TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT)
+  AS SELECT id, name, gender, grade, phone, country, city
+  FROM   student_source;
+
+
+.. note::
+
+  When loading data into a partition, it's necessary to include the partition columns as the last columns in the query. The column names in the source query don't need to match the partition column names.
+
+
 ==================================================
 Compatibility Issues with Apache Hive™
 ==================================================
 
 If partitioned tables of Hive are created as external tables in Tajo, Tajo can process the Hive partitioned tables directly.
-There haven't known compatibility issues yet.
\ No newline at end of file
+
+
+How to create partition table
+--------------------------------------------------------
+
+If you create a partition table as follows in Tajo:
+
+.. code-block:: sql
+
+  default> CREATE TABLE student (
+    id        INT,
+    name      TEXT,
+    gender    char(1),
+    grade     TEXT,
+    phone     TEXT
+  ) PARTITION BY COLUMN (country TEXT, city TEXT);
+
+
+And then you can get table information in Hive:
+
+.. code-block:: sql
+
+  hive> desc student;
+  OK
+  id                   int
+  name                 string
+  gender               char(1)
+  grade                string
+  phone                string
+  country              string
+  city                 string
+
+  # Partition Information
+  # col_name                   data_type               comment
+
+  country              string
+  city                 string
+
+
+Or as you create the table in Hive:
+
+.. code-block:: sql
+
+  hive > CREATE TABLE student (
+    id int,
+    name string,
+    gender char(1),
+    grade string,
+    phone string
+  ) PARTITIONED BY (country string, city string)
+  ROW FORMAT DELIMITED
+    FIELDS TERMINATED BY '|' ;
+
+You will see table information in Tajo:
+
+.. code-block:: sql
+
+  default> \d student;
+  table name: default.student
+  table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
+  store type: TEXT
+  number of rows: 0
+  volume: 0 B
+  Options:
+    'text.null'='\\N'
+    'transient_lastDdlTime'='1438756422'
+    'text.delimiter'='|'
+
+  schema:
+  id   INT4
+  name TEXT
+  gender       CHAR(1)
+  grade        TEXT
+  phone        TEXT
+
+  Partitions:
+  type:COLUMN
+  columns::default.student.country (TEXT), default.student.city (TEXT)
+
+
+How to add data to partition table
+--------------------------------------------------------
+
+In Tajo, you can add data dynamically to a partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statements. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system.
+
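
The updated document describes partition data being split into folders such as country=USA/city=NEWYORK, and pruning driven by predicates in the WHERE clause. The Python sketch below only illustrates that idea; it is not Tajo's implementation. The function names, the /warehouse/student root, and the restriction to equality predicates are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def partition_path(table_root, partition_values):
    """Build a key=value partition directory under table_root.

    partition_values is an ordered list of (column, value) pairs,
    e.g. [("country", "USA"), ("city", "NEWYORK")].
    """
    path = PurePosixPath(table_root)
    for column, value in partition_values:
        path = path / f"{column}={value}"
    return str(path)


def parse_partition_path(path):
    """Recover partition column values from the key=value path segments."""
    values = {}
    for segment in PurePosixPath(path).parts:
        if "=" in segment:
            column, _, value = segment.partition("=")
            values[column] = value
    return values


def prune(partition_dirs, equality_predicates):
    """Keep only directories whose partition values match every predicate.

    equality_predicates maps column name to required value (hypothetical,
    simplified stand-in for WHERE country = '...' AND city = '...').
    """
    kept = []
    for directory in partition_dirs:
        values = parse_partition_path(directory)
        if all(values.get(c) == v for c, v in equality_predicates.items()):
            kept.append(directory)
    return kept


# Hypothetical table root and partitions, for illustration only.
dirs = [
    partition_path("/warehouse/student", [("country", "KOREA"), ("city", "SEOUL")]),
    partition_path("/warehouse/student", [("country", "KOREA"), ("city", "BUSAN")]),
    partition_path("/warehouse/student", [("country", "USA"), ("city", "NEWYORK")]),
]
selected = prune(dirs, {"country": "KOREA", "city": "SEOUL"})
```

With the sample directories above, only the country=KOREA/city=SEOUL directory survives pruning, which mirrors the SELECT example in the document: the other directories never need to be scanned.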

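The note in the document says the partition columns must be the last columns in the source query, and that their names do not need to match. A rough Python sketch of why position matters is shown here: rows are routed purely by their trailing values. The function name and the sample rows are hypothetical, and this is an illustration of the rule rather than Tajo's actual code path.

```python
from collections import defaultdict


def route_rows(rows, partition_columns):
    """Group rows into partition directories using their trailing values.

    The last len(partition_columns) values of each row are treated as the
    partition keys, by position rather than by name. The remaining leading
    values are what would be stored in the data files.
    """
    n = len(partition_columns)
    routed = defaultdict(list)
    for row in rows:
        data, keys = row[:-n], row[-n:]
        directory = "/".join(
            f"{column}={value}" for column, value in zip(partition_columns, keys)
        )
        routed[directory].append(data)
    return dict(routed)


# Made-up rows in the order: id, name, gender, grade, phone, country, city.
rows = [
    (1, "Kim", "F", "A", "010-0000", "KOREA", "SEOUL"),
    (2, "Lee", "M", "B", "011-0000", "USA", "NEWYORK"),
    (3, "Park", "F", "C", "012-0000", "KOREA", "SEOUL"),
]
routed = route_rows(rows, ["country", "city"])
```

If the SELECT list placed country and city anywhere other than the end, this positional routing would pick up the wrong values as partition keys, which is the reason for the ordering requirement.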