This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 292e22c1fc683128810a00ed937fdd2f73427234
Author: mingmxu <ming...@ebay.com>
AuthorDate: Mon Aug 21 09:16:36 2017 -0700

    remove post page, move internal implementation to the end of page
---
 src/_data/authors.yml            |  4 --
 src/_posts/2017-07-21-sql-dsl.md | 64 --------------------------
 src/documentation/dsls/sql.md    | 98 +++++++++++++++++++++-------------------
 3 files changed, 51 insertions(+), 115 deletions(-)

diff --git a/src/_data/authors.yml b/src/_data/authors.yml
index 1b76ddb..d8cd836 100644
--- a/src/_data/authors.yml
+++ b/src/_data/authors.yml
@@ -28,10 +28,6 @@ klk:
     name: Kenneth Knowles
     email: k...@apache.org
     twitter: KennKnowles
-mingmxu:
-    name: Mingmin Xu
-    email: mingm...@gmail.com
-    twitter:
 robertwb:
     name: Robert Bradshaw
     email: rober...@apache.org

diff --git a/src/_posts/2017-07-21-sql-dsl.md b/src/_posts/2017-07-21-sql-dsl.md
deleted file mode 100644
index 95df3f8..0000000
--- a/src/_posts/2017-07-21-sql-dsl.md
+++ /dev/null
@@ -1,64 +0,0 @@
----
-layout: post
-title: "Use Beam SQL DSL to build a pipeline"
-date: 2017-07-21 00:00:00 -0800
-excerpt_separator: <!--more-->
-categories: blog
-authors:
-  - mingmxu
----
-
-The Beam SQL DSL provides the capability to execute standard SQL queries using the Beam Java SDK. It packages the back-end parsing/validation/assembling features and delivers an SDK-style API for developers to express processing logic with SQL statements, from a simple TABLE_FILTER to complex queries containing JOIN/GROUP_BY etc.
-
-<!--more-->
-
-# <a name="overview"></a>1. Overview
-SQL is a well-adopted standard for processing data with concise syntax. With the DSL APIs, `PCollection`s can now be queried with standard SQL statements, like a regular table. The DSL APIs leverage [Apache Calcite](http://calcite.apache.org/) to parse and optimize SQL queries, then translate them into a _composite_ Beam `PTransform`. In this way, both SQL and normal Beam `PTransform`s can be mixed in one pipeline.
-
-# <a name="usage"></a>2. Usage of DSL APIs
-`BeamSql` is the only interface developers need (with two methods, `BeamSql.query()` and `BeamSql.simpleQuery()`). It wraps the back-end details of parsing/validation/assembling and delivers a Beam SDK-style API that can take either simple TABLE_FILTER queries or complex queries containing JOIN/GROUP_BY etc.
-
-*Note*: the two methods are equivalent in functionality. `BeamSql.query()` applies to a `PCollectionTuple` with one or many input `PCollection`s; `BeamSql.simpleQuery()` is a simplified API which applies to a single `PCollection`.
-
-[BeamSqlExample](https://github.com/apache/beam/blob/DSL_SQL/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java) in the code repository shows the usage of both APIs:
-
-```
-//Step 1. create a source PCollection with Create.of();
-BeamRecordSqlType type = BeamRecordSqlType.create(fieldNames, fieldTypes);
-...
-
-PCollection<BeamRecord> inputTable = PBegin.in(p).apply(Create.of(row...)
-    .withCoder(type.getRecordCoder()));
-
-//Step 2. (Case 1) run a simple SQL query over the input PCollection with BeamSql.simpleQuery;
-PCollection<BeamRecord> outputStream = inputTable.apply(
-    BeamSql.simpleQuery("select c1, c2, c3 from PCOLLECTION where c1=1"));
-
-//Step 2. (Case 2) run a query with BeamSql.query over the result PCollection of (Case 1);
-PCollection<BeamRecord> outputStream2 =
-    PCollectionTuple.of(new TupleTag<BeamRecord>("CASE1_RESULT"), outputStream)
-        .apply(BeamSql.query("select c2, sum(c3) from CASE1_RESULT group by c2"));
-```
-
-In Step 1, a `PCollection<BeamRecord>` is prepared as the source dataset. The work to generate a queryable `PCollection<BeamRecord>` is beyond the scope of the Beam SQL DSL.
-
-Step 2 (Case 1) shows how to run a query with `BeamSql.simpleQuery()`; be aware that the input `PCollection` is referenced with the fixed table name __PCOLLECTION__. Step 2 (Case 2) is another example, running a query with `BeamSql.query()`; a table name is specified when adding the `PCollection` to the `PCollectionTuple`. Since each call of either `BeamSql.query()` or `BeamSql.simpleQuery()` has its own schema repository, developers need to include all `PCollection`s that are used in the query.
-
-# <a name="functionality"></a>3. Functionality in Beam SQL
-Just as Beam provides a unified model for both bounded and unbounded data, the SQL DSL provides the same functionality for bounded and unbounded `PCollection`s.
-
-Note that SQL support is not yet complete. Queries that include unsupported features cause an `UnsupportedOperationException`.
-
-## <a name="features"></a>3.1. Supported Features
-The following features are supported in the current repository:
-
-1. filter clauses;
-2. data field projections;
-3. aggregations (global_window, fixed_window, sliding_window, session_window);
-4. join (inner, left_outer, right_outer);
-5. built-in SQL functions;
-6. User Defined Functions (UDF) and User Defined Aggregate Functions (UDAF).
-
-For a deeper dive, please visit [DSLs: SQL]({{ site.baseurl }}/documentation/dsls/sql/)
-
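A note on the overview's claim that SQL and normal Beam `PTransform`s can be mixed in one pipeline: the SQL result is an ordinary `PCollection<BeamRecord>`, so any downstream transform can consume it. A minimal sketch of that hand-off (the `DoFn` body is hypothetical, for illustration only):

```
//run a SQL query, then continue with ordinary Beam transforms;
//the query result is a plain PCollection<BeamRecord>
PCollection<BeamRecord> filtered = inputTable.apply(
    BeamSql.simpleQuery("select c1, c2, c3 from PCOLLECTION where c1=1"));

filtered.apply(ParDo.of(new DoFn<BeamRecord, String>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    //toString() is used here to stay agnostic of the exact field accessors
    c.output(c.element().toString());
  }
}));
```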
diff --git a/src/documentation/dsls/sql.md b/src/documentation/dsls/sql.md
index bfcbd1c..4f25fe4 100644
--- a/src/documentation/dsls/sql.md
+++ b/src/documentation/dsls/sql.md
@@ -5,55 +5,23 @@ permalink: /documentation/dsls/sql/
 ---
 
 * [1. Overview](#overview)
-* [2. The Internals of Beam SQL](#internal-of-sql)
-* [3. Usage of DSL APIs](#usage)
-* [4. Functionality in Beam SQL](#functionality)
-  * [4.1. Supported Features](#features)
-  * [4.2. Intro of BeamSqlRow](#beamsqlrow)
-  * [4.3. Data Types](#data-type)
-  * [4.4. built-in SQL functions](#built-in-functions)
+* [2. Usage of DSL APIs](#usage)
+* [3. Functionality in Beam SQL](#functionality)
+  * [3.1. Supported Features](#features)
+  * [3.2. Intro of BeamRecord](#beamsqlrow)
+  * [3.3. Data Types](#data-type)
+  * [3.4. Built-in SQL functions](#built-in-functions)
+* [4. The Internals of Beam SQL](#internal-of-sql)
 
 This page describes the implementation of Beam SQL, and how to simplify a Beam pipeline with the DSL APIs.
 
+> Note: Beam SQL hasn't been merged to the master branch yet (it is being developed on the [DSL_SQL](https://github.com/apache/beam/tree/DSL_SQL) branch), but is coming soon.
+
 # <a name="overview"></a>1. Overview
 SQL is a well-adopted standard for processing data with concise syntax. With the DSL APIs, `PCollection`s can now be queried with standard SQL statements, like a regular table. The DSL APIs leverage [Apache Calcite](http://calcite.apache.org/) to parse and optimize SQL queries, then translate them into a composite Beam `PTransform`. In this way, SQL and normal Beam `PTransform`s can be mixed in the same pipeline.
 
-# <a name="internal-of-sql"></a>2. The Internals of Beam SQL
-Figure 1 describes the back-end steps from a SQL statement to a Beam `PTransform`.
-
-![Workflow of Beam SQL DSL]({{ "/images/beam_sql_dsl_workflow.png" | prepend: site.baseurl }} "workflow of Beam SQL DSL")
-
-**Figure 1** workflow of Beam SQL DSL
-
-Given a `PCollection` and a query as input, the input `PCollection` is first registered as a table in the schema repository. Then it is processed as follows:
-
-1. The SQL query is parsed according to the grammar to generate a SQL abstract syntax tree;
-2. The tree is validated against the table schema, producing a logical plan expressed in relational algebra;
-3. Relational rules convert the logical plan into a physical plan expressed with Beam components; an optional optimizer may update the plan;
-4. Finally, the Beam physical plan is compiled into a composite `PTransform`.
-
-Here is an example of a query that filters and projects from an input `PCollection`:
-
-```
-SELECT USER_ID, USER_NAME FROM PCOLLECTION WHERE USER_ID = 1
-```
-
-The logical plan is shown as:
-
-```
-LogicalProject(USER_ID=[$0], USER_NAME=[$1])
-  LogicalFilter(condition=[=($0, 1)])
-  LogicalTableScan(table=[[PCOLLECTION]])
-```
-
-And it is compiled into a composite `PTransform`:
-
-```
-pCollection.apply(BeamSqlFilter...)
-  .apply(BeamSqlProject...)
-```
-
-# <a name="usage"></a>3. Usage of DSL APIs
+# <a name="usage"></a>2. Usage of DSL APIs
 `BeamSql` is the only interface developers need (with two methods, `BeamSql.query()` and `BeamSql.simpleQuery()`). It wraps the back-end details of parsing/validation/assembling and delivers a Beam SDK-style API that can take either simple TABLE_FILTER queries or complex queries containing JOIN/GROUP_BY etc.
 
 *Note*: the two methods are equivalent in functionality. `BeamSql.query()` applies to a `PCollectionTuple` with one or many input `PCollection`s; `BeamSql.simpleQuery()` is a simplified API which applies to a single `PCollection`.
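Since each `BeamSql.query()` call has its own schema repository, a query over two tables must register both `PCollection`s in the input `PCollectionTuple`. A sketch of the two-table case (the table tags, field names and join query are assumptions, for illustration only):

```
//register every PCollection referenced in the query, each under the
//table name that the SQL statement uses
PCollection<BeamRecord> users = ...;  //assumed schema: USER_ID, USER_NAME
PCollection<BeamRecord> orders = ...; //assumed schema: USER_ID, AMOUNT

PCollection<BeamRecord> result =
    PCollectionTuple.of(new TupleTag<BeamRecord>("USERS"), users)
        .and(new TupleTag<BeamRecord>("ORDERS"), orders)
        .apply(BeamSql.query(
            "select USERS.USER_NAME, sum(ORDERS.AMOUNT) "
            + "from ORDERS join USERS on ORDERS.USER_ID = USERS.USER_ID "
            + "group by USERS.USER_NAME"));
```

An inner join like this falls under the supported-features list below; only inner, left_outer and right_outer joins are available.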
@@ -83,12 +51,12 @@ In Step 1, a `PCollection<BeamRecord>` is prepared as the source dataset. The wo
 
 Step 2 (Case 1) shows how to run a query with `BeamSql.simpleQuery()`; be aware that the input `PCollection` is referenced with the fixed table name __PCOLLECTION__. Step 2 (Case 2) is another example, running a query with `BeamSql.query()`; a table name is specified when adding the `PCollection` to the `PCollectionTuple`. Since each call of either `BeamSql.query()` or `BeamSql.simpleQuery()` has its own schema repository, developers need to include all `PCollection`s that are used in the query.
 
-# <a name="functionality"></a>4. Functionality in Beam SQL
+# <a name="functionality"></a>3. Functionality in Beam SQL
 Just as Beam provides a unified model for both bounded and unbounded data, the SQL DSL provides the same functionality for bounded and unbounded `PCollection`s.
 
 Note that SQL support is not yet complete. Queries that include unsupported features cause an `UnsupportedOperationException`.
 
-## <a name="features"></a>4.1. Supported Features
+## <a name="features"></a>3.1. Supported Features
 The following features are supported in the current repository:
 
 **1. filter clauses;**
@@ -217,7 +185,7 @@ PCollection<BeamSqlRow> result =
     BeamSql.simpleQuery(sql).withUdaf("squaresum", new SquareSum()));
 ```
 
-## <a name="beamsqlrow"></a>4.2. Intro of BeamRecord
+## <a name="beamsqlrow"></a>3.2. Intro of BeamRecord
 `BeamRecord`, described by `BeamRecordType` (extended as `BeamRecordSqlType` in Beam SQL) and encoded/decoded by `BeamRecordCoder`, represents a single, immutable row in a Beam SQL `PCollection`. Similar to a _row_ in a relational database, each `BeamRecord` consists of named columns with fixed types (see [3.3. Data Types](#data-type)).
 
 A Beam SQL `PCollection` can be created from an external source, from in-memory data, or derived from another SQL query. For `PCollection`s from an external source or in-memory data, the coder must be specified explicitly; a `PCollection` derived from a SQL query has the coder set already. Below is one example:
@@ -234,7 +202,7 @@ PCollection<BeamRecord> inputTable = PBegin.in(p).apply(Create.of(row)
     .withCoder(type.getRecordCoder()));
 ```
 
-## <a name="data-type"></a>4.3. Data Types
+## <a name="data-type"></a>3.3. Data Types
 Each type in Beam SQL maps to a Java class that holds the value in a `BeamRecord`. The following table lists the mapping between SQL types and Java classes supported in the current repository:
 
 | SQL Type | Java class |
@@ -250,7 +218,7 @@
 | Types.TIMESTAMP | java.util.Date |
 {:.table}
 
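To make the mapping concrete, here is a sketch of declaring a record type that mixes a few of the types above, using the `BeamRecordSqlType.create()` factory from the earlier example (the field names are illustrative, `Types` refers to `java.sql.Types`, and it is assumed, as in that example, that the factory takes parallel lists of names and type constants):

```
//each java.sql.Types constant pairs with the Java class from the table;
//a value stored in that BeamRecord column must use exactly that class
List<String> fieldNames = Arrays.asList("f_int", "f_string", "f_timestamp");
List<Integer> fieldTypes = Arrays.asList(
    Types.INTEGER, Types.VARCHAR, Types.TIMESTAMP);
BeamRecordSqlType rowType = BeamRecordSqlType.create(fieldNames, fieldTypes);
```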
-## <a name="built-in-functions"></a>4.4. built-in SQL functions
+## <a name="built-in-functions"></a>3.4. Built-in SQL functions
 Beam SQL implements many of the built-in functions defined in [Apache Calcite](http://calcite.apache.org). The available functions are listed below:
@@ -375,3 +343,39 @@
 | MAX(value) | Returns the maximum value of value across all input values |
 | MIN(value) | Returns the minimum value of value across all input values |
 {:.table}
+
+# <a name="internal-of-sql"></a>4. The Internals of Beam SQL
+Figure 1 describes the back-end steps from a SQL statement to a Beam `PTransform`.
+
+![Workflow of Beam SQL DSL]({{ "/images/beam_sql_dsl_workflow.png" | prepend: site.baseurl }} "workflow of Beam SQL DSL")
+
+**Figure 1** workflow of Beam SQL DSL
+
+Given a `PCollection` and a query as input, the input `PCollection` is first registered as a table in the schema repository. Then it is processed as follows:
+
+1. The SQL query is parsed according to the grammar to generate a SQL abstract syntax tree;
+2. The tree is validated against the table schema, producing a logical plan expressed in relational algebra;
+3. Relational rules convert the logical plan into a physical plan expressed with Beam components; an optional optimizer may update the plan;
+4. Finally, the Beam physical plan is compiled into a composite `PTransform`.
+
+Here is an example of a query that filters and projects from an input `PCollection`:
+
+```
+SELECT USER_ID, USER_NAME FROM PCOLLECTION WHERE USER_ID = 1
+```
+
+The logical plan is shown as:
+
+```
+LogicalProject(USER_ID=[$0], USER_NAME=[$1])
+  LogicalFilter(condition=[=($0, 1)])
+  LogicalTableScan(table=[[PCOLLECTION]])
+```
+
+And it is compiled into a composite `PTransform`:
+
+```
+pCollection.apply(BeamSqlFilter...)
+  .apply(BeamSqlProject...)
+```

--
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <commits@beam.apache.org>.