Clean up open-source documentation for Samza SQL

Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/072cff60
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/072cff60
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/072cff60

Branch: refs/heads/master
Commit: 072cff600b41aaac564aefebdea2b541c9f59277
Parents: 65f58bf
Author: Jagadish <jvenkatra...@linkedin.com>
Authored: Mon Nov 26 14:18:01 2018 -0800
Committer: Jagadish <jvenkatra...@linkedin.com>
Committed: Mon Nov 26 14:18:01 2018 -0800

----------------------------------------------------------------------
 .../documentation/versioned/api/samza-sql.md    | 103 +++++--------------
 1 file changed, 23 insertions(+), 80 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/072cff60/docs/learn/documentation/versioned/api/samza-sql.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/api/samza-sql.md 
b/docs/learn/documentation/versioned/api/samza-sql.md
index 7412f6c..0b488b3 100644
--- a/docs/learn/documentation/versioned/api/samza-sql.md
+++ b/docs/learn/documentation/versioned/api/samza-sql.md
@@ -20,127 +20,66 @@ title: Samza SQL
 -->
 
 
-# Overview
-Samza SQL allows users to write Stream processing application by just writing 
a SQL query. SQL query is translated to a Samza job using the high level API, 
which can then be run on all the environments (e.g. Yarn, standalone, etc..) 
that Samza currently supports.
+### Overview
+Samza SQL allows you to define your stream processing logic declaratively as a SQL query.
+This allows you to create streaming pipelines without writing Java code or configuration, unless you require user-defined functions (UDFs). Samza SQL internally uses Apache Calcite, which provides SQL parsing and query planning. The query is automatically translated to Samza's high-level API and runs on Samza's execution engine.
 
-Samza SQL was created with following principles:
+You can run Samza SQL locally on your machine or on a YARN cluster.
 
-1. Users of Samza SQL should be able to create stream processing apps without 
needing to write Java code unless they require UDFs.
-2. No configs needed by users to develop and execute a Samza SQL job.
-3. Samza SQL should support all the runtimes and systems that Samza supports.  
 
+### Running Samza SQL on your local machine
+The [Samza SQL console](https://samza.apache.org/learn/tutorials/0.14/samza-tools.html) allows you to experiment with Samza SQL locally on your machine.
 
-Samza SQL uses Apache Calcite to provide the SQL interface. Apache Calcite is 
a popular SQL engine used by wide variety of big data processing systems (Beam, 
Flink, Storm etc..)
+#### Set up Kafka
+Follow the instructions in the [Kafka quickstart](http://kafka.apache.org/quickstart) to start ZooKeeper and the Kafka server.
 
-# How to use Samza SQL
-There are couple of ways to use Samza SQL:
-
-* Run Samza SQL on your local machine.
-* Run Samza SQL on YARN.
-
-# Running Samza SQL on your local machine
-Samza SQL console tool documented 
[here](https://samza.apache.org/learn/tutorials/0.14/samza-tools.html) uses 
Samza standalone to run Samza SQL on your local machine. This is the quickest 
way to play with Samza SQL. Please follow the instructions 
[here](https://samza.apache.org/learn/tutorials/0.14/samza-tools.html) to get 
access to the Samza tools on your machine.
-
-## Start the Kafka server
-Please follow the instructions from the [Kafka 
quickstart](http://kafka.apache.org/quickstart) to start the zookeeper and 
Kafka server.
-
-## Create ProfileChangeStream Kafka topic
-The below sql statements require a topic named ProfileChangeStream to be 
created on the Kafka broker. You can follow the instructions in the Kafka quick 
start guide to create a topic named “ProfileChangeStream”.
+Let us create a Kafka topic named `ProfileChangeStream` for this demo.
 
 ```bash
 >./deploy/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 
 >--replication-factor 1 --partitions 1 --topic ProfileChangeStream
 ```
 
-## Generate events into ProfileChangeStream topic
-Use generate-kafka-events from Samza tools to generate events into the 
ProfileChangeStream
+Download the Samza tools package from [here](https://samza.apache.org/learn/tutorials/0.14/samza-tools.html) and use the `generate-kafka-events` script to populate the stream with sample data.
 
 ```bash
 > cd samza-tools-<version>
 > ./scripts/generate-kafka-events.sh -t ProfileChangeStream -e ProfileChange
 ```
 
-## Using Samza SQL Console to run Samza sql on your local machine
+#### Using the Samza SQL Console
 
-Below are some of the sql queries that you can execute using the 
samza-sql-console tool from Samza tools package.
 
-This command just prints out all the events in the Kafka topic 
ProfileChangeStream into console output as a json serialized payload.
+The simplest SQL query is to read all events from a Kafka topic 
`ProfileChangeStream` and print them to the console.
 
 ```bash
 > ./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select * 
 > from kafka.ProfileChangeStream"
 ```
 
-This command prints out the fields that are selected into the console output 
as a json serialized payload.
+Next, let us project a few fields from the input stream.
 
 ```bash
 > ./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select 
 > Name, OldCompany, NewCompany from kafka.ProfileChangeStream"
 ```
 
-
-This command showcases the RegexMatch udf and filtering capabilities.
+You can also filter messages in the input stream based on a predicate. In this example, we select the profiles of people who have moved to LinkedIn, and use the UDF `RegexMatch(regex, company)` to check whether their previous employer matches the regex `.*soft`.
 
 ```bash
 > ./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select 
 > Name as __key__, Name, NewCompany, RegexMatch('.*soft', OldCompany) from 
 > kafka.ProfileChangeStream where NewCompany = 'LinkedIn'"
 ```
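Samza's actual UDF API is not shown in this document; as a rough, hypothetical sketch, the kind of check `RegexMatch` performs can be pictured in plain Java with `java.util.regex` (the helper below is illustrative only, assuming full-string matching semantics):

```java
import java.util.regex.Pattern;

public class RegexMatchExample {
    // Hypothetical helper (not Samza's UDF class) mirroring the kind of
    // check a RegexMatch-style UDF performs: does the company name match
    // the given regular expression?
    static boolean regexMatch(String regex, String company) {
        // Pattern.matches requires the entire input to match the regex.
        return Pattern.matches(regex, company);
    }

    public static void main(String[] args) {
        System.out.println(regexMatch(".*soft", "Microsoft")); // true
        System.out.println(regexMatch(".*soft", "LinkedIn"));  // false
    }
}
```

Whether the shipped `RegexMatch` UDF matches the full string or a substring depends on its implementation in the Samza tools package; verify against the shipped UDF before relying on these semantics.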
 
-Note: Samza sql console right now doesn’t support queries that need state, 
for e.g. Stream-Table join, GroupBy and Stream-Stream joins.
-
-
-
 
-# Running Samza SQL on YARN
-The [hello-samza](https://github.com/apache/samza-hello-samza) project is an 
example project designed to help you run your first Samza application. It has 
examples of applications using the Low Level Task API, High Level Streams API 
as well as Samza SQL.
+### Running Samza SQL on YARN
+The [hello-samza](https://github.com/apache/samza-hello-samza) project has examples to help you get started with Samza on YARN. You can define your SQL query in a [configuration file](https://github.com/apache/samza-hello-samza/blob/master/src/main/config/pageview-filter-sql.properties) and submit it to a YARN cluster.
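As a minimal sketch, such a configuration file might look like the following. The key names here (in particular `samza.sql.stmt`) are assumptions for illustration; refer to the linked `pageview-filter-sql.properties` for the actual configuration, while the SQL statement itself is the one used by the hello-samza sample:

```properties
# Illustrative sketch only -- verify key names against the hello-samza sample config.
app.name=page-view-filter-sql
job.name=page-view-filter-sql
# The SQL statement the job should execute (key name assumed for illustration):
samza.sql.stmt=insert into kafka.NewLinkedInEmployees select Name from ProfileChangeStream where NewCompany = 'LinkedIn'
```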
 
-This tutorial demonstrates a simple Samza application that uses SQL to perform 
stream processing.
-
-## Get the hello-samza code and start the grid
-Please follow the instructions from hello-samza-high-level-yarn on how to 
build the hello-samza repository and start the yarn grid.
-
-## Create the topic and generate Kafka events
-Please follow the steps in the section “Create ProfileChangeStream Kafka 
topic” and “Generate events into ProfileChangeStream topic” above.
-
-Build a Samza Application package
-Before you can run a Samza application, you need to build a package for it. 
Please follow the instructions from hello-samza-high-level-yarn on how to build 
the hello-samza application package.
-
-## Run a Samza Application
-After you’ve built your Samza package, you can start the app on the grid 
using the run-app.sh script.
 
 ```bash
 > ./deploy/samza/bin/run-app.sh 
 > --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
 > --config-path=file://$PWD/deploy/samza/config/page-view-filter-sql.properties
 ```
 
-The app executes the following SQL command :
-
-
-```sql
-insert into kafka.NewLinkedInEmployees select Name from ProfileChangeStream 
where NewCompany = 'LinkedIn'
-```
-
-
-This SQL performs the following:
-
-* Consumes the Kafka topic ProfileChangeStream which contains the avro 
serialized ProfileChangeEvent(s)
-* Deserializes the events and filters out only the profile change events where 
NewCompany = ‘LinkedIn’ i.e. Members who have moved to LinkedIn.
-* Writes the Avro serialized event that contains the Id and Name of those 
profiles to Kafka topic NewLinkedInEmployees.
-
-Give the job a minute to startup, and then tail the Kafka topic:
 
-```bash
-> ./deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 
--topic NewLinkedInEmployees
-```
-
-Congratulations! You’ve now setup a local grid that includes YARN, Kafka, 
and ZooKeeper, and run a Samza SQL application on it.
-## Shutdown and cleanup
-To shutdown the app, use the same run-app.sh script with an extra 
–operation=kill argument
-
-```bash
-> ./deploy/samza/bin/run-app.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/page-view-filter-sql.properties 
--operation=kill
-```
-
-Please follow the instructions from Hello Samza High Level API - YARN 
Deployment on how to shutdown and cleanup the app.
-
-
-# SQL Grammar
-Following BNF grammar is based on Apache Calicite’s SQL parser. It is the 
subset of the capabilities that Calcite supports.  
+### SQL Grammar
 
+Samza SQL's grammar is a subset of the capabilities supported by Apache Calcite's SQL parser.
 
 ```
 statement:
@@ -188,3 +127,7 @@ values:
   VALUES expression [, expression ]*
 
 ```
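As a concrete instance of this grammar, the following statement, drawn from the hello-samza `page-view-filter-sql` sample, combines `insert into`, a projection, and a `where` filter:

```sql
insert into kafka.NewLinkedInEmployees
    select Name from ProfileChangeStream
    where NewCompany = 'LinkedIn'
```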
+### Known Limitations
+
+Samza SQL currently supports only simple stateless queries, including selections and projections. We are actively working on supporting stateful operations such as aggregations, windows, and joins.
+
