This is an automated email from the ASF dual-hosted git repository.
zjffdu pushed a commit to branch branch-0.9
in repository https://gitbox.apache.org/repos/asf/zeppelin.git
The following commit(s) were added to refs/heads/branch-0.9 by this push:
new 3c33e23 [ZEPPELIN-5220]. Update Flink tutorial note
3c33e23 is described below
commit 3c33e235566924e348d7f186bcde631f87aa280f
Author: Jeff Zhang <[email protected]>
AuthorDate: Mon Jan 25 14:32:51 2021 +0800
[ZEPPELIN-5220]. Update Flink tutorial note
### What is this PR for?
Some updates to the Flink tutorial notes.
### What type of PR is it?
[Improvement]
### Todos
* [ ] - Task
### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-5220
### How should this be tested?
* No test
### Screenshots (if appropriate)
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
Author: Jeff Zhang <[email protected]>
Closes #4033 from zjffdu/ZEPPELIN-5220 and squashes the following commits:
ff3bed1c3 [Jeff Zhang] [ZEPPELIN-5220]. Update Flink tutorial note
(cherry picked from commit f5eb4cec32e54ba40da492483fc5d9afbd376405)
Signed-off-by: Jeff Zhang <[email protected]>
---
.../Flink Tutorial/1. Flink Basics_2F2YS7PCE.zpln | 53 ++------
...ial Steps for Building Flink Job_2F7SKEHPA.zpln | 77 +++--------
.../3. Flink Job Control Tutorial_2F5RKHCDV.zpln | 58 ++------
.../Flink Tutorial/4. Streaming ETL_2EYD56B9B.zpln | 44 ++----
.../5. Streaming Data Analytics_2EYT7Q6R8.zpln | 151 ++++++++++++++-------
.../Flink Tutorial/6. Batch ETL_2EW19CSPA.zpln | 139 ++++---------------
.../7. Batch Data Analytics_2EZ9G3JJU.zpln | 92 ++-----------
.../8. Logistic Regression (Alink)_2F4HJNWVN.zpln | 71 ++--------
8 files changed, 205 insertions(+), 480 deletions(-)
diff --git a/notebook/Flink Tutorial/1. Flink Basics_2F2YS7PCE.zpln
b/notebook/Flink Tutorial/1. Flink Basics_2F2YS7PCE.zpln
index dec2c63..908d37e 100644
--- a/notebook/Flink Tutorial/1. Flink Basics_2F2YS7PCE.zpln
+++ b/notebook/Flink Tutorial/1. Flink Basics_2F2YS7PCE.zpln
@@ -2,9 +2,10 @@
"paragraphs": [
{
"title": "Introduction",
- "text": "%md\n\n# Introduction\n\n[Apache
Flink](https://flink.apache.org/) is a framework and distributed processing
engine for stateful computations over unbounded and bounded data streams. This
is Flink tutorial for running classical wordcount in both batch and streaming
mode. \n\nThere\u0027re 3 things you need to do before using flink in
Zeppelin.\n\n* Download [Flink 1.10](https://flink.apache.org/downloads.html)
for scala 2.11 (Only scala-2.11 is supported, scala-2.12 is not [...]
+ "text": "%md\n\n# Introduction\n\n[Apache
Flink](https://flink.apache.org/) is a framework and distributed processing
engine for stateful computations over unbounded and bounded data streams. This
is Flink tutorial for running classical wordcount in both batch and streaming
mode. \n\nThere\u0027re 3 things you need to do before using flink in
Zeppelin.\n\n* Download [Flink 1.10](https://flink.apache.org/downloads.html)
(Only 1.10 afterwards are supported) for scala 2.11 (Only scal [...]
"user": "anonymous",
- "dateUpdated": "2020-07-14 22:20:39.929",
+ "dateUpdated": "2021-01-25 14:14:41.500",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -25,23 +26,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003e\u003ca
href\u003d\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e is a
framework and distributed processing engine for stateful computations over
unbounded and bounded data streams. This is Flink tutorial for running
classical wordcount in both batch and streaming
mode.\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;re 3 things you need to do
before using [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1580997898536_-1239502599",
"id": "paragraph_1580997898536_-1239502599",
"dateCreated": "2020-02-06 22:04:58.536",
- "dateStarted": "2020-07-14 22:20:39.929",
- "dateFinished": "2020-07-14 22:20:39.954",
+ "dateStarted": "2021-01-25 14:14:41.500",
+ "dateFinished": "2021-01-25 14:14:41.515",
"status": "FINISHED"
},
{
@@ -49,6 +41,7 @@
"text": "%flink\n\nval data \u003d benv.fromElements(\"hello world\",
\"hello flink\", \"hello hadoop\")\ndata.flatMap(line \u003d\u003e
line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n
.groupBy(0)\n .sum(1)\n .print()\n",
"user": "anonymous",
"dateUpdated": "2020-04-29 10:57:40.471",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -67,15 +60,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "\u001b[1m\u001b[34mdata\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d
org.apache.flink.api.scala.DataSet@177908f2\n(flink,1)\n(hadoop,1)\n(hello,3)\n(world,1)\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -91,6 +75,7 @@
"text": "%flink\n\nval data \u003d senv.fromElements(\"hello world\",
\"hello flink\", \"hello hadoop\")\ndata.flatMap(line \u003d\u003e
line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .keyBy(0)\n .sum(1)\n
.print\n\nsenv.execute()",
"user": "anonymous",
"dateUpdated": "2020-04-29 10:58:47.117",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -109,15 +94,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "\u001b[1m\u001b[34mdata\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[String]\u001b[0m
\u003d
org.apache.flink.streaming.api.scala.DataStream@614839e8\n\u001b[1m\u001b[34mres2\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.streaming.api.datastream.DataStreamSink[(String,
Int)]\u001b[0m \u003d
org.apache.flink.streaming.api.datastream.DataStreamSink@1ead6506\n(hello,1)\n(world,1)\n(hello,2)\n(flink,1)\n(hello,3)\n(hadoop,1)\n\u001b[1m\u00
[...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -133,6 +109,7 @@
"text": "%flink\n\nval data \u003d
benv.readTextFile(\"hdfs:///tmp/bank.csv\")\ndata.print()\n",
"user": "anonymous",
"dateUpdated": "2020-02-25 11:11:54.292",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -151,19 +128,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "\u001b[1m\u001b[34mdata\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d
org.apache.flink.api.scala.DataSet@4721b6ff\n\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\";\"housing\";\"loan\";\"contact\";\"day\";\"month\";\"duration\";\"campaign\";\"pdays\";\"previous\";\"poutcome\";\"y\"\n30;\"unemployed\";\"married\";\"primary\";\"no\";1787;\"no\";\"no\";\"cellular\";19;\"oct\";79;1;-1;0;\"unknown\";\"no\"\n33;\"s
[...]
- },
- {
- "type": "HTML",
- "data": "\u003cdiv class\u003d\"result-alert alert-warning\"
role\u003d\"alert\"\u003e\u003cbutton type\u003d\"button\" class\u003d\"close\"
data-dismiss\u003d\"alert\" aria-label\u003d\"Close\"\u003e\u003cspan
aria-hidden\u003d\"true\"\u003e\u0026times;\u003c/span\u003e\u003c/button\u003e\u003cstrong\u003eOutput
is truncated\u003c/strong\u003e to 102400 bytes. Learn more about
\u003cstrong\u003eZEPPELIN_INTERPRETER_OUTPUT_LIMIT\u003c/strong\u003e\u003c/div\u003e"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -178,6 +142,7 @@
"text": "%flink\n",
"user": "anonymous",
"dateUpdated": "2020-02-25 11:10:14.096",
+ "progress": 0,
"config": {},
"settings": {
"params": {},
diff --git a/notebook/Flink Tutorial/2. Three Essential Steps for Building
Flink Job_2F7SKEHPA.zpln b/notebook/Flink Tutorial/2. Three Essential Steps for
Building Flink Job_2F7SKEHPA.zpln
index aec2347..903f696 100644
--- a/notebook/Flink Tutorial/2. Three Essential Steps for Building Flink
Job_2F7SKEHPA.zpln
+++ b/notebook/Flink Tutorial/2. Three Essential Steps for Building Flink
Job_2F7SKEHPA.zpln
@@ -4,7 +4,8 @@
"title": "Introduction",
"text": "%md\n\n# Introduction\n\n\nTypically there\u0027re 3 essential
steps for building one flink job. And each step has its favorite tools.\n\n*
Define source/sink (SQL DDL)\n* Define data flow (Table Api / SQL)\n* Implement
business logic (UDF)\n\nThis tutorial demonstrates how to build one typical
flinkjob via these 3 steps and their favorite tools.\nIn this demo, we will do
real time analysis of cdn access data. First we read cdn access log from kafka
queue and do some proc [...]
"user": "anonymous",
- "dateUpdated": "2020-07-14 22:22:38.673",
+ "dateUpdated": "2021-01-25 14:25:27.359",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -25,23 +26,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eTypically
there\u0026rsquo;re 3 essential steps for building one flink job. And each
step has its favorite
tools.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDefine source/sink (SQL
DDL)\u003c/li\u003e\n\u003cli\u003eDefine data flow (Table Api /
SQL)\u003c/li\u003e\n\u003cli\u003eImplement business logic
(UDF)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis tuto [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1587965294481_785664297",
"id": "paragraph_1587965294481_785664297",
"dateCreated": "2020-04-27 13:28:14.481",
- "dateStarted": "2020-07-14 22:22:38.673",
- "dateFinished": "2020-07-14 22:22:38.688",
+ "dateStarted": "2021-01-25 14:25:27.360",
+ "dateFinished": "2021-01-25 14:25:30.018",
"status": "FINISHED"
},
{
@@ -49,6 +41,7 @@
"text": "%flink.conf\n\n# This example use kafka as source and mysql as
sink, so you need to specify flink kafka connector and flink jdbc connector
first.\n\nflink.execution.packages\torg.apache.flink:flink-connector-kafka_2.11:1.10.0,org.apache.flink:flink-connector-kafka-base_2.11:1.10.0,org.apache.flink:flink-json:1.10.0,org.apache.flink:flink-jdbc_2.11:1.10.0,mysql:mysql-connector-java:5.1.38\n\n#
Set taskmanager.memory.segment-size to be the smallest value just for this
demo, [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:07:01.902",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -67,10 +60,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -86,6 +75,7 @@
"text": "%flink.ssql\n\nDROP table if exists cdn_access_log;\n\nCREATE
TABLE cdn_access_log (\n uuid VARCHAR,\n client_ip VARCHAR,\n request_time
BIGINT,\n response_size BIGINT,\n uri VARCHAR,\n event_ts BIGINT\n) WITH (\n
\u0027connector.type\u0027 \u003d \u0027kafka\u0027,\n
\u0027connector.version\u0027 \u003d \u0027universal\u0027,\n
\u0027connector.topic\u0027 \u003d \u0027cdn_events\u0027,\n
\u0027connector.properties.zookeeper.connect\u0027 \u003d
\u0027localhost:2181\u0027, [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:07:03.989",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -104,15 +94,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -128,6 +109,7 @@
"text": "%flink.ssql\n\nDROP table if exists cdn_access_statistic;\n\n--
Please create this mysql table first in your mysql instance. Flink won\u0027t
create mysql table for you.\n\nCREATE TABLE cdn_access_statistic (\n province
VARCHAR,\n access_count BIGINT,\n total_download BIGINT,\n download_speed
DOUBLE\n) WITH (\n \u0027connector.type\u0027 \u003d \u0027jdbc\u0027,\n
\u0027connector.url\u0027 \u003d
\u0027jdbc:mysql://localhost:3306/flink_cdn\u0027,\n \u0027connector.table\u0
[...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:07:05.522",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -146,15 +128,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -170,6 +143,7 @@
"text": "%flink.ipyflink\n\nimport re\nimport json\nfrom pyflink.table
import DataTypes\nfrom pyflink.table.udf import udf\nfrom urllib.parse import
quote_plus\nfrom urllib.request import urlopen\n\n# This UDF is to convert ip
address to country. Here we simply the logic just for demo
purpose.\n\n@udf(input_types\u003d[DataTypes.STRING()],
result_type\u003dDataTypes.STRING())\ndef ip_to_country(ip):\n\n countries
\u003d [\u0027USA\u0027, \u0027China\u0027, \u0027Japan\u0027, \u0 [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:07:24.842",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -188,10 +162,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -207,6 +177,7 @@
"text": "%flink.bsql\n\nselect ip_to_country(\u00272.10.01.1\u0027)",
"user": "anonymous",
"dateUpdated": "2020-04-29 14:17:35.942",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -253,15 +224,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data": "EXPR$0\nChina\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -277,6 +239,7 @@
"text": "%flink.ipyflink\n\nt \u003d
st_env.from_path(\"cdn_access_log\")\\\n .select(\"uuid, \"\n
\"ip_to_country(client_ip) as country, \" \n \"response_size,
request_time\")\\\n .group_by(\"country\")\\\n .select(\n \"country,
count(uuid) as access_count, \" \n \"sum(response_size) as
total_download, \" \n \"sum(response_size) * 1.0 / sum(request_time)
as download_speed\")\n #.insert_into(\"cdn_access_statistic\")\n\n# z.show
[...]
"user": "anonymous",
"dateUpdated": "2020-04-29 14:19:00.530",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -367,6 +330,7 @@
"text": "%flink\n\nval t \u003d stenv.from(\"cdn_access_log\")\n
.select(\"uuid, ip_to_country(client_ip) as country, response_size,
request_time\")\n .groupBy(\"country\")\n .select( \"country, count(uuid)
as access_count, sum(response_size) as total_download, sum(response_size) *
1.0 / sum(request_time) as download_speed\")\n
//.insertInto(\"cdn_access_statistic\")\n\n// z.show will display the result in
zeppelin front end in table format, you can uncomment the above ins [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:14:29.646",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -456,6 +420,7 @@
"text": "%flink.ssql\n\ninsert into cdn_access_statistic\nselect
ip_to_country(client_ip) as country, \n count(uuid) as access_count, \n
sum(response_size) as total_download ,\n sum(response_size) * 1.0 /
sum(request_time) as download_speed\nfrom cdn_access_log\n group by
ip_to_country(client_ip)\n",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:12:26.431",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -489,7 +454,8 @@
{
"text": "%md\n\n# Query sink table via jdbc interpreter\n\nYou can also
query the sink table (mysql) directly via jdbc interpreter. Here I will create
a jdbc interpreter named `mysql` and use it to query the sink table. Regarding
how to connect mysql in Zeppelin, refer this
[link](http://zeppelin.apache.org/docs/0.9.0-preview1/interpreter/jdbc.html#mysql)",
"user": "anonymous",
- "dateUpdated": "2020-07-14 22:23:28.657",
+ "dateUpdated": "2021-01-25 14:25:41.163",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -509,23 +475,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eQuery sink table via jdbc
interpreter\u003c/h1\u003e\n\u003cp\u003eYou can also query the sink table
(mysql) directly via jdbc interpreter. Here I will create a jdbc interpreter
named \u003ccode\u003emysql\u003c/code\u003e and use it to query the sink
table. Regarding how to connect mysql in Zeppelin, refer this \u003ca
href\u003d\"http://zeppelin.apache.org/docs/0.9.0-preview1/interpreter/jdbc.html#mysql\"\
[...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1587976725546_2073084823",
"id": "paragraph_1587976725546_2073084823",
"dateCreated": "2020-04-27 16:38:45.548",
- "dateStarted": "2020-07-14 22:23:28.657",
- "dateFinished": "2020-07-14 22:23:28.670",
+ "dateStarted": "2021-01-25 14:25:41.162",
+ "dateFinished": "2021-01-25 14:25:41.175",
"status": "FINISHED"
},
{
@@ -533,6 +490,7 @@
"text": "%mysql\n\nselect * from flink_cdn.cdn_access_statistic",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:22:13.333",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -630,6 +588,7 @@
"text": "%flink.ipyflink\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 16:38:45.464",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
diff --git a/notebook/Flink Tutorial/3. Flink Job Control
Tutorial_2F5RKHCDV.zpln b/notebook/Flink Tutorial/3. Flink Job Control
Tutorial_2F5RKHCDV.zpln
index 9a70026..e533972 100644
--- a/notebook/Flink Tutorial/3. Flink Job Control Tutorial_2F5RKHCDV.zpln
+++ b/notebook/Flink Tutorial/3. Flink Job Control Tutorial_2F5RKHCDV.zpln
@@ -4,7 +4,8 @@
"title": "Introduction",
"text": "%md\n\n# Introduction\n\nThis tutorial is to demonstrate how to
do job control in flink (job submission/cancel/resume).\n2 steps:\n1. Create
custom data stream and register it as flink table. The custom data stream is a
simulated web access logs. \n2. Query this flink table (pv for each page type),
you can cancel it and then resume it again w/o savepoint.\n",
"user": "anonymous",
- "dateUpdated": "2020-07-14 22:24:11.629",
+ "dateUpdated": "2021-01-25 14:25:48.903",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -25,23 +26,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
tutorial is to demonstrate how to do job control in flink (job
submission/cancel/resume).\u003cbr /\u003e\n2
steps:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eCreate custom data stream
and register it as flink table. The custom data stream is a simulated web
access logs.\u003c/li\u003e\n\u003cli\u003eQuery this flink table (pv for each
page type), you can canc [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1587964310955_443124874",
"id": "paragraph_1587964310955_443124874",
"dateCreated": "2020-04-27 13:11:50.955",
- "dateStarted": "2020-07-14 22:24:11.630",
- "dateFinished": "2020-07-14 22:24:11.642",
+ "dateStarted": "2021-01-25 14:25:48.903",
+ "dateFinished": "2021-01-25 14:25:48.916",
"status": "FINISHED"
},
{
@@ -49,6 +41,7 @@
"text": "%flink \n\nimport
org.apache.flink.streaming.api.functions.source.SourceFunction\nimport
org.apache.flink.table.api.TableEnvironment\nimport
org.apache.flink.streaming.api.TimeCharacteristic\nimport
org.apache.flink.streaming.api.checkpoint.ListCheckpointed\nimport
java.util.Collections\nimport
scala.collection.JavaConversions._\n\nsenv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)\nsenv.enableCheckpointing(1000)\n\nval
data \u003d senv.addSource(new SourceFunc [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 14:21:46.802",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -67,15 +60,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "import
org.apache.flink.streaming.api.functions.source.SourceFunction\nimport
org.apache.flink.table.api.TableEnvironment\nimport
org.apache.flink.streaming.api.TimeCharacteristic\nimport
org.apache.flink.streaming.api.checkpoint.ListCheckpointed\nimport
java.util.Collections\nimport
scala.collection.JavaConversions._\n\u001b[1m\u001b[34mres1\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.StreamExecutionEnvironment\u001b[0m
\u003d org.apache.flink. [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -91,6 +75,7 @@
"text": "%flink.ssql(type\u003dupdate)\n\nselect url, count(1) as c from
log group by url",
"user": "anonymous",
"dateUpdated": "2020-04-29 14:22:40.594",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -176,6 +161,7 @@
"text":
"%flink.ssql(type\u003dupdate,parallelism\u003d2,maxParallelism\u003d10,savepointDir\u003d/tmp/flink_a)\n\nselect
url, count(1) as pv from log group by url",
"user": "anonymous",
"dateUpdated": "2020-04-29 14:24:26.768",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -229,15 +215,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data": "url\tpv\nhome\t500\nproduct\t1500\nsearch\t1000\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -253,6 +230,7 @@
"text":
"%flink(parallelism\u003d1,maxParallelism\u003d10,savepointDir\u003d/tmp/flink_b)\n\nval
table \u003d stenv.sqlQuery(\"select url, count(1) as pv from log group by
url\")\n\nz.show(table, streamType\u003d\"update\")\n",
"user": "anonymous",
"dateUpdated": "2020-04-15 16:11:54.114",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -304,15 +282,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data": "url\tpv\nhome\t34\nproduct\t100\nsearch\t68\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -328,6 +297,7 @@
"text":
"%flink.ipyflink(parallelism\u003d1,maxParallelism\u003d10,savepointDir\u003d/tmp/flink_c)\n\ntable
\u003d st_env.sql_query(\"select url, count(1) as pv from log group by
url\")\n\nz.show(table, stream_type\u003d\"update\")",
"user": "anonymous",
"dateUpdated": "2020-04-27 14:45:28.045",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -378,15 +348,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data": "url\tpv\nhome\t17\nproduct\t49\nsearch\t34\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -401,6 +362,7 @@
"text": "%flink.ssql\n",
"user": "anonymous",
"dateUpdated": "2020-04-18 19:04:17.969",
+ "progress": 0,
"config": {},
"settings": {
"params": {},
diff --git a/notebook/Flink Tutorial/4. Streaming ETL_2EYD56B9B.zpln
b/notebook/Flink Tutorial/4. Streaming ETL_2EYD56B9B.zpln
index 11501b4..9e8b92f 100644
--- a/notebook/Flink Tutorial/4. Streaming ETL_2EYD56B9B.zpln
+++ b/notebook/Flink Tutorial/4. Streaming ETL_2EYD56B9B.zpln
@@ -4,7 +4,8 @@
"title": "Overview",
"text": "%md\n\nThis tutorial demonstrate how to use Flink do streaming
processing via its streaming sql + udf. In this tutorial, we read data from
kafka queue and do some simple processing (just filtering here) and then write
it back to another kafka queue. We use this
[docker](https://zeppelin-kafka-connect-datagen.readthedocs.io/en/latest/) to
create kafka cluster and source data \n\n* Make sure you add the following ip
host name mapping to your hosts file, otherwise you may not [...]
"user": "anonymous",
- "dateUpdated": "2020-04-27 13:44:26.806",
+ "dateUpdated": "2021-01-25 14:26:04.106",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -27,23 +28,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis tutorial demonstrate how
to use Flink do streaming processing via its streaming sql + udf. In this
tutorial, we read data from kafka queue and do some simple processing (just
filtering here) and then write it back to another kafka queue. We use this
\u003ca
href\u003d\"https://zeppelin-kafka-connect-datagen.readthedocs.io/en/latest/\"\u003edocker\u003c/a\u003e
to create kafka cluster and source data\u003 [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579054287919_-61477360",
"id": "paragraph_1579054287919_-61477360",
"dateCreated": "2020-01-15 10:11:27.919",
- "dateStarted": "2020-04-27 11:49:46.116",
- "dateFinished": "2020-04-27 11:49:46.128",
+ "dateStarted": "2021-01-25 14:26:04.106",
+ "dateFinished": "2021-01-25 14:26:04.118",
"status": "FINISHED"
},
{
@@ -51,6 +43,7 @@
"text": "%flink.conf\n\n# You need to run this paragraph first before
running any flink
code.\n\nflink.execution.packages\torg.apache.flink:flink-connector-kafka_2.11:1.10.0,org.apache.flink:flink-connector-kafka-base_2.11:1.10.0,org.apache.flink:flink-json:1.10.0",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:45:27.361",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -69,10 +62,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -88,6 +77,7 @@
"text": "%flink.ssql\n\nDROP TABLE IF EXISTS source_kafka;\n\nCREATE
TABLE source_kafka (\n status STRING,\n direction STRING,\n event_ts
BIGINT\n) WITH (\n \u0027connector.type\u0027 \u003d \u0027kafka\u0027,
\n \u0027connector.version\u0027 \u003d \u0027universal\u0027,\n
\u0027connector.topic\u0027 \u003d \u0027generated.events\u0027,\n
\u0027connector.startup-mode\u0027 \u003d \u0027earliest-offset\u0027,\n
\u0027connector.properties.zookeeper.connect\u0027 [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:45:29.234",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -108,15 +98,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -132,6 +113,7 @@
"text": "%flink.ssql\n\nDROP TABLE IF EXISTS sink_kafka;\n\nCREATE TABLE
sink_kafka (\n status STRING,\n direction STRING,\n event_ts
TIMESTAMP(3),\n WATERMARK FOR event_ts AS event_ts - INTERVAL \u00275\u0027
SECOND\n) WITH (\n \u0027connector.type\u0027 \u003d \u0027kafka\u0027,
\n \u0027connector.version\u0027 \u003d \u0027universal\u0027, \n
\u0027connector.topic\u0027 \u003d \u0027generated.events2\u0027,\n
\u0027connector.properties.zookeeper.connect [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 15:45:30.663",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -152,15 +134,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -176,6 +149,7 @@
"text": "%flink.ssql\n\ninsert into sink_kafka select status, direction,
cast(event_ts/1000000000 as timestamp(3)) from source_kafka where status
\u003c\u003e \u0027foo\u0027\n",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:45:43.388",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -212,6 +186,7 @@
"text": "%flink.ssql(type\u003dupdate)\n\nselect * from sink_kafka order
by event_ts desc limit 10;",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:28:01.122",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -325,6 +300,7 @@
"text": "%flink.ssql\n",
"user": "anonymous",
"dateUpdated": "2020-04-29 15:27:31.430",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
diff --git a/notebook/Flink Tutorial/5. Streaming Data Analytics_2EYT7Q6R8.zpln
b/notebook/Flink Tutorial/5. Streaming Data Analytics_2EYT7Q6R8.zpln
index 5fcbaf9..6d94d14 100644
--- a/notebook/Flink Tutorial/5. Streaming Data Analytics_2EYT7Q6R8.zpln
+++ b/notebook/Flink Tutorial/5. Streaming Data Analytics_2EYT7Q6R8.zpln
@@ -4,7 +4,8 @@
"title": "Overview",
"text": "%md\n\nThis tutorial demonstrate how to use Flink do streaming
analytics via its streaming sql + udf. Zeppelin now support 3 kinds of
streaming visualization.\n\n* Single - Single mode is for the case when the
result of sql statement is always one row, such as the following example\n*
Update - Update mode is suitable for the case when the output is more than one
rows, and always will be updated continuously. \n* Append - Append mode is
suitable for the scenario where ou [...]
"user": "anonymous",
- "dateUpdated": "2020-04-29 15:31:49.376",
+ "dateUpdated": "2021-01-25 14:26:13.376",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -27,30 +28,54 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis tutorial demonstrate how
to use Flink do streaming analytics via its streaming sql + udf. Zeppelin now
support 3 kinds of streaming
visualization.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eSingle - Single
mode is for the case when the result of sql statement is always one row, such
as the following example\u003c/li\u003e\n\u003cli\u003eUpdate - Update mode is
suitable for the case when the output is [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579054784565_2122156822",
"id": "paragraph_1579054784565_2122156822",
"dateCreated": "2020-01-15 10:19:44.565",
- "dateStarted": "2020-04-29 15:31:49.379",
- "dateFinished": "2020-04-29 15:31:49.549",
+ "dateStarted": "2021-01-25 14:26:13.376",
+ "dateFinished": "2021-01-25 14:26:13.394",
+ "status": "FINISHED"
+ },
+ {
+ "text": "%flink \n\nimport
org.apache.flink.streaming.api.functions.source.SourceFunction\nimport
org.apache.flink.table.api.TableEnvironment\nimport
org.apache.flink.streaming.api.TimeCharacteristic\nimport
org.apache.flink.streaming.api.checkpoint.ListCheckpointed\nimport
java.util.Collections\nimport
scala.collection.JavaConversions._\n\nsenv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)\nsenv.enableCheckpointing(1000)\n\nval
data \u003d senv.addSource(new SourceFunc [...]
+ "user": "anonymous",
+ "dateUpdated": "2021-01-25 14:27:02.042",
+ "progress": 0,
+ "config": {
+ "editorSetting": {
+ "language": "scala",
+ "editOnDblClick": false,
+ "completionKey": "TAB",
+ "completionSupport": true
+ },
+ "colWidth": 12.0,
+ "editorMode": "ace/mode/scala",
+ "fontSize": 9.0,
+ "results": {},
+ "enabled": true
+ },
+ "settings": {
+ "params": {},
+ "forms": {}
+ },
+ "apps": [],
+ "runtimeInfos": {},
+ "progressUpdateIntervalMs": 500,
+ "jobName": "paragraph_1611556011274_1848600588",
+ "id": "paragraph_1611556011274_1848600588",
+ "dateCreated": "2021-01-25 14:26:51.274",
+ "dateStarted": "2021-01-25 14:27:02.045",
+ "dateFinished": "2021-01-25 14:27:36.849",
"status": "FINISHED"
},
{
"title": "Single row mode of Output",
- "text": "%flink.ssql(type\u003dsingle, parallelism\u003d1,
refreshInterval\u003d3000, template\u003d\u003ch1\u003e{1}\u003c/h1\u003e until
\u003ch2\u003e{0}\u003c/h2\u003e)\n\nselect max(event_ts), count(1) from
sink_kafka\n",
+ "text":
"%flink.ssql(type\u003dsingle,parallelism\u003d1,refreshInterval\u003d1000,template\u003d\u003ch1\u003e{1}\u003c/h1\u003e
until \u003ch2\u003e{0}\u003c/h2\u003e)\n\nselect max(rowtime),count(1) from
log\n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 15:45:47.116",
+ "dateUpdated": "2021-01-25 14:28:01.427",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -69,27 +94,42 @@
"template": "\u003ch1\u003e{1}\u003c/h1\u003e until
\u003ch2\u003e{0}\u003c/h2\u003e",
"refreshInterval": "3000",
"parallelism": "1",
- "type": "single"
+ "type": "single",
+ "latest_checkpoint_path":
"\u003ccheckpoint-not-externally-addressable\u003e"
},
"settings": {
"params": {},
"forms": {}
},
"apps": [],
- "runtimeInfos": {},
+ "runtimeInfos": {
+ "jobUrl": {
+ "propertyName": "jobUrl",
+ "label": "FLINK JOB",
+ "tooltip": "View in Flink web UI",
+ "group": "flink",
+ "values": [
+ {
+ "jobUrl":
"http://localhost:8081#/job/1086e2a18ab8e8402b8969412c9b0e8e"
+ }
+ ],
+ "interpreterSettingId": "flink"
+ }
+ },
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578909960516_-1812187661",
"id": "paragraph_1578909960516_-1812187661",
"dateCreated": "2020-01-13 18:06:00.516",
- "dateStarted": "2020-04-29 15:45:47.127",
- "dateFinished": "2020-04-29 15:47:14.738",
+ "dateStarted": "2021-01-25 14:28:01.430",
+ "dateFinished": "2021-01-25 14:30:03.648",
"status": "ABORT"
},
{
"title": "Update mode of Output",
- "text": "%flink.ssql(type\u003dupdate, refreshInterval\u003d2000,
parallelism\u003d1)\n\nselect status, count(1) as pv from sink_kafka group by
status",
+ "text":
"%flink.ssql(type\u003dupdate,parallelism\u003d1,refreshInterval\u003d2000)\n\nselect
url,count(1) as pv from log group by url\n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 15:46:09.485",
+ "dateUpdated": "2021-01-25 14:28:48.633",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -114,7 +154,7 @@
"commonSetting": {},
"keys": [
{
- "name": "status",
+ "name": "url",
"index": 0.0,
"aggr": "sum"
}
@@ -142,27 +182,42 @@
"parallelism": "1",
"type": "update",
"savepointDir": "/tmp/flink_2",
- "editorHide": false
+ "editorHide": false,
+ "latest_checkpoint_path":
"\u003ccheckpoint-not-externally-addressable\u003e"
},
"settings": {
"params": {},
"forms": {}
},
"apps": [],
- "runtimeInfos": {},
+ "runtimeInfos": {
+ "jobUrl": {
+ "propertyName": "jobUrl",
+ "label": "FLINK JOB",
+ "tooltip": "View in Flink web UI",
+ "group": "flink",
+ "values": [
+ {
+ "jobUrl":
"http://localhost:8081#/job/026f68085b98b16a9a677c2b1ba47bb3"
+ }
+ ],
+ "interpreterSettingId": "flink"
+ }
+ },
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578910004762_-286113604",
"id": "paragraph_1578910004762_-286113604",
"dateCreated": "2020-01-13 18:06:44.762",
- "dateStarted": "2020-04-29 15:46:09.492",
- "dateFinished": "2020-04-29 15:47:13.312",
+ "dateStarted": "2021-01-25 14:28:48.637",
+ "dateFinished": "2021-01-25 14:30:01.879",
"status": "ABORT"
},
{
"title": "Append mode of Output",
- "text": "%flink.ssql(type\u003dappend, parallelism\u003d1,
refreshInterval\u003d2000, threshold\u003d60000)\n\nselect
TUMBLE_START(event_ts, INTERVAL \u00275\u0027 SECOND) as start_time, status,
count(1) from sink_kafka\ngroup by TUMBLE(event_ts, INTERVAL \u00275\u0027
SECOND), status\n",
+ "text":
"%flink.ssql(type\u003dappend,parallelism\u003d1,refreshInterval\u003d2000,threshold\u003d60000)\n\nselect
TUMBLE_START(rowtime,INTERVAL \u00275\u0027 SECOND) start_time,url,count(1) as
pv from log\ngroup by TUMBLE(rowtime,INTERVAL \u00275\u0027 SECOND),url\n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 15:46:40.174",
+ "dateUpdated": "2021-01-25 14:29:33.485",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -194,14 +249,14 @@
],
"groups": [
{
- "name": "status",
+ "name": "url",
"index": 1.0,
"aggr": "sum"
}
],
"values": [
{
- "name": "EXPR$2",
+ "name": "pv",
"index": 2.0,
"aggr": "sum"
}
@@ -221,39 +276,28 @@
"parallelism": "1",
"threshold": "60000",
"type": "append",
- "savepointDir": "/tmp/flink_3"
+ "savepointDir": "/tmp/flink_3",
+ "latest_checkpoint_path":
"\u003ccheckpoint-not-externally-addressable\u003e"
},
"settings": {
"params": {},
"forms": {}
},
- "results": {
- "code": "ERROR",
- "msg": [
- {
- "type": "TABLE",
- "data": "start_time\tstatus\tEXPR$2\n2020-04-29
07:46:20.0\tbaz\t3\n2020-04-29 07:46:20.0\tbar\t3\n2020-04-29
07:46:25.0\tbaz\t2\n2020-04-29 07:46:25.0\tbar\t4\n2020-04-29
07:46:30.0\tbar\t4\n2020-04-29 07:46:30.0\tbaz\t2\n2020-04-29
07:46:35.0\tbar\t4\n2020-04-29 07:46:40.0\tbar\t5\n2020-04-29
07:46:40.0\tbaz\t2\n2020-04-29 07:46:45.0\tbar\t4\n2020-04-29
07:46:45.0\tbaz\t4\n2020-04-29 07:46:50.0\tbar\t4\n2020-04-29
07:46:50.0\tbaz\t3\n2020-04-29 07:46:55.0\tbaz\t3\n2020-04-2 [...]
- },
- {
- "type": "TEXT",
- "data": "Fail to run sql command: select TUMBLE_START(event_ts,
INTERVAL \u00275\u0027 SECOND) as start_time, status, count(1) from
sink_kafka\ngroup by TUMBLE(event_ts, INTERVAL \u00275\u0027 SECOND),
status\njava.io.IOException: Fail to run stream sql job\n\tat
org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:166)\n\tat
org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:101)\n\tat
org.apache.zeppelin.flink.FlinkS [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578910016872_1942851900",
"id": "paragraph_1578910016872_1942851900",
"dateCreated": "2020-01-13 18:06:56.872",
- "dateStarted": "2020-04-29 15:46:18.958",
- "dateFinished": "2020-04-29 15:47:22.960",
+ "dateStarted": "2021-01-25 14:29:17.122",
+ "dateFinished": "2021-01-25 14:29:53.805",
"status": "ABORT"
},
{
"text": "%flink.ssql\n",
"user": "anonymous",
"dateUpdated": "2020-01-13 21:17:35.739",
+ "progress": 0,
"config": {},
"settings": {
"params": {},
@@ -274,7 +318,22 @@
"version": "0.9.0-SNAPSHOT",
"noteParams": {},
"noteForms": {},
- "angularObjects": {},
+ "angularObjects": {
+ "flink-shared_process": [
+ {
+ "name": "value_0",
+ "object": "2019-12-31 16:01:44.1",
+ "noteId": "2EYT7Q6R8",
+ "paragraphId": "paragraph_1578909960516_-1812187661"
+ },
+ {
+ "name": "value_1",
+ "object": "1042",
+ "noteId": "2EYT7Q6R8",
+ "paragraphId": "paragraph_1578909960516_-1812187661"
+ }
+ ]
+ },
"config": {
"isZeppelinNotebookCronEnable": false
},
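The update-mode visualization that this note drives through %flink.ssql(type=update) is also reachable from Scala via z.show, as the job-control note above already uses. A short sketch with the same pv-per-url query:

    // Update mode from Scala: z.show streams the continuously updated
    // aggregation result to the Zeppelin front end.
    val table = stenv.sqlQuery("select url, count(1) as pv from log group by url")
    z.show(table, streamType = "update")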
diff --git a/notebook/Flink Tutorial/6. Batch ETL_2EW19CSPA.zpln
b/notebook/Flink Tutorial/6. Batch ETL_2EW19CSPA.zpln
index 40c8cc2..93f11bb 100644
--- a/notebook/Flink Tutorial/6. Batch ETL_2EW19CSPA.zpln
+++ b/notebook/Flink Tutorial/6. Batch ETL_2EW19CSPA.zpln
@@ -4,7 +4,8 @@
"title": "Overview",
"text": "%md\n\nThis tutorial demonstrates how to use Flink do batch ETL
via its batch sql + udf (scala, python \u0026 hive). Here\u0027s what we do in
this tutorial\n\n* Download
[bank](https://archive.ics.uci.edu/ml/datasets/bank+marketing) data via shell
interpreter to local\n* Process the raw data via flink batch sql \u0026 scala
udf which parse and clean the raw data\n* Write the structured and cleaned data
to another flink table via sql\n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 16:08:52.383",
+ "dateUpdated": "2021-01-25 14:26:19.720",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -27,22 +28,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis tutorial demonstrates how
to use Flink do batch ETL via its batch sql + udf (scala, python \u0026amp;
hive). Here\u0026rsquo;s what we do in this
tutorial\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDownload \u003ca
href\u003d\"https://archive.ics.uci.edu/ml/datasets/bank+marketing\"\u003ebank\u003c/a\u003e
data via shell interpreter to local\u003c/li\u003e\n\u003cli\u003eProcess the
raw data via flink [...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579052523153_721650872",
"id": "paragraph_1579052523153_721650872",
"dateCreated": "2020-01-15 09:42:03.156",
- "dateStarted": "2020-04-27 11:41:03.630",
- "dateFinished": "2020-04-27 11:41:04.735",
+ "dateStarted": "2021-01-25 14:26:19.720",
+ "dateFinished": "2021-01-25 14:26:19.732",
"status": "FINISHED"
},
{
@@ -50,6 +43,7 @@
"text": "%sh\n\ncd /tmp\nrm -rf bank*\nwget
https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip\nunzip
bank.zip\n# upload data to hdfs if you want to run it in yarn mode\n# hadoop fs
-put /tmp/bank.csv /tmp/bank.csv\n",
"user": "anonymous",
"dateUpdated": "2020-05-05 22:52:51.576",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -70,16 +64,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "--2020-05-05 22:52:51--
https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip\nResolving
archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252\nConnecting to
archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443...
connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 579043
(565K) [application/x-httpd-php]\nSaving to: ‘bank.zip’\n\n 0K ..........
.......... .......... .......... .......... 8% 105K 5s\n 50K ... [...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578045094400_1030344935",
"id": "paragraph_1578045094400_1030344935",
@@ -93,6 +79,7 @@
"text": "%sh\n\nhead -n 10 /tmp/bank.csv",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:37:44.952",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -113,16 +100,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data":
"\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\";\"housing\";\"loan\";\"contact\";\"day\";\"month\";\"duration\";\"campaign\";\"pdays\";\"previous\";\"poutcome\";\"y\"\n30;\"unemployed\";\"married\";\"primary\";\"no\";1787;\"no\";\"no\";\"cellular\";19;\"oct\";79;1;-1;0;\"unknown\";\"no\"\n33;\"services\";\"married\";\"secondary\";\"no\";4789;\"yes\";\"yes\";\"cellular\";11;\"may\";220;1;339;4;\"failure\";\"no\"\n35;\"management\";\"single\";\"tertia
[...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579053112778_2010129053",
"id": "paragraph_1579053112778_2010129053",
@@ -136,6 +115,7 @@
"text": "%flink.bsql\n\nDROP TABLE IF EXISTS bank_raw;\nCREATE TABLE
bank_raw (\n content STRING\n) WITH
(\n\u0027format.field-delimiter\u0027\u003d\u0027\\n\u0027,\n\u0027connector.type\u0027\u003d\u0027filesystem\u0027,\n\u0027format.derive-schema\u0027\u003d\u0027true\u0027,\n\u0027connector.path\u0027\u003d\u0027/tmp/bank.csv\u0027,\n\u0027format.type\u0027\u003d\u0027csv\u0027\n);",
"user": "anonymous",
"dateUpdated": "2020-05-05 23:01:30.843",
+ "progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
@@ -156,16 +136,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578044954921_-1188487356",
"id": "paragraph_1578044954921_-1188487356",
@@ -179,6 +151,7 @@
"text": "%flink.bsql\n\nDROP TABLE IF EXISTS bank;\nCREATE TABLE bank
(\n age int, \n job string,\n marital string,\n education string,\n
`default` string,\n balance string,\n housing string,\n loan
string,\n contact string, \n `day` string,\n `month` string,\n
duration int,\n campaign int,\n pdays int,\n previous int,\n
poutcome string,\n y string\n) WITH
(\n\u0027format.field-delimiter\u0027\u003d\u0027,\u0027,\n\u0027connector.t
[...]
"user": "anonymous",
"dateUpdated": "2020-05-05 23:01:32.335",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -199,16 +172,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Table has been dropped.\nTable has been created.\n"
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578045204379_-1427374232",
"id": "paragraph_1578045204379_-1427374232",
@@ -222,6 +187,7 @@
"text": "%flink.bsql\n\nshow tables",
"user": "anonymous",
"dateUpdated": "2020-05-05 22:52:11.086",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -268,16 +234,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data": "table\nbank\nbank_raw\n"
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1587958743728_1444404682",
"id": "paragraph_1587958743728_1444404682",
@@ -291,6 +249,7 @@
"text": "%flink\n\nimport
org.apache.flink.api.java.typeutils.RowTypeInfo\nimport
org.apache.flink.api.common.typeinfo.Types\nimport
org.apache.flink.api.java.typeutils._\nimport
org.apache.flink.api.scala.typeutils._\nimport
org.apache.flink.api.scala._\n\nclass Person(val age:Int, val job: String, val
marital: String, val education: String, val default: String, val balance:
String, val housing: String, val loan: String, val contact: String, val day:
String, val month: String, val [...]
"user": "anonymous",
"dateUpdated": "2020-05-05 23:01:45.688",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -311,16 +270,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "import
org.apache.flink.api.java.typeutils.RowTypeInfo\nimport
org.apache.flink.api.common.typeinfo.Types\nimport
org.apache.flink.api.java.typeutils._\nimport
org.apache.flink.api.scala.typeutils._\nimport
org.apache.flink.api.scala._\ndefined class Person\ndefined class
ParseFunction\n"
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578888628353_1621411444",
"id": "paragraph_1578888628353_1621411444",
@@ -334,6 +285,7 @@
"text": "%sh\n\nrm -rf /tmp/bank_cleaned\n#hadoop fs -rmr
/tmp/bank_cleaned",
"user": "anonymous",
"dateUpdated": "2020-05-05 22:52:42.774",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -354,11 +306,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579061020460_-113987164",
"id": "paragraph_1579061020460_-113987164",
@@ -372,6 +321,7 @@
"text": "%flink.bsql\n\ninsert into bank select T.* from bank_raw,
LATERAL TABLE(parse(content)) as T(age, job, marital, education, `default`,
balance, housing, loan, contact, `day`, `month`, duration, campaign, pdays,
previous, poutcome, y) ",
"user": "anonymous",
"dateUpdated": "2020-05-05 22:53:04.445",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -392,16 +342,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "Insertion successfully.\n"
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578669828368_-1923137601",
"id": "paragraph_1578669828368_-1923137601",
@@ -415,6 +357,7 @@
"text": "%flink.bsql\n\nselect * from bank limit 10\n",
"user": "anonymous",
"dateUpdated": "2020-05-05 23:01:51.464",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -479,16 +422,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\tjob\tmarital\teducation\tdefault\tbalance\thousing\tloan\tcontact\tday\tmonth\tduration\tcampaign\tpdays\tprevious\tpoutcome\ty\n30\tunemployed\tmarried\tprimary\tno\t1787\tno\tno\tcellular\t19\toct\t79\t1\t-1\t0\tunknown\tno\n33\tservices\tmarried\tsecondary\tno\t4789\tyes\tyes\tcellular\t11\tmay\t220\t1\t339\t4\tfailure\tno\n35\tmanagement\tsingle\ttertiary\tno\t1350\tyes\tno\tcellular\t16\tapr\t185\t1\t330\t1\tfailure\tno\n30\tmanagement\tmarried\ttertiary\tn
[...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1578068480238_-1678045273",
"id": "paragraph_1578068480238_-1678045273",
@@ -502,6 +437,7 @@
"text": "%flink\n\nval table \u003d btenv.sqlQuery(\"select * from bank
limit 10\")\nz.show(table)",
"user": "anonymous",
"dateUpdated": "2020-05-05 23:33:44.788",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -564,20 +500,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "\u001b[1m\u001b[34mtable\u001b[0m:
\u001b[1m\u001b[32morg.apache.flink.table.api.Table\u001b[0m \u003d
UnnamedTable$3\n"
- },
- {
- "type": "TABLE",
- "data":
"age\tjob\tmarital\teducation\tdefault\tbalance\thousing\tloan\tcontact\tday\tmonth\tduration\tcampaign\tpdays\tprevious\tpoutcome\ty\n30\tunemployed\tmarried\tprimary\tno\t1787\tno\tno\tcellular\t19\toct\t79\t1\t-1\t0\tunknown\tno\n33\tservices\tmarried\tsecondary\tno\t4789\tyes\tyes\tcellular\t11\tmay\t220\t1\t339\t4\tfailure\tno\n35\tmanagement\tsingle\ttertiary\tno\t1350\tyes\tno\tcellular\t16\tapr\t185\t1\t330\t1\tfailure\tno\n30\tmanagement\tmarried\ttertiary\tn
[...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579061037737_-1577558456",
"id": "paragraph_1579061037737_-1577558456",
@@ -591,6 +515,7 @@
"text": "%flink.pyflink\n\ntable \u003d bt_env.sql_query(\"select * from
bank limit 10\")\nz.show(table)",
"user": "anonymous",
"dateUpdated": "2020-05-05 23:37:48.878",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -653,16 +578,8 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\tjob\tmarital\teducation\tdefault\tbalance\thousing\tloan\tcontact\tday\tmonth\tduration\tcampaign\tpdays\tprevious\tpoutcome\ty\n30\tunemployed\tmarried\tprimary\tno\t1787\tno\tno\tcellular\t19\toct\t79\t1\t-1\t0\tunknown\tno\n33\tservices\tmarried\tsecondary\tno\t4789\tyes\tyes\tcellular\t11\tmay\t220\t1\t339\t4\tfailure\tno\n35\tmanagement\tsingle\ttertiary\tno\t1350\tyes\tno\tcellular\t16\tapr\t185\t1\t330\t1\tfailure\tno\n30\tmanagement\tmarried\ttertiary\tn
[...]
- }
- ]
- },
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1588690392097_1159956807",
"id": "paragraph_1588690392097_1159956807",
@@ -675,12 +592,14 @@
"text": "%flink.pyflink\n",
"user": "anonymous",
"dateUpdated": "2020-05-05 23:37:07.989",
+ "progress": 0,
"config": {},
"settings": {
"params": {},
"forms": {}
},
"apps": [],
+ "runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1588693027989_1331448600",
"id": "paragraph_1588693027989_1331448600",
diff --git a/notebook/Flink Tutorial/7. Batch Data Analytics_2EZ9G3JJU.zpln
b/notebook/Flink Tutorial/7. Batch Data Analytics_2EZ9G3JJU.zpln
index f66031a..dd3501b 100644
--- a/notebook/Flink Tutorial/7. Batch Data Analytics_2EZ9G3JJU.zpln
+++ b/notebook/Flink Tutorial/7. Batch Data Analytics_2EZ9G3JJU.zpln
@@ -4,7 +4,8 @@
"title": "Overview",
"text": "%md\n\nThis tutorial demonstrates how to use Flink do data
exploration analytics via its.\n\n* batch sql \n* udf (scala, python \u0026
hive) \n* Zeppelin\u0027s dynamic forms and builtin visualization\n\nWe use the
bank data registered in another tutorial note. You can also use any existed
hive table. \n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 22:12:26.213",
+ "dateUpdated": "2021-01-25 14:30:29.848",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -27,23 +28,14 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis tutorial demonstrates how
to use Flink do data exploration analytics via
its.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ebatch
sql\u003c/li\u003e\n\u003cli\u003eudf (scala, python \u0026amp;
hive)\u003c/li\u003e\n\u003cli\u003eZeppelin\u0026rsquo;s dynamic forms and
builtin visualization\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eWe use the
bank data registered in another tutorial note. You can also [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1579053946947_-1754951794",
"id": "paragraph_1579053946947_-1754951794",
"dateCreated": "2020-01-15 10:05:46.947",
- "dateStarted": "2020-04-29 22:12:26.216",
- "dateFinished": "2020-04-29 22:12:26.371",
+ "dateStarted": "2021-01-25 14:30:29.848",
+ "dateFinished": "2021-01-25 14:30:29.873",
"status": "FINISHED"
},
{
@@ -51,6 +43,7 @@
"text": "%flink.bsql\n\nselect age, count(1) as aval\nfrom bank \nwhere
age \u003c 30 \ngroup by age \norder by age\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:22.697",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -122,15 +115,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\taval\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -146,6 +130,7 @@
"text": "%flink.bsql\n\nselect age, count(1) as val \nfrom bank \nwhere
age \u003c ${maxAge\u003d30} \ngroup by age \norder by age",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:24.988",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -226,15 +211,6 @@
}
}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\tval\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n30\t150\n31\t199\n32\t224\n33\t186\n34\t231\n35\t180\n36\t188\n37\t161\n38\t159\n39\t130\n40\t142\n41\t135\n42\t141\n43\t115\n44\t105\n45\t112\n46\t119\n47\t108\n48\t114\n49\t112\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -250,6 +226,7 @@
"text": "%flink.bsql\n\nselect age, count(1) as val \nfrom bank \nwhere
marital\u003d\u0027${marital\u003dsingle,single|divorced|married}\u0027 \ngroup
by age \norder by age",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:26.761",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -341,15 +318,6 @@
}
}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\tval\n23\t3\n24\t11\n25\t11\n26\t18\n27\t26\n28\t23\n29\t37\n30\t56\n31\t104\n32\t105\n33\t103\n34\t142\n35\t109\n36\t117\n37\t100\n38\t99\n39\t88\n40\t105\n41\t97\n42\t91\n43\t79\n44\t68\n45\t76\n46\t82\n47\t78\n48\t91\n49\t87\n50\t74\n51\t63\n52\t66\n53\t75\n54\t56\n55\t68\n56\t50\n57\t78\n58\t67\n59\t56\n60\t36\n61\t15\n62\t5\n63\t7\n64\t6\n65\t4\n66\t7\n67\t5\n68\t1\n69\t5\n70\t5\n71\t5\n72\t4\n73\t6\n74\t2\n75\t3\n76\t1\n77\t5\n78\t2\n79\t3\n80\t6\n81\t1\n83
[...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -365,6 +333,7 @@
"text": "%flink\n\n\nclass ScalaUpper extends ScalarFunction {\n def
eval(str: String) \u003d
str.toUpperCase\n}\n\nbtenv.registerFunction(\"scala_upper\", new
ScalaUpper())\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:30.878",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -385,15 +354,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "defined class ScalaUpper\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -409,6 +369,7 @@
"text": "%flink.pyflink\n\nclass PythonUpper(ScalarFunction):\n def
eval(self, s):\n return
s.upper()\n\nbt_env.register_function(\"python_upper\", udf(PythonUpper(),
DataTypes.STRING(), DataTypes.STRING()))\n\n\n\n\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:33.020",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -429,10 +390,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -448,6 +405,7 @@
"text": "%flink.bsql\n\nhelp",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:37.384",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -474,15 +432,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "The following commands are available:\n\nCREATE
TABLE\t\tCreate table under current catalog and database.\nDROP TABLE\t\tDrop
table with optional catalog and database. Syntax: \u0027DROP TABLE [IF EXISTS]
\u003cname\u003e;\u0027\nCREATE VIEW\t\tCreates a virtual table from a SQL
query. Syntax: \u0027CREATE VIEW \u003cname\u003e AS
\u003cquery\u003e;\u0027\nDESCRIBE\t\tDescribes the schema of a table with the
given name.\nDROP VIEW\t\tDeletes a previously created virt [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -498,6 +447,7 @@
"text": "%flink.bsql\n\nselect scala_upper(education), count(1) from
bank group by education\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:42.397",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -569,15 +519,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"EXPR$0\tEXPR$1\nPRIMARY\t678\nSECONDARY\t2306\nTERTIARY\t1350\nUNKNOWN\t187\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -593,6 +534,7 @@
"text": "%flink.bsql\n\nselect python_upper(education) as edu, count(1)
as c from bank group by education\n\n",
"user": "anonymous",
"dateUpdated": "2020-04-27 11:41:44.032",
+ "progress": 0,
"config": {
"runOnSelectionChange": true,
"title": true,
@@ -677,15 +619,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"edu\tc\nPRIMARY\t678\nSECONDARY\t2306\nTERTIARY\t1350\nUNKNOWN\t187\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -700,6 +633,7 @@
"text": "%flink.bsql\n",
"user": "anonymous",
"dateUpdated": "2020-02-05 16:10:30.318",
+ "progress": 0,
"config": {},
"settings": {
"params": {},
diff --git a/notebook/Flink Tutorial/8. Logistic Regression
(Alink)_2F4HJNWVN.zpln b/notebook/Flink Tutorial/8. Logistic Regression
(Alink)_2F4HJNWVN.zpln
index 097ddf8..a29dbb9 100644
--- a/notebook/Flink Tutorial/8. Logistic Regression (Alink)_2F4HJNWVN.zpln
+++ b/notebook/Flink Tutorial/8. Logistic Regression (Alink)_2F4HJNWVN.zpln
@@ -3,7 +3,8 @@
{
"text": "%md\n\n# Introduction\n\nThis note is to demonstrate how to do
machine learning in flink. Here we use
[Alink](https://github.com/alibaba/Alink/). \nWe use logics regression to do
classification task. We use the same data as other tutorials
[bank](https://archive.ics.uci.edu/ml/datasets/bank+marketing).\n",
"user": "anonymous",
- "dateUpdated": "2020-04-29 16:09:44.080",
+ "dateUpdated": "2021-01-25 14:30:35.800",
+ "progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
@@ -23,29 +24,21 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
note is to demonstrate how to do machine learning in flink. Here we use
\u003ca
href\u003d\"https://github.com/alibaba/Alink/\"\u003eAlink\u003c/a\u003e.\u003cbr
/\u003e\nWe use logics regression to do classification task. We use the same
data as other tutorials \u003ca
href\u003d\"https://archive.ics.uci.edu/ml/datasets/bank+marketing\"\u003ebank\u003c/a\u003e.
[...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1588147625869_1181490991",
"id": "paragraph_1588147625869_1181490991",
"dateCreated": "2020-04-29 16:07:05.869",
- "dateStarted": "2020-04-29 16:09:44.080",
- "dateFinished": "2020-04-29 16:09:44.102",
+ "dateStarted": "2021-01-25 14:30:35.800",
+ "dateFinished": "2021-01-25 14:30:35.807",
"status": "FINISHED"
},
{
"text": "%flink.pyflink\n\nimport pyflink\nfrom pyflink.dataset import
ExecutionEnvironment\nfrom pyflink.datastream import
StreamExecutionEnvironment\nfrom pyalink.alink.env import useCustomEnv\nmlenv
\u003d useCustomEnv(gateway,\n b_env,bt_env_2, s_env,
st_env_2)\nfrom pyalink.alink import *\n\nt \u003d bt_env_2.from_elements([(1,
2), (2, 5), (3, 1)], [\u0027a\u0027, \u0027b\u0027])\nsource \u003d
TableSourceBatchOp(t)\nsource.print()",
"user": "anonymous",
"dateUpdated": "2020-04-27 13:48:06.523",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -64,15 +57,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "\nUse one of the following commands to start using
PyAlink:\n - useLocalEnv(parallelism, flinkHome\u003dNone, config\u003dNone)\n
- useRemoteEnv(host, port, parallelism, flinkHome\u003dNone,
localIp\u003d\"localhost\", config\u003dNone)\nCall resetEnv() to reset
environment and switch to another.\n\n a b\n0 1 2\n1 2 5\n2 3 1\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -87,6 +71,7 @@
"text": "%flink.pyflink\n\n\ntest_data_path \u003d
\"/tmp/bank.csv\"\nfull_data_path \u003d \"/tmp/bank-full.csv\"\nschema_str
\u003d \"age int, job string, marital string, education string, default string,
balance string, housing string, loan string, contact string, day string, month
string, duration int, campaign int, pdays int, previous int, poutcome string, y
string\"\n\ntest_data \u003d CsvSourceBatchOp() \\\n
.setFilePath(test_data_path) \\\n .setSchemaStr(schema_str) \ [...]
"user": "anonymous",
"dateUpdated": "2020-04-29 16:10:03.433",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -105,10 +90,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -123,6 +104,7 @@
"text": "%flink.pyflink\n\n\ncategoricalColNames \u003d [\"job\",
\"marital\", \"education\", \"default\",\n \"balance\",
\"housing\", \"loan\", \"contact\", \"poutcome\" ]\nnumerialColNames \u003d
[\"age\", \"duration\", \"campaign\", \"pdays\",\n
\"previous\"]\nlabelColName \u003d \"y\"\n\nonehot \u003d
OneHotEncoder().setSelectedCols(categoricalColNames) \\\n
.setOutputCols([\"output\"])\nassembler \u003d VectorAssembler().setSelec [...]
"user": "anonymous",
"dateUpdated": "2020-04-27 13:48:37.221",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -141,10 +123,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -159,6 +137,7 @@
"text": "%flink.pyflink\n\n\nlogistic \u003d
LogisticRegression().setVectorCol(\"vec\").setLabelCol(labelColName) \\\n
.setPredictionCol(\"pred\").setPredictionDetailCol(\"detail\")\nmodel \u003d
pipeline.add(logistic).fit(train_set)\n\npredict \u003d
model.transform(test_data)\n\nmetrics \u003d
EvalBinaryClassBatchOp().setLabelCol(labelColName) \\\n
.setPredictionDetailCol(\"detail\").linkFrom(predict).collectMetrics()\n
\n \n ",
"user": "anonymous",
"dateUpdated": "2020-04-27 13:48:39.232",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -177,10 +156,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": []
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -195,6 +170,7 @@
"text": "%flink.pyflink\n\nprint(\"AUC:\",
metrics.getAuc())\nprint(\"KS:\", metrics.getKs())\nprint(\"PRC:\",
metrics.getPrc())\nprint(\"Precision:\",
metrics.getPrecision())\nprint(\"Recall:\",
metrics.getRecall())\nprint(\"F1:\",
metrics.getF1())\nprint(\"ConfusionMatrix:\",
metrics.getConfusionMatrix())\nprint(\"LabelArray:\",
metrics.getLabelArray())\nprint(\"LogLoss:\",
metrics.getLogLoss())\nprint(\"TotalSamples:\",
metrics.getTotalSamples())\nprint(\"ActualLabelProportion:\ [...]
"user": "anonymous",
"dateUpdated": "2020-04-27 13:48:51.441",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -213,15 +189,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "AUC: 0.9076425143954201\nKS: 0.6844817658349329\nPRC:
0.5349186786420718\nPrecision: 0.4748803827751196\nRecall:
0.761996161228407\nF1: 0.5851142225497421\nConfusionMatrix: [[397, 439], [124,
3561]]\nLabelArray: [\u0027yes\u0027, \u0027no\u0027]\nLogLoss:
0.3427333415817235\nTotalSamples: 4521\nActualLabelProportion:
[0.11523999115239991, 0.8847600088476001]\nActualLabelFrequency: [521,
4000]\nAccuracy: 0.8754700287547003\nKappa: 0.5164554316821129\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -236,6 +203,7 @@
"text": "%flink.pyflink\n\ndf \u003d
predict.filter(\"y\u003c\u003epred\").firstN(300).collectToDataframe()\n\nz.show(df)",
"user": "anonymous",
"dateUpdated": "2020-04-27 13:48:54.873",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -302,15 +270,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TABLE",
- "data":
"age\tjob\tmarital\teducation\tdefault\tbalance\thousing\tloan\tcontact\tday\tmonth\tduration\tcampaign\tpdays\tprevious\tpoutcome\ty\toutput\tvec\tpred\tdetail\n36\tself-employed\tmarried\ttertiary\tno\t307\tyes\tno\tcellular\t14\tmay\t341\t1\t330\t2\tother\tno\t$4563$6:1.0
13:1.0 17:1.0 19:1.0 2518:1.0 4554:1.0 4556:1.0 4560:1.0\t$4568$6:1.0 13:1.0
17:1.0 19:1.0 2518:1.0 4554:1.0 4556:1.0 4560:1.0 4563:36.0 4564:341.0 4565:1.0
4566:330.0 4567:2.0\tyes\t{\"no\":\"0.2 [...]
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
@@ -325,6 +284,7 @@
"text": "%flink.pyflink\n",
"user": "anonymous",
"dateUpdated": "2020-03-10 11:04:48.771",
+ "progress": 0,
"config": {
"editorMode": "ace/mode/python",
"editorHide": false,
@@ -343,15 +303,6 @@
"params": {},
"forms": {}
},
- "results": {
- "code": "SUCCESS",
- "msg": [
- {
- "type": "TEXT",
- "data": "UsageError: Line magic function `%flink.pyflink` not
found.\n"
- }
- ]
- },
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,