[kylin] branch document updated: Update configuration page for 3.0.0 realtime olap.

nic Tue, 24 Dec 2019 03:49:50 -0800

This is an automated email from the ASF dual-hosted git repository.

nic pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git



The following commit(s) were added to refs/heads/document by this push:
     new 443e7b0  Update configuration page for 3.0.0 realtime olap.
443e7b0 is described below

commit 443e7b0c511ab3a8044e9d7ece72fa24835a0d75
Author: XiaoxiangYu <hit_la...@126.com>
AuthorDate: Tue Dec 24 17:05:49 2019 +0800

    Update configuration page for 3.0.0 realtime olap.
---
 website/_data/docs.yml                             |   1 +
 website/_docs/install/configuration.cn.md          |  70 +++++----
 website/_docs/install/configuration.md             |  37 +++--
 .../lambda_mode_and_timezone_realtime_olap.md      | 175 +++++++++++++++++++++
 website/_docs/tutorial/real_time_olap.md           |   5 +-
 website/_docs30/tutorial/real_time_olap.md         |   1 +
 website/download/index.cn.md                       |   2 +-
 website/download/index.md                          |   2 +-
 website/images/RealtimeOlap/Before-Submit.png      | Bin 0 -> 357148 bytes
 .../images/RealtimeOlap/CreateStreamingModel.png   | Bin 0 -> 39540 bytes
 website/images/RealtimeOlap/JobMonitor.png         | Bin 0 -> 594207 bytes
 website/images/RealtimeOlap/LambdaCubeSegment.png  | Bin 0 -> 176723 bytes
 website/images/RealtimeOlap/Table-Meta-1.png       | Bin 0 -> 136491 bytes
 website/images/RealtimeOlap/Table-Meta-2.png       | Bin 0 -> 181488 bytes
 website/images/RealtimeOlap/Table-Meta-3.png       | Bin 0 -> 42778 bytes
 .../images/RealtimeOlap/Timezone-checkresult.png   | Bin 0 -> 167890 bytes
 16 files changed, 247 insertions(+), 46 deletions(-)

diff --git a/website/_data/docs.yml b/website/_data/docs.yml
index d3ada2a..5c99520 100644
--- a/website/_data/docs.yml
+++ b/website/_data/docs.yml
@@ -52,6 +52,7 @@
   - tutorial/setup_jdbc_datasource
   - tutorial/hybrid
   - tutorial/mysql_metastore
+  - tutorial/lambda_mode_and_timezone_realtime_olap
 
 - title: Integration
   docs:
diff --git a/website/_docs/install/configuration.cn.md b/website/_docs/install/configuration.cn.md
index bbb9905..8099c6a 100644
--- a/website/_docs/install/configuration.cn.md
+++ b/website/_docs/install/configuration.cn.md
@@ -567,35 +567,47 @@ Kylin can use three types of compression, namely HBase table compression, Hive out
 
 
 ### Real-time OLAP    {#realtime-olap}
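-- `kylin.stream.job.dfs.block.size`: specifies the HDFS block size required by the streaming Base Cuboid build job. The default value is *16M*.
-- `kylin.stream.index.path`: specifies the location of the local segment cache. The default value is *stream_index*.
-- `kylin.stream.cube-num-of-consumer-tasks`: specifies the number of replica sets that share the same topic partition, which affects how many partitions are assigned to each replica set. The default value is *3*.
-- `kylin.stream.cube.window`: specifies the duration of each segment, in seconds. The default value is *3600*.
-- `kylin.stream.cube.duration`: specifies how long a segment waits before changing from active to IMMUTABLE, in seconds. The default value is *7200*.
-- `kylin.stream.cube.duration.max`: the maximum time a segment can stay active, in seconds. The default value is *43200*.
-- `kylin.stream.checkpoint.file.max.num`: specifies the maximum number of checkpoint files kept for each Cube. The default value is *5*.
-- `kylin.stream.index.checkpoint.intervals`: specifies the interval between two checkpoints. The default value is *300*.
-- `kylin.stream.index.maxrows`: specifies the maximum number of events cached in heap/memory. The default value is *50000*.
-- `kylin.stream.immutable.segments.max.num`: specifies the maximum number of IMMUTABLE segments for each Cube in the current receiver; if this maximum is exceeded, consumption of the current topic is paused. The default value is *100*.
-- `kylin.stream.consume.offsets.latest`: whether to start consuming from the latest offset. The default value is *true*.
-- `kylin.stream.node`: specifies the coordinator/receiver node, in the form host:port. The default value is *null*.
-- `kylin.stream.metadata.store.type`: specifies where the metadata is stored. The default value is *zk*.
-- `kylin.stream.segment.retention.policy`: specifies how the local segment cache is handled when a segment becomes IMMUTABLE. Valid values are `purge` and `fullBuild`. `purge` means that when the segment becomes IMMUTABLE, the locally cached segment data is deleted. `fullBuild` means that when the segment becomes IMMUTABLE, the locally cached segment data is uploaded to HDFS. The default value is *fullBuild*.
-- `kylin.stream.assigner`: specifies the implementation class used to assign topic partitions to different replica sets. The class implements `org.apache.kylin.stream.coordinator.assign.Assigner`. The default value is *DefaultAssigner*.
-- `kylin.stream.coordinator.client.timeout.millsecond`: specifies the timeout of the client connecting to the coordinator. The default value is *5000*.
-- `kylin.stream.receiver.client.timeout.millsecond`: specifies the timeout of the client connecting to the receiver. The default value is *5000*.
-- `kylin.stream.receiver.http.max.threads`: specifies the maximum number of threads for connections to the receiver. The default value is *200*.
-- `kylin.stream.receiver.http.min.threads`: specifies the minimum number of threads for connections to the receiver. The default value is *10*.
-- `kylin.stream.receiver.query-core-threads`: specifies the number of threads the current receiver uses for queries. The default value is *50*.
-- `kylin.stream.receiver.query-max-threads`: specifies the maximum number of threads the current receiver uses for queries. The default value is *200*.
-- `kylin.stream.receiver.use-threads-per-query`: specifies the number of threads each query uses. The default value is *8*.
-- `kylin.stream.build.additional.cuboids`: whether to build cuboids other than the Base Cuboid. These additional cuboids are the aggregations of the mandatory dimensions selected on the Cube's Advanced Setting page. The default value is *false*, so only the Base Cuboid is built by default.
-- `kylin.stream.segment-max-fragments`: specifies the maximum number of fragments kept by each segment. The default value is *50*.
-- `kylin.stream.segment-min-fragments`: specifies the minimum number of fragments kept by each segment. The default value is *15*.
-- `kylin.stream.max-fragment-size-mb`: specifies the maximum size of each fragment file. The default value is *300*.
-- `kylin.stream.fragments-auto-merge-enable`: whether to enable automatic merging of fragment files. The default value is *true*.
-
-> Note: For more information, please refer to [Real-time OLAP](http://kylin.apache.org/docs30/tutorial/real_time_olap.html).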
+
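+#### Global settings
+
+- `kylin.stream.job.dfs.block.size`: specifies the HDFS block size used by the streaming Cuboid build job. The default value is *16M*.
+- `kylin.stream.index.path`: specifies the local path that stores segment cache files (including local fragment files and checkpoint files). Both relative and absolute paths are supported. The default value is *stream_index*, which means data is written to `$KYLIN_HOME/stream_index`. With a large data volume this can take a lot of disk space, so you can set an absolute path to put the data on a data disk as needed.
+- `kylin.stream.node`: specifies the address of the receiver/coordinator. The format should be `hostname:port` or `port`. If set to `port`, Kylin completes the hostname automatically; if the property is not set, the default port is used (Coordinator: 7070, Receiver: 9090). When the process starts, it registers itself in the metadata.
+- `kylin.stream.metadata.store.type`: specifies the metadata store for the Realtime cluster information. The default value is *zk*.
+- `kylin.stream.receiver.use-threads-per-query`: specifies the number of threads each query uses. The default value is *8*.
+
+#### Cube level settings
+
+- `kylin.stream.index.maxrows`: specifies the maximum number of aggregated event rows cached in the heap. The default value is *50000*. This parameter affects the number of Fragment Files and can be increased as needed.
+- `kylin.stream.cube-num-of-consumer-tasks`: specifies how many Replica Sets are responsible for ingesting all messages of a topic. If your message rate is high, increase this value accordingly. The default value is *3*.
+- `kylin.stream.segment.retention.policy`: specifies how the Receiver handles the local Segment Cache when a Segment's status becomes *IMMUTABLE*. Valid values are `purge` and `fullBuild`. With `purge`, the Receiver deletes the local data after a period of time; with `fullBuild`, the data is uploaded to HDFS and waits to be built. The default value is *fullBuild*.
+- `kylin.stream.build.additional.cuboids`: by default, the Receiver only builds the base cuboid to answer queries. If you want to optimize the response time of certain queries, you can choose to build additional cuboids on the Receiver side. Which additional Cuboids are built is specified by the mandatory Cuboids on the advanced settings page.
+- `kylin.stream.cube.window`: specifies the length of a Streaming Segment, in seconds. The default value is *3600*. For details, see [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).
+- `kylin.stream.cube.duration`: specifies how long a Streaming Segment waits for late messages. The default value is *7200* (seconds). For details, see [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).
+- `kylin.stream.cube.duration.max`: specifies the maximum time a Streaming Segment stays Active. The default value is *43200* (seconds). For details, see [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).
+- `kylin.stream.checkpoint.file.max.num`: specifies the number of checkpoint files the Receiver keeps for each Cube. The default value is *5*.
+- `kylin.stream.index.checkpoint.intervals`: specifies the interval between Receiver checkpoints. The default value is *300*.
+- `kylin.stream.immutable.segments.max.num`: specifies the maximum number of *IMMUTABLE* segments a Cube can keep on the Receiver side, because Receiver performance is negatively correlated with the number of Fragment Files. The default value is *100*.
+- `kylin.stream.consume.offsets.latest`: specifies where the Receiver starts consuming: *true* starts from the latest offset, *false* starts from the earliest offset. The default value is *true*.
+
+#### Advanced settings
+
+- `kylin.stream.assigner`: the value is a class name; the class should implement `org.apache.kylin.stream.coordinator.assign.Assigner` and specifies how the Partitions of a Kafka Topic are assigned to Replica Sets. The default value is *DefaultAssigner*, whose strategy tries to give work to the Replica Sets responsible for the fewest partitions, so that the workload stays relatively balanced across Replica Sets.
+- `kylin.stream.coordinator.client.timeout.millsecond`: specifies the timeout of the HTTP connection to the Coordinator. The default value is *5000*.
+- `kylin.stream.receiver.client.timeout.millsecond`: specifies the timeout of the HTTP connection to the Receiver. The default value is *5000*.
+- `kylin.stream.receiver.http.max.threads`: specifies the maximum number of HTTP connection threads on the Receiver side. The default value is *200*.
+- `kylin.stream.receiver.http.min.threads`: specifies the minimum number of HTTP connection threads on the Receiver side. The default value is *10*.
+- `kylin.stream.receiver.query-core-threads`: specifies the number of threads the Receiver uses for scans. The default value is *50*.
+- `kylin.stream.receiver.query-max-threads`: specifies the maximum number of threads the Receiver uses for scans. The default value is *200*.
+- `kylin.stream.segment-max-fragments`: each time the Receiver's MemoryStore reaches the threshold (`kylin.stream.index.maxrows`), it is flushed to disk as a Fragment File, and the Receiver tries to merge these Fragment Files as much as possible to reduce data redundancy. This setting specifies the threshold that triggers a merge. The default value is *50*.
+- `kylin.stream.segment-min-fragments`: after each merge on the Receiver side, the number of files does not drop below this threshold. The default value is *15*.
+- `kylin.stream.max-fragment-size-mb`: after merging, each Fragment File does not exceed this size. The default value is *300*.
+- `kylin.stream.fragments-auto-merge-enable`: whether to enable automatic background merging of Fragment Files. The default value is *true*.
+- `kylin.stream.metrics.option`: specifies whether and how metrics are collected on the Receiver side. Valid values are csv/console/jmx.
+- `kylin.stream.event.timezone`: specifies which timezone is used by time-derived columns such as `HOUR_START`/`DAY_START` that are derived from Event Time. The default is UTC.
+- `kylin.stream.auto-resubmit-after-discard-enabled`: when a user discards a Realtime build job, whether to automatically resubmit a new job.
+
+> Note: For a getting-started tutorial, please refer to [Real-time OLAP](/docs/tutorial/realtime_olap.html).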
 
 
 
diff --git a/website/_docs/install/configuration.md b/website/_docs/install/configuration.md
index d8c5e1f..8e61c6d 100644
--- a/website/_docs/install/configuration.md
+++ b/website/_docs/install/configuration.md
@@ -565,20 +565,30 @@ This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inme
 
 
 ### Real-time OLAP    {#realtime-olap}
+#### Global level config
+
 - `kylin.stream.job.dfs.block.size`: specifies the HDFS block size used by the streaming Base Cuboid job. The default value is *16M*.
-- `kylin.stream.index.path`: specifies the path to store local segment cache. The default value is *stream_index*.
+- `kylin.stream.index.path`: specifies the local path to store segment cache files (including fragment and checkpoint files). Both relative and absolute paths are supported; the default value *stream_index* resolves to `$KYLIN_HOME/stream_index`.
+- `kylin.stream.node`: specifies the address of the coordinator/receiver node. The value should be `hostname:port` or `port`. If set to `port`, Kylin completes the hostname automatically. When the Kylin process starts, it registers itself in the metadata. The default value is *null*.
+- `kylin.stream.metadata.store.type`: specifies where the metadata is stored. The default value is *zk*, which is currently the only option.
+- `kylin.stream.receiver.use-threads-per-query`: specifies the number of threads each query uses. The default value is *8*.
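To illustrate the `hostname:port` / `port` forms accepted by `kylin.stream.node`, the sketch below shows one way such a value could be resolved. The helper `resolve_node` and the use of `socket.gethostname()` to complete the hostname are assumptions for illustration, not Kylin's implementation; the fallback ports 7070 (coordinator) and 9090 (receiver) are the default ports referred to in the Real-time OLAP tutorial.

```python
import socket
from typing import Optional, Tuple

# Default ports used in the Real-time OLAP tutorial (assumption for this sketch).
DEFAULT_PORTS = {"coordinator": 7070, "receiver": 9090}


def resolve_node(value: Optional[str], role: str) -> Tuple[str, int]:
    """Resolve a kylin.stream.node value into (hostname, port).

    Hypothetical helper: accepts "hostname:port", a bare "port",
    or None / "" when the property is not set.
    """
    if not value:
        # Property unset: use the local hostname and the role's default port.
        return socket.gethostname(), DEFAULT_PORTS[role]
    if ":" in value:
        # "hostname:port": split on the last colon.
        host, port = value.rsplit(":", 1)
        return host, int(port)
    # Bare "port": complete the hostname with the local machine name.
    return socket.gethostname(), int(value)


print(resolve_node("kylin-node1:9095", "receiver"))  # ('kylin-node1', 9095)
```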
+
+#### Cube level config
+
+- `kylin.stream.index.maxrows`: specifies the maximum number of aggregated events kept in the JVM heap. The default value is *50000*. Consider increasing it if you have enough heap space.
 - `kylin.stream.cube-num-of-consumer-tasks`: specifies the number of replica sets that share all partitions of a topic. It affects how many partitions are assigned to each replica set. The default value is *3*.
-- `kylin.stream.cube.window`: specifies the length of duration of each segment, value in seconds. The default value is *3600*.
-- `kylin.stream.cube.duration`: specifies the wait time that a segment's status changes from active to IMMUTABLE, value in seconds. The default value is *7200*.
-- `kylin.stream.cube.duration.max`: specifies the maximum duration that segment can keep active, value in seconds. The default value is *43200*.
+- `kylin.stream.segment.retention.policy`: specifies how the local segment cache is processed when a segment becomes *IMMUTABLE*. Valid values are `purge` and `fullBuild`. `purge` means the segment is deleted when it becomes *IMMUTABLE*; `fullBuild` means it is uploaded to HDFS when it becomes *IMMUTABLE*. The default value is *fullBuild*.
+- `kylin.stream.build.additional.cuboids`: whether to build additional Cuboids. The additional Cuboids are the aggregations of the Mandatory Dimensions chosen on the *Cube Advanced Setting* page. The default value is *false*, so only the Base Cuboid is built by default. Consider enabling it if you care about QPS and most query patterns can be foreseen.
+- `kylin.stream.cube.window`: specifies the duration of each segment, in seconds. The default value is *3600*. See [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/) for details.
+- `kylin.stream.cube.duration`: specifies how long a segment waits before its status changes from active to IMMUTABLE, in seconds. The default value is *7200*. See [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/) for details.
+- `kylin.stream.cube.duration.max`: specifies the maximum duration a segment can stay active, in seconds. The default value is *43200*. See [deep-dive-real-time-olap](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/) for details.
 - `kylin.stream.checkpoint.file.max.num`: specifies the maximum number of checkpoint files for each cube. The default value is *5*.
 - `kylin.stream.index.checkpoint.intervals`: specifies the time interval between two checkpoints. The default value is *300*.
-- `kylin.stream.index.maxrows`: specifies the maximum number of the entered event be cached in heap/memory. The default value is *50000*.
 - `kylin.stream.immutable.segments.max.num`: specifies the maximum number of IMMUTABLE segments in each Cube of the current streaming receiver; if it is exceeded, consumption of the current topic is paused. The default value is *100*.
-- `kylin.stream.consume.offsets.latest`: whether to consume from the latest offset. The default value is *true*.
-- `kylin.stream.node`: specifies the node of coordinator/receiver. Such as host:port. The default value is *null*.
-- `kylin.stream.metadata.store.type`: specifies the position of metadata store. The default value is *zk*.
-- `kylin.stream.segment.retention.policy`: specifies the strategy to process local segment cache when segment become IMMUTABLE. Optional values include `purge` and `fullBuild`. `purge` means when the segment become IMMUTABLE, it will be dropped. `fullBuild` means when the segment become IMMUTABLE, it will be uploaded to HDFS. The default value is *fullBuild*.
+- `kylin.stream.consume.offsets.latest`: whether to consume from the latest offset (*true*) or the earliest offset (*false*). The default value is *true*.
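To illustrate how `kylin.stream.cube.window` groups events into segments, here is a minimal sketch that assumes windows are aligned to multiples of the window length since the Unix epoch (UTC). The helper `segment_window` is hypothetical, and the active/IMMUTABLE timing controlled by `kylin.stream.cube.duration` and `kylin.stream.cube.duration.max` is deliberately not modeled.

```python
from datetime import datetime, timezone
from typing import Tuple


def segment_window(event_ms: int, window_seconds: int = 3600) -> Tuple[int, int]:
    """Return the [start, end) epoch-millisecond range of the window holding event_ms.

    Hypothetical helper; assumes windows are aligned to multiples of
    window_seconds since the epoch.
    """
    window_ms = window_seconds * 1000
    start = (event_ms // window_ms) * window_ms
    return start, start + window_ms


# An event at 2019-12-09 08:44:50 -0500, i.e. 13:44:50 UTC.
event = datetime(2019, 12, 9, 13, 44, 50, tzinfo=timezone.utc)
start, end = segment_window(int(event.timestamp() * 1000))
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc))  # 2019-12-09 13:00:00+00:00
```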
+
+#### Advanced config
+
 - `kylin.stream.assigner`: specifies the implementation class used to assign topic partitions to different replica sets. The class should implement `org.apache.kylin.stream.coordinator.assign.Assigner`. The default value is *DefaultAssigner*.
 - `kylin.stream.coordinator.client.timeout.millsecond`: specifies the connection timeout of the coordinator client, in milliseconds. The default value is *5000*.
 - `kylin.stream.receiver.client.timeout.millsecond`: specifies the connection timeout of the receiver client, in milliseconds. The default value is *5000*.
@@ -586,14 +596,15 @@ This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inme
 - `kylin.stream.receiver.http.min.threads`: specifies the minimum number of connection threads of the receiver. The default value is *10*.
 - `kylin.stream.receiver.query-core-threads`: specifies the number of query threads used by the current streaming receiver. The default value is *50*.
 - `kylin.stream.receiver.query-max-threads`: specifies the maximum number of query threads used by the current streaming receiver. The default value is *200*.
-- `kylin.stream.receiver.use-threads-per-query`: specifies the threads number that each query use. The default value is *8*.
-- `kylin.stream.build.additional.cuboids`: whether to build additional Cuboids. The additional Cuboids mean the aggregation of Mandatory Dimensions that chosen in Cube Advanced Setting page. The default value is *false*. Only build Base Cuboid by default.
 - `kylin.stream.segment-max-fragments`: specifies the maximum number of fragments that each segment keeps. The default value is *50*.
 - `kylin.stream.segment-min-fragments`: specifies the minimum number of fragments that each segment keeps. The default value is *15*.
 - `kylin.stream.max-fragment-size-mb`: specifies the maximum size of each fragment. The default value is *300*.
-- `kylin.stream.fragments-auto-merge-enable`: whether to enable fragments auto merge. The default value is *true*.
+- `kylin.stream.fragments-auto-merge-enable`: whether to enable automatic fragment merging on the streaming receiver side. The default value is *true*.
+- `kylin.stream.metrics.option`: specifies how to report metrics on the streaming receiver side. Valid values are csv/console/jmx.
+- `kylin.stream.event.timezone`: specifies the timezone used by derived time columns such as `HOUR_START`/`DAY_START`. The default is UTC.
+- `kylin.stream.auto-resubmit-after-discard-enabled`: whether to automatically resubmit a new build job when the previous job has been discarded by the user.
 
-> Note: For more information, please refer to the [Real-time OLAP](http://kylin.apache.org/docs30/tutorial/real_time_olap.html).
+> Note: For a step-by-step tutorial, please refer to [Real-time OLAP](/docs/tutorial/realtime_olap.html).
 
 ### Storage Clean up Configuration    {#storage-clean-up-configuration}
 
diff --git a/website/_docs/tutorial/lambda_mode_and_timezone_realtime_olap.md b/website/_docs/tutorial/lambda_mode_and_timezone_realtime_olap.md
new file mode 100644
index 0000000..3f22997
--- /dev/null
+++ b/website/_docs/tutorial/lambda_mode_and_timezone_realtime_olap.md
@@ -0,0 +1,175 @@
+---
+layout: docs
+title:  Lambda mode and Timezone in Real-time OLAP
+categories: tutorial
+permalink: /docs/tutorial/lambda_mode_and_timezone_realtime_olap.html
+---
+
+Kylin v3.0.0 introduces the real-time OLAP feature. With the newly added streaming receiver cluster, Kylin can query streaming data with sub-second latency. See [this tech blog](/blog/2019/04/12/rt-streaming-design/) for the overall design and core concepts.
+
+If you are looking for a step-by-step tutorial, please check [this tutorial](/docs/tutorial/realtime_olap.html).
+In this article, we introduce how to update segments and how to set the timezone for derived time columns in a real-time OLAP cube.
+
+# Background
+
+Say we have Kafka messages that look like this:
+
+{% highlight json %}
+{
+    "s_nation":"SAUDI ARABIA",
+    "lo_supplycost":74292,
+    "p_category":"MFGR#0910",
+    "local_day_hour_minute":"09_21_44",
+    "event_time":"2019-12-09 08:44:50.000-0500",
+    "local_day_hour":"09_21",
+    "lo_quantity":12,
+    "lo_revenue":1411548,
+    "p_brand":"MFGR#0910051",
+    "s_region":"MIDDLE EAST",
+    "lo_discount":5,
+    "customer_info":{
+        "CITY":"CHINA    057",
+        "REGION":"ASIA",
+        "street":"CHINA    05721",
+        "NATION":"CHINA"
+    },
+    "d_year":1994,
+    "d_weeknuminyear":30,
+    "p_mfgr":"MFGR#09",
+    "v_revenue":7429200,
+    "d_yearmonth":"Jul1994",
+    "s_city":"SAUDI ARA15",
+    "profit_ratio":0.05263157894736842,
+    "d_yearmonthnum":199407,
+    "round":1
+}
+{% endhighlight %}
+
+This sample comes from SSB with some additional fields, such as `event_time`, which stands for the timestamp of the current event.
+We also assume that events come from countries in different timezones: "2019-12-09 08:44:50.000-0500" indicates a UTC-05:00 offset, i.e. the `America/New_York` timezone in December. You may have some events that come from `Asia/Shanghai` as well.
+
+`local_day_hour_minute` is a column whose value is in the local timezone, e.g. "GMT+8" in the above sample.
+
+### Questions
+When performing real-time OLAP analysis with Kylin, you may have some concerns, including:
+
+1. Will events in different timezones cause incorrect query results?
+2. How can I correct the data when Kafka messages contain wrong values, say, a misspelled dimension value?
+3. How can I retrieve very late messages that have been dropped?
+4. My query only hits a small time range. How should I write the filter condition to make sure unused segments are pruned/skipped from the scan?
+
+### Quick Answer
+For the first question, you can always get correct results in the timezone of your location by setting `kylin.stream.event.timezone=GMT+N` for all Kylin processes. By default, UTC is used for *derived time columns*.
+
+For the second and third questions: you cannot update/append segments in a normal streaming cube, but you can update/append segments in a streaming cube in lambda mode. All you need to prepare is a Hive table that is mapped to your Kafka events.
+
+For the fourth question, you can achieve this by adding *derived time columns* such as `MINUTE_START`/`DAY_START` to your filter condition.
+
+# How to do it
+
+### Configure timezone
+Messages may come from different timezones, but you may want query results in one specific timezone.
+For example, if you live somewhere in GMT+2, please set `kylin.stream.event.timezone=GMT+2` for all Kylin processes.
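The effect of this setting on derived time columns can be sketched as follows. The hypothetical helper `derived_time_columns` converts the sample `event_time` into a fixed `GMT+N` offset and truncates it to the minute, hour, and day; it is only an illustration and assumes Kylin's derivation behaves like a plain timezone conversion followed by truncation.

```python
from datetime import datetime, timedelta, timezone


def derived_time_columns(event_time: str, gmt_offset_hours: int) -> dict:
    """Derive MINUTE_START / HOUR_START / DAY_START for one event.

    Hypothetical helper: event_time uses the sample message format,
    e.g. "2019-12-09 08:44:50.000-0500", and gmt_offset_hours plays the
    role of N in kylin.stream.event.timezone=GMT+N.
    """
    instant = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f%z")
    # Convert the absolute instant into the configured fixed offset.
    local = instant.astimezone(timezone(timedelta(hours=gmt_offset_hours)))
    # Truncate the local wall-clock time to each granularity.
    return {
        "MINUTE_START": local.strftime("%Y-%m-%d %H:%M:00"),
        "HOUR_START": local.strftime("%Y-%m-%d %H:00:00"),
        "DAY_START": local.strftime("%Y-%m-%d"),
    }


sample = "2019-12-09 08:44:50.000-0500"
print(derived_time_columns(sample, 8)["HOUR_START"])  # 2019-12-09 21:00:00
print(derived_time_columns(sample, 0)["HOUR_START"])  # 2019-12-09 13:00:00
```

For the sample event, `GMT+8` gives an `HOUR_START` of 21:00 on 2019-12-09, consistent with `local_day_hour` = `09_21` in the message, while the UTC default gives 13:00.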
+
+
+### Create lambda table
+
+You should create a Hive table in the *default* database, and this table should contain all your dimension and measure columns. Please remember to include derived time columns such as `MINUTE_START`/`DAY_START` if you use them as dimension columns in your cube.
+
+Depending on the granularity at which you want to update segments, you can choose `HOUR_START` or `DAY_START` as the partition column of this Hive table.
+
+{% highlight sql %}
+use default;
+CREATE EXTERNAL TABLE IF NOT EXISTS lambda_flat_table
+(
+-- event timestamp and debug purpose column
+EVENT_TIME timestamp
+,ROUND bigint COMMENT "For debug purpose, in which round this event was sent by the producer"
+,LOCAL_DAY_HOUR string COMMENT "For debug purpose, maybe check timezone etc"
+,LOCAL_MINUTE string COMMENT "For debug purpose, maybe check timezone etc"
+
+-- dimension column on fact table
+,LO_QUANTITY bigint
+,LO_DISCOUNT bigint
+
+-- dimension column on dimension table
+,C_REGION string
+,C_NATION string
+,C_CITY string
+
+,D_YEAR int
+,D_YEARMONTH string
+,D_WEEKNUMINYEAR int
+,D_YEARMONTHNUM int
+
+,S_REGION string
+,S_NATION string
+,S_CITY string
+
+,P_CATEGORY string
+,P_BRAND string
+,P_MFGR string
+
+
+-- measure column  on fact table
+,V_REVENUE bigint
+,LO_SUPPLYCOST bigint
+,LO_REVENUE bigint
+,PROFIT_RATIO double
+
+-- for kylin used
+,MINUTE_START timestamp
+,HOUR_START timestamp
+,MONTH_START date
+)
+PARTITIONED BY (DAY_START date)
+STORED AS SEQUENCEFILE
+LOCATION 'hdfs:///LacusDir/lambda_flat_table';
+{% endhighlight %}
+
+
+### Create streaming cube in Kylin
+The first step is to add information such as the broker list and topic name; 
+after that, paste a sample message into the left panel and let Kylin auto-detect the column names and column types.
+Some data types may not be detected correctly; please fix them manually and make sure they are aligned with the data types in the Hive table.
+
+For example, you should change the data type of `event_time` from varchar to timestamp.
+Some column names also differ from the Hive table, so please correct them too, for example `customer_info_REGION` to `C_REGION`.
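The name `customer_info_REGION` suggests that nested JSON fields are flattened by joining the parent and child names with `_`. Assuming that convention, the sketch below flattens a message and applies a hypothetical rename map so the names line up with the Hive table columns. It is a way to check your mapping offline, not part of Kylin.

```python
import json

# Hypothetical mapping from auto-detected names to the Hive column names.
RENAME = {
    "customer_info_REGION": "C_REGION",
    "customer_info_NATION": "C_NATION",
    "customer_info_CITY": "C_CITY",
}


def flatten(message: dict, parent: str = "") -> dict:
    """Flatten nested objects, joining parent and child names with '_'.

    Assumes the same '_' naming convention as Kylin's auto-detection.
    """
    flat = {}
    for key, value in message.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_hive_columns(raw: str) -> dict:
    """Parse one Kafka message and rename its fields to Hive column names."""
    flat = flatten(json.loads(raw))
    # Hive column names are case-insensitive; upper-case to match the DDL above.
    return {RENAME.get(name, name).upper(): value for name, value in flat.items()}


raw = '{"lo_quantity": 12, "customer_info": {"REGION": "ASIA", "NATION": "CHINA"}}'
print(to_hive_columns(raw))  # {'LO_QUANTITY': 12, 'C_REGION': 'ASIA', 'C_NATION': 'CHINA'}
```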
+
+![image](/images/RealtimeOlap/Before-Submit.png)
+
+After that, choose the right *TSColumn* and *TSParser*, and set the correct *Table Name*, which should be identical to the name of the Hive table. Then click the *submit* button.
+If everything goes well, the table metadata will be saved successfully; otherwise, correct the data types and column names according to the output message.
+
+When creating the Model, please set *Partition Date Column* to the right value. For a streaming cube, *Partition Date Column* is used to generate the HQL when updating segments whose source data comes from Hive.
+![image](/images/RealtimeOlap/CreateStreamingModel.png)
+
+### Check result with timezone
+
+Let us do a quick check to see whether *LOCAL_MINUTE* is aligned with *HOUR_START*.
+{% highlight sql %}
+SELECT LOCAL_MINUTE, HOUR_START, sum(LO_SUPPLYCOST)
+FROM LAMBDA_FLAT_TABLE
+WHERE day_start = '2019-12-09'
+GROUP BY LOCAL_MINUTE, HOUR_START
+ORDER BY LOCAL_MINUTE, HOUR_START
+{% endhighlight %}
+
+![image](/images/RealtimeOlap/Timezone-checkresult.png)
+ 
+### Update segment
+
+1. Use an ETL tool such as Spark Streaming to write the corrected data into HDFS, and add a new partition for the new data files.
+2. After that, call the REST API `http://localhost:7070/kylin/api/cubes/{cube_name}/rebuild` (PUT method) to submit a build job that replaces the old segments; if you have set `kylin.stream.event.timezone`, add the timezone offset to `startTime` and `endTime`.
+3. If you want to add a large amount of historical data to a Kylin streaming cube for analysis (rather than replacing existing data), you can use the same method.
+
+![image](/images/RealtimeOlap/JobMonitor.png)
+![image](/images/RealtimeOlap/LambdaCubeSegment.png)
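A minimal sketch of the rebuild step follows. It assumes the request body carries `startTime`, `endTime` (epoch milliseconds) and `buildType`, and it interprets "add the timezone offset" as converting the boundaries of a local day in `GMT+N` into absolute epoch milliseconds; please verify the resulting range against the segment ranges shown in the Kylin UI. The helper names and the cube name `my_lambda_cube` are hypothetical, authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def day_range_ms(day: date, gmt_offset_hours: int) -> Tuple[int, int]:
    """Epoch-millisecond [start, end) of one calendar day in GMT+N."""
    tz = timezone(timedelta(hours=gmt_offset_hours))
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def build_rebuild_request(host: str, cube: str, day: date,
                          gmt_offset_hours: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the cube rebuild endpoint."""
    start_ms, end_ms = day_range_ms(day, gmt_offset_hours)
    # Assumed request body fields; check them against your Kylin version.
    body = {"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"}
    return urllib.request.Request(
        f"http://{host}/kylin/api/cubes/{cube}/rebuild",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Replace the DAY_START='2019-12-09' partition of a hypothetical cube using GMT+8.
req = build_rebuild_request("localhost:7070", "my_lambda_cube", date(2019, 12, 9), 8)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually submit the job, pass the request to `urllib.request.urlopen` after adding whatever authentication your Kylin deployment requires.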
+
+### Some screenshots
+![image](/images/RealtimeOlap/Table-Meta-1.png)
+![image](/images/RealtimeOlap/Table-Meta-2.png)
+![image](/images/RealtimeOlap/Table-Meta-3.png)
+
diff --git a/website/_docs/tutorial/real_time_olap.md b/website/_docs/tutorial/real_time_olap.md
index e7e1047..588f966 100644
--- a/website/_docs/tutorial/real_time_olap.md
+++ b/website/_docs/tutorial/real_time_olap.md
@@ -15,8 +15,9 @@ In this tutorial, we will use Hortonworks HDP-2.4.0.0.169 Sandbox VM + Kafka v1.
 4. Start consumption
 5. Monitor receiver
 
-The configuration can be found at [Real-time OLAP configuration](http://kylin.apache.org/docs30/install/configuration.html#realtime-olap).
+The configuration can be found at [Real-time OLAP configuration](http://kylin.apache.org/docs/install/configuration.html#realtime-olap).
 Details can be found at [Deep Dive into Real-time OLAP](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).
+If you want to configure the timezone or learn how to use a lambda cube, please check [Lambda Mode and Timezone](/docs/tutorial/lambda_mode_and_timezone_realtime_olap.html).
 
 ----
 
@@ -238,4 +239,4 @@ When the mouse pointer moves over the segment icon, the partition level statisti
 - Please make sure that ports 7070 and 9090 are not occupied. If you have to change a port, set `kylin.stream.node` in `kylin.properties` for the receiver and the coordinator separately.
 - If you have messed things up and want to clean up, please remove the streaming metadata in ZooKeeper.
 This can be done by executing `rmr PATH_TO_DELETE` in the `zookeeper-client` shell. By default, the root dir of streaming metadata is `kylin.env.zookeeper-base-path` + `kylin.metadata.url` + `/stream`.
-For example, if you set `kylin.env.zookeeper-base-path` to `/kylin`， set `kylin.metadata.url` to `kylin_metadata@hbase`, you should delete path `/kylin/kylin_metadata/stream`.
\ No newline at end of file
+For example, if you set `kylin.env.zookeeper-base-path` to `/kylin` and `kylin.metadata.url` to `kylin_metadata@hbase`, you should delete the path `/kylin/kylin_metadata/stream`.
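The path rule for streaming metadata in ZooKeeper can be written as a small helper. It assumes that only the identifier before `@` in `kylin.metadata.url` is used, which is what the `kylin_metadata@hbase` example implies; `streaming_metadata_root` is a hypothetical name.

```python
def streaming_metadata_root(zk_base_path: str, metadata_url: str) -> str:
    """Return the ZooKeeper root of streaming metadata.

    Hypothetical helper; assumes only the identifier before '@' in
    kylin.metadata.url is used, e.g. "kylin_metadata@hbase" -> "kylin_metadata".
    """
    identifier = metadata_url.split("@", 1)[0]
    return "/".join([zk_base_path.rstrip("/"), identifier, "stream"])


path = streaming_metadata_root("/kylin", "kylin_metadata@hbase")
print(f"rmr {path}")  # rmr /kylin/kylin_metadata/stream
```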
diff --git a/website/_docs30/tutorial/real_time_olap.md b/website/_docs30/tutorial/real_time_olap.md
index cd9de8e..3069b41 100644
--- a/website/_docs30/tutorial/real_time_olap.md
+++ b/website/_docs30/tutorial/real_time_olap.md
@@ -17,6 +17,7 @@ In this tutorial, we will use Hortonworks HDP-2.4.0.0.169 Sandbox VM + Kafka v1.
 
 The configuration can be found at [Real-time OLAP configuration](http://kylin.apache.org/docs30/install/configuration.html#realtime-olap).
 Details can be found at [Deep Dive into Real-time OLAP](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).
+If you want to configure the timezone or learn how to use a lambda cube, please check [Lambda Mode and Timezone](/docs/tutorial/lambda_mode_and_timezone_realtime_olap.html).
 
 ----
 
diff --git a/website/download/index.cn.md b/website/download/index.cn.md
index ea58420..5626d96 100644
--- a/website/download/index.cn.md
+++ b/website/download/index.cn.md
@@ -6,7 +6,7 @@ title: Download
 You can verify the validity of the downloaded files by following these [procedures](https://www.apache.org/info/verification.html) and using these [KEYS](https://www.apache.org/dist/kylin/KEYS).
 
 #### v3.0.0
-- This is a new version of Kylin developed after 2.x, with features such as real-time OLAP. With this version, Kylin supports sub-second queries on streaming data. Please visit the [real-time OLAP tutorial](/docs30/tutorial/realtime_olap.html) and the [real-time OLAP blog](/blog/2019/04/12/rt-streaming-design/) for details.
+- This is a new version of Kylin developed after 2.x, with features such as real-time OLAP. With this version, Kylin supports sub-second queries on streaming data. Please visit the [real-time OLAP tutorial](/docs/tutorial/realtime_olap.html) and the [real-time OLAP blog](/blog/2019/04/12/rt-streaming-design/) for details.
 - [Release notes](/docs30/release_notes.html), [Installation guide](/docs30/install/index.html) and [Upgrade guide](/docs30/howto/howto_upgrade.html)
 - Source download: [apache-kylin-3.0.0-source-release.zip](https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip) \[[asc](https://www.apache.org/dist/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip.asc)\] \[[sha256](https://www.apache.org/dist/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip.sha256)\]
 - Binary for Hadoop 2:
diff --git a/website/download/index.md b/website/download/index.md
index 71c1ff0..beeee89 100644
--- a/website/download/index.md
+++ b/website/download/index.md
@@ -7,7 +7,7 @@ permalink: /download/index.html
 You can verify the download by following these [procedures](https://www.apache.org/info/verification.html) and using these [KEYS](https://www.apache.org/dist/kylin/KEYS).
 
 #### v3.0.0
-- This is a release of Kylin's next generation after 2.x, with the new real-time OLAP feature, Kylin can query streaming data with sub-second latency. To learn about real-time OLAP, please visit [the tech blog](/blog/2019/04/12/rt-streaming-design/) and [the tutorial](/docs30/tutorial/realtime_olap.html) for real-time OLAP.
+- This is Kylin's next-generation release after 2.x. With the new real-time OLAP feature, Kylin can query streaming data with sub-second latency. To learn about real-time OLAP, please visit [the tech blog](/blog/2019/04/12/rt-streaming-design/) and [the tutorial](/docs/tutorial/realtime_olap.html).
 - [Release notes](/docs30/release_notes.html), [installation guide](/docs30/install/index.html) and [upgrade guide](/docs30/howto/howto_upgrade.html)
 - Source download: [apache-kylin-3.0.0-source-release.zip](https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip) \[[asc](https://www.apache.org/dist/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip.asc)\] \[[sha256](https://www.apache.org/dist/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-source-release.zip.sha256)\]
 - Binary for Hadoop 2 download:
diff --git a/website/images/RealtimeOlap/Before-Submit.png b/website/images/RealtimeOlap/Before-Submit.png
new file mode 100644
index 0000000..679a86a
Binary files /dev/null and b/website/images/RealtimeOlap/Before-Submit.png differ
diff --git a/website/images/RealtimeOlap/CreateStreamingModel.png b/website/images/RealtimeOlap/CreateStreamingModel.png
new file mode 100644
index 0000000..221414a
Binary files /dev/null and b/website/images/RealtimeOlap/CreateStreamingModel.png differ
diff --git a/website/images/RealtimeOlap/JobMonitor.png b/website/images/RealtimeOlap/JobMonitor.png
new file mode 100644
index 0000000..266d128
Binary files /dev/null and b/website/images/RealtimeOlap/JobMonitor.png differ
diff --git a/website/images/RealtimeOlap/LambdaCubeSegment.png b/website/images/RealtimeOlap/LambdaCubeSegment.png
new file mode 100644
index 0000000..600d3eb
Binary files /dev/null and b/website/images/RealtimeOlap/LambdaCubeSegment.png differ
diff --git a/website/images/RealtimeOlap/Table-Meta-1.png b/website/images/RealtimeOlap/Table-Meta-1.png
new file mode 100644
index 0000000..093d303
Binary files /dev/null and b/website/images/RealtimeOlap/Table-Meta-1.png differ
diff --git a/website/images/RealtimeOlap/Table-Meta-2.png b/website/images/RealtimeOlap/Table-Meta-2.png
new file mode 100644
index 0000000..820d0f2
Binary files /dev/null and b/website/images/RealtimeOlap/Table-Meta-2.png differ
diff --git a/website/images/RealtimeOlap/Table-Meta-3.png b/website/images/RealtimeOlap/Table-Meta-3.png
new file mode 100644
index 0000000..b3bc019
Binary files /dev/null and b/website/images/RealtimeOlap/Table-Meta-3.png differ
diff --git a/website/images/RealtimeOlap/Timezone-checkresult.png b/website/images/RealtimeOlap/Timezone-checkresult.png
new file mode 100644
index 0000000..2d32de1
Binary files /dev/null and b/website/images/RealtimeOlap/Timezone-checkresult.png differ
