[FLINK-5454] [docs] Add stub for docs on "Tuning for large state"

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/7aad7514
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/7aad7514
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/7aad7514

Branch: refs/heads/release-1.2
Commit: 7aad7514ab8c9c371d02b8e4641c64e4d460d78d
Parents: daad28a
Author: Stephan Ewen <se...@apache.org>
Authored: Mon Jan 9 20:01:38 2017 +0100
Committer: Stephan Ewen <se...@apache.org>
Committed: Mon Jan 16 11:52:50 2017 +0100

----------------------------------------------------------------------
 docs/monitoring/README.md             | 21 ++++++++++
 docs/monitoring/large_state_tuning.md | 62 ++++++++++++++++++++++++++++++
 docs/monitoring/rest_api.md           |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/7aad7514/docs/monitoring/README.md
----------------------------------------------------------------------
diff --git a/docs/monitoring/README.md b/docs/monitoring/README.md
new file mode 100644
index 0000000..88c6509
--- /dev/null
+++ b/docs/monitoring/README.md
@@ -0,0 +1,21 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This folder contains the documentation in the category
+**Debugging & Monitoring**.

http://git-wip-us.apache.org/repos/asf/flink/blob/7aad7514/docs/monitoring/large_state_tuning.md
----------------------------------------------------------------------
diff --git a/docs/monitoring/large_state_tuning.md b/docs/monitoring/large_state_tuning.md
new file mode 100644
index 0000000..c49c106
--- /dev/null
+++ b/docs/monitoring/large_state_tuning.md
@@ -0,0 +1,62 @@
+---
+title: "Debugging and Tuning Checkpoints and Large State"
+nav-parent_id: monitoring
+nav-pos: 5
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This page gives a guide how to improve and tune applications that use large state.
+
+* ToC
+{:toc}
+
+## Monitoring State and Checkpoints
+
+ - Checkpoint statistics overview
+ - Interpret time until checkpoints
+ - Synchronous vs. asynchronous checkpoint time
+
+## Tuning Checkpointing
+
+ - Checkpoint interval
+ - Getting work done between checkpoints (min time between checkpoints)
+
+## Tuning Network Buffers
+
+ - getting a good number of buffers to use
+ - monitoring if too many buffers cause too much inflight data
+
+## Make checkpointing asynchronous where possible
+
+ - large state should be on keyed state, not operator state, because keyed state is managed, operator state not (subject to change in future versions)
+
+ - asynchronous snapshots preferrable. long synchronous snapshot times can cause problems on large state and complex topogies. move to RocksDB for that
+
+## Tuning RocksDB
+
+ - Predefined options
+ - Custom Options
+
+## Capacity planning
+
+ - Normal operation should not be constantly back pressured (link to back pressure monitor)
+ - Allow for some excess capacity to support catch-up in case of failures and checkpoint alignment skew (due to data skew or bad nodes)
+
+

http://git-wip-us.apache.org/repos/asf/flink/blob/7aad7514/docs/monitoring/rest_api.md
----------------------------------------------------------------------
diff --git a/docs/monitoring/rest_api.md b/docs/monitoring/rest_api.md
index 2da3726..d49dece 100644
--- a/docs/monitoring/rest_api.md
+++ b/docs/monitoring/rest_api.md
@@ -1,7 +1,7 @@
 ---
 title: "Monitoring REST API"
 nav-parent_id: monitoring
-nav-pos: 3
+nav-pos: 10
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one