This is an automated email from the ASF dual-hosted git repository.

chesnay pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new d5c542ccd Rebuilt website

d5c542ccd is described below

commit d5c542ccdd7b7298e802f6eada56fe679db08ee6
Author: Chesnay Schepler <ches...@apache.org>
AuthorDate: Fri Mar 24 11:00:02 2023 +0100

    Rebuilt website
---
 .../02/06/announcing-apache-flink-1.2.0/index.html | 1 -
 .../index.html | 7 ++---
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../flink-community-update-september19/index.html | 1 -
 .../index.html | 1 -
 .../index.html | 3 --
 .../30/flink-community-update-april20/index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../05/06/flink-community-update-may20/index.html | 1 -
 .../index.html | 1 -
 .../06/10/flink-community-update-june20/index.html | 1 -
 .../index.html | 1 -
 .../07/29/flink-community-update-july20/index.html | 7 ++---
 .../04/flink-community-update-august20/index.html | 3 --
 .../index.html | 3 +-
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 13 ++++-----
 .../index.html | 9 ++----
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 9 ++----
 .../index.html | 9 ++----
 .../index.html | 1 -
 .../index.html | 3 --
 .../index.html | 3 --
 .../index.html | 1 -
 .../index.html | 3 --
 .../index.html | 1 -
 .../index.html | 3 --
 .../index.html | 7 ++---
 .../index.html | 1 -
 .../01/20/pravega-flink-connector-101/index.html | 1 -
 .../index.html | 1 -
 .../02/22/scala-free-in-one-fifteen/index.html | 1 -
 .../the-generic-asynchronous-base-sink/index.html | 1 -
 .../index.html | 1 -
 .../index.html | 1 -
 .../index.html | 13 ++++-----
 .../index.html | 1 -
 content/index.xml | 34 +++++++---------------
 content/posts/index.xml | 34 +++++++---------------
 content/posts/page/10/index.html | 3 +-
 content/posts/page/11/index.html | 1 -
 content/posts/page/17/index.html | 3 +-
 content/posts/page/3/index.html | 5 ++--
 content/posts/page/5/index.html | 5 +---
 content/posts/page/6/index.html | 5 +---
 content/posts/page/7/index.html | 6 ++--
 content/posts/page/8/index.html | 5 ++--
 content/posts/page/9/index.html | 1 -
 .../how-to-contribute/contribute-code/index.html | 3 --
 content/zh/how-to-contribute/index.xml | 1 -
 content/zh/index.xml | 1 -
 63 files changed, 56 insertions(+), 180 deletions(-)

diff --git a/content/2017/02/06/announcing-apache-flink-1.2.0/index.html b/content/2017/02/06/announcing-apache-flink-1.2.0/index.html
index e6d17e908..505c8fd1b 100644
--- a/content/2017/02/06/announcing-apache-flink-1.2.0/index.html
+++ b/content/2017/02/06/announcing-apache-flink-1.2.0/index.html
@@ -927,7 +927,6 @@ https://github.com/alex-shpak/hugo-book
 <p>This is the third major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.</p> <p>We encourage everyone to download the release and check out the <a href="//nightlies.apache.org/flinkflink-docs-release-1.2/">documentation</a>. Feedback through the <a href="http://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> is, as always, gladly encouraged!</p> <p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a>.
Some highlights of the release are listed below.</p>
-<p>{% toc %}</p>
 <h2 id="dynamic-scaling--key-groups"> Dynamic Scaling / Key Groups <a class="anchor" href="#dynamic-scaling--key-groups">#</a>
diff --git a/content/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/index.html b/content/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/index.html
index ba8058ed8..61debb7e3 100644
--- a/content/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/index.html
+++ b/content/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/index.html
@@ -6,12 +6,10 @@
 <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
-<meta name="description" content="Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. {% toc %}
-An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
+<meta name="description" content="Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
 In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="A Deep Dive into Rescalable State in Apache Flink" />
-<meta property="og:description" content="Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. {% toc %}
-An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
+<meta property="og:description" content="Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
 In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past." /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/" /><meta property="article:section" content="posts" />
@@ -921,7 +919,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p><em>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state.
This post provides a detailed overview of stateful stream processing and rescalable state in Flink.</em> <br> <br></p>
-<p>{% toc %}</p>
 <h2 id="an-intro-to-stateful-stream-processing"> An Intro to Stateful Stream Processing <a class="anchor" href="#an-intro-to-stateful-stream-processing">#</a>
diff --git a/content/2017/12/12/apache-flink-1.4.0-release-announcement/index.html b/content/2017/12/12/apache-flink-1.4.0-release-announcement/index.html
index aa07bbf13..e63bf0836 100644
--- a/content/2017/12/12/apache-flink-1.4.0-release-announcement/index.html
+++ b/content/2017/12/12/apache-flink-1.4.0-release-announcement/index.html
@@ -959,7 +959,6 @@ releases for APIs annotated with the @Public annotation.</p>
 </ul> <p>A summary of some of the features in the release is available below.</p> <p>For more background on the Flink 1.4.0 release and the work planned for the Flink 1.5.0 release, please refer to <a href="http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html">this blog post</a> on the Apache Flink blog.</p>
-<p>{% toc %}</p>
 <h2 id="new-features-and-improvements"> New Features and Improvements <a class="anchor" href="#new-features-and-improvements">#</a>
diff --git a/content/2018/05/18/apache-flink-1.5.0-release-announcement/index.html b/content/2018/05/18/apache-flink-1.5.0-release-announcement/index.html
index e6b28756c..6edad2fb3 100644
--- a/content/2018/05/18/apache-flink-1.5.0-release-announcement/index.html
+++ b/content/2018/05/18/apache-flink-1.5.0-release-announcement/index.html
@@ -936,7 +936,6 @@ https://github.com/alex-shpak/hugo-book
 <p>We encourage everyone to <a href="http://flink.apache.org/downloads.html">download the release</a> and check out the <a href="//nightlies.apache.org/flinkflink-docs-release-1.5/">documentation</a>.
Feedback through the Flink <a href="http://flink.apache.org/community.html#mailing-lists">mailing lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a> is, as always, very much appreciated!</p>
 <p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a> on the Flink project site.</p>
-<p>{% toc %}</p>
 <h2 id="flink-15---streaming-evolved"> Flink 1.5 - Streaming Evolved <a class="anchor" href="#flink-15---streaming-evolved">#</a>
diff --git a/content/2018/08/09/apache-flink-1.6.0-release-announcement/index.html b/content/2018/08/09/apache-flink-1.6.0-release-announcement/index.html
index 4b03c7811..38cc541c3 100644
--- a/content/2018/08/09/apache-flink-1.6.0-release-announcement/index.html
+++ b/content/2018/08/09/apache-flink-1.6.0-release-announcement/index.html
@@ -935,7 +935,6 @@ https://github.com/alex-shpak/hugo-book
 <p>We encourage everyone to <a href="http://flink.apache.org/downloads.html">download the release</a> and check out the <a href="//nightlies.apache.org/flinkflink-docs-release-1.6/">documentation</a>.
Feedback through the Flink <a href="http://flink.apache.org/community.html#mailing-lists">mailing lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a> is, as always, very much appreciated!</p>
 <p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a> on the Flink project site.</p>
-<p>{% toc %}</p>
 <h2 id="flink-16---the-next-step-in-stateful-stream-processing"> Flink 1.6 - The next step in stateful stream processing <a class="anchor" href="#flink-16---the-next-step-in-stateful-stream-processing">#</a>
diff --git a/content/2018/11/30/apache-flink-1.7.0-release-announcement/index.html b/content/2018/11/30/apache-flink-1.7.0-release-announcement/index.html
index 8715e92a1..7d5dd764f 100644
--- a/content/2018/11/30/apache-flink-1.7.0-release-announcement/index.html
+++ b/content/2018/11/30/apache-flink-1.7.0-release-announcement/index.html
@@ -926,7 +926,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa
 The release is available now and we encourage everyone to <a href="http://flink.apache.org/downloads.html">download the release</a> and check out the updated <a href="//nightlies.apache.org/flinkflink-docs-release-1.7/">documentation</a>.
Feedback through the Flink <a href="http://flink.apache.org/community.html#mailing-lists">mailing lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a> is, as always, very much appreciated!</p>
 <p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a> on the Flink project site.</p>
-<p>{% toc %}</p>
 <h2 id="flink-170---extending-the-reach-of-stream-processing"> Flink 1.7.0 - Extending the reach of Stream Processing <a class="anchor" href="#flink-170---extending-the-reach-of-stream-processing">#</a>
diff --git a/content/2019/04/09/apache-flink-1.8.0-release-announcement/index.html b/content/2019/04/09/apache-flink-1.8.0-release-announcement/index.html
index 2eb299168..ecd93453e 100644
--- a/content/2019/04/09/apache-flink-1.8.0-release-announcement/index.html
+++ b/content/2019/04/09/apache-flink-1.8.0-release-announcement/index.html
@@ -935,7 +935,6 @@ lists</a>
 or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a> is, as always, very much appreciated!</p>
 <p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a> on the Flink project site.</p>
-<p>{% toc %}</p>
 <p>With Flink 1.8.0 we come closer to our goals of enabling fast data processing and building data-intensive applications for the Flink community in a seamless way.
We do this by cleaning up and refactoring Flink under the hood to allow
diff --git a/content/2019/06/05/a-deep-dive-into-flinks-network-stack/index.html b/content/2019/06/05/a-deep-dive-into-flinks-network-stack/index.html
index 7b4dd1ca1..6b3fe7a4c 100644
--- a/content/2019/06/05/a-deep-dive-into-flinks-network-stack/index.html
+++ b/content/2019/06/05/a-deep-dive-into-flinks-network-stack/index.html
@@ -941,7 +941,6 @@ https://github.com/alex-shpak/hugo-book
 </style> <p>Flink’s network stack is one of the core components that make up the <code>flink-runtime</code> module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers which are using RPCs via Akk [...]
 <p>This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning paramet [...]
-<p>{% toc %}</p>
 <h2 id="logical-view"> Logical View <a class="anchor" href="#logical-view">#</a>
diff --git a/content/2019/07/23/flink-network-stack-vol.-2-monitoring-metrics-and-that-backpressure-thing/index.html b/content/2019/07/23/flink-network-stack-vol.-2-monitoring-metrics-and-that-backpressure-thing/index.html
index 481c54257..586419232 100644
--- a/content/2019/07/23/flink-network-stack-vol.-2-monitoring-metrics-and-that-backpressure-thing/index.html
+++ b/content/2019/07/23/flink-network-stack-vol.-2-monitoring-metrics-and-that-backpressure-thing/index.html
@@ -937,7 +937,6 @@ https://github.com/alex-shpak/hugo-book
 .tg .tg-center{text-align:center;vertical-align:center} </style> <p>In a <a href="/2019/06/05/flink-network-stack.html">previous blog post</a>, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts extends on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the netw [...]
-<p>{% toc %}</p>
 <h2 id="monitoring"> Monitoring <a class="anchor" href="#monitoring">#</a>
diff --git a/content/2019/08/22/apache-flink-1.9.0-release-announcement/index.html b/content/2019/08/22/apache-flink-1.9.0-release-announcement/index.html
index 03ea863d6..53abe2fc1 100644
--- a/content/2019/08/22/apache-flink-1.9.0-release-announcement/index.html
+++ b/content/2019/08/22/apache-flink-1.9.0-release-announcement/index.html
@@ -953,7 +953,6 @@ the community through the Flink <a href="https://flink.apache.org/community.html
 lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a>.
As always, feedback is very much appreciated!</p>
-<p>{% toc %}</p>
 <h2 id="new-features-and-improvements"> New Features and Improvements <a class="anchor" href="#new-features-and-improvements">#</a>
diff --git a/content/2019/09/05/flink-community-update-september19/index.html b/content/2019/09/05/flink-community-update-september19/index.html
index a6d974620..b5c2cbd16 100644
--- a/content/2019/09/05/flink-community-update-september19/index.html
+++ b/content/2019/09/05/flink-community-update-september19/index.html
@@ -936,7 +936,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>This has been an exciting, fast-paced year for the Apache Flink community. But with over 10k messages across the mailing lists, 3k Jira tickets and 2k pull requests, it is not easy to keep up with the latest state of the project. Plus everything happening around it. With that in mind, we want to bring back regular community updates to the Flink blog.</p> <p>The first post in the series takes you on an little detour across the year, to freshen up and make sure you’re all up to date.</p>
-<p>{% toc %}</p>
 <h1 id="the-year-so-far-in-flink"> The Year (so far) in Flink <a class="anchor" href="#the-year-so-far-in-flink">#</a>
diff --git a/content/2020/02/11/apache-flink-1.10.0-release-announcement/index.html b/content/2020/02/11/apache-flink-1.10.0-release-announcement/index.html
index 0fe02a22f..82e773d46 100644
--- a/content/2020/02/11/apache-flink-1.10.0-release-announcement/index.html
+++ b/content/2020/02/11/apache-flink-1.10.0-release-announcement/index.html
@@ -930,7 +930,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0!
As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).</p>
 <p>Flink 1.10 also marks the completion of the <a href="https://flink.apache.org/news/2019/08/22/release-1.9.0.html#preview-of-the-new-blink-sql-query-processor">Blink integration</a>, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.</p>
-<p>{% toc %}</p>
 <p>The binary distribution and source artifacts are now available on the updated <a href="/downloads.html">Downloads page</a> of the Flink website. For more details, check the complete <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845">release changelog</a> and the <a href="//nightlies.apache.org/flinkflink-docs-release-1.10/">updated documentation</a>. We encourage you to download the release and share your feedback with the communit [...]
<h2 id="new-features-and-improvements"> New Features and Improvements
diff --git a/content/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/index.html b/content/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/index.html
index 9fdbbe02f..3a94d9a77 100644
--- a/content/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/index.html
+++ b/content/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/index.html
@@ -7,13 +7,11 @@
 <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.
-{% toc %}
 Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve came up with some for you. Firstly, today’s business is shifting to a more real-time fashion, and thus demands abilities to process online streaming data with low latency for near-real-time or even real-time analytics."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration" /> <meta property="og:description" content="In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.
-{% toc %}
 Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve came up with some for you. Firstly, today’s business is shifting to a more real-time fashion, and thus demands abilities to process online streaming data with low latency for near-real-time or even real-time analytics."
/>
@@ -936,7 +934,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.</p>
-<p>{% toc %}</p>
 <h2 id="introduction"> Introduction <a class="anchor" href="#introduction">#</a>
diff --git a/content/2020/03/30/flink-community-update-april20/index.html b/content/2020/03/30/flink-community-update-april20/index.html
index 406474328..b7f573e33 100644
--- a/content/2020/03/30/flink-community-update-april20/index.html
+++ b/content/2020/03/30/flink-community-update-april20/index.html
@@ -937,7 +937,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.</p> <p>And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the <a href="https://www.flink-forward.org/sf-2020">Flink Forward Virtual Conference</a>, on April 22-24 (see <a href="#upcoming-events">Upcoming Events</a>).
Hope to see you there!</p>
-<p>{% toc %}</p>
 <h1 id="the-year-so-far-in-flink"> The Year (so far) in Flink <a class="anchor" href="#the-year-so-far-in-flink">#</a>
diff --git a/content/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/index.html b/content/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/index.html
index 56223fafe..c660281b0 100644
--- a/content/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/index.html
+++ b/content/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/index.html
@@ -935,7 +935,6 @@ https://github.com/alex-shpak/hugo-book
 This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the <strong>first version of an event-driven database</strong> that is built on Apache Flink.</p> <p>Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes.</p> <p>With these features, Stateful Functions 2.0 addresses <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf">two of the most cited shortcomings</a> of many FaaS setups today: consistent state and efficient messaging between functions.</p>
-<p>{% toc %}</p>
 <h2 id="an-event-driven-database"> An Event-driven Database <a class="anchor" href="#an-event-driven-database">#</a>
diff --git a/content/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/index.html b/content/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/index.html
index 710c31e7a..9918b68b2 100644
--- a/content/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/index.html
+++ b/content/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/index.html
@@ -930,7 +930,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>Almost every Flink job has to exchange data between its operators and since these records may not only be sent to another instance in the same JVM but instead to a separate process, records need to be serialized to bytes first. Similarly, Flink’s off-heap state-backend is based on a local embedded RocksDB instance which is implemented in native C++ code and thus also needs transformation into bytes on every state access. Wire and state serialization alone can easily cost a lot [...]
 <p>Since serialization is so crucial to your Flink job, we would like to highlight Flink’s serialization stack in a series of blog posts starting with looking at the different ways Flink can serialize your data types.</p>
-<p>{% toc %}</p>
 <h1 id="recap-flink-serialization"> Recap: Flink Serialization <a class="anchor" href="#recap-flink-serialization">#</a>
diff --git a/content/2020/05/06/flink-community-update-may20/index.html b/content/2020/05/06/flink-community-update-may20/index.html
index eba5dad48..7c0116fdc 100644
--- a/content/2020/05/06/flink-community-update-may20/index.html
+++ b/content/2020/05/06/flink-community-update-may20/index.html
@@ -934,7 +934,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>Can you smell it? It’s release month! It took a while, but now that we’re <a href="https://flink.apache.org/news/2020/04/01/community-update.html">all caught up with the past</a>, the Community Update is here to stay. This time around, we’re warming up for Flink 1.11 and peeping back to the month of April in the Flink community — with the release of Stateful Functions 2.0, a new self-paced Flink training and some efforts to improve the Flink documentation experience.</p> <p>Last month also marked the debut of Flink Forward Virtual Conference 2020: what did you think?
If you missed it altogether or just want to recap some of the sessions, the <a href="https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7">videos</a> and <a href="https://www.slideshare.net/FlinkForward">slides</a> are now available!</p>
-<p>{% toc %}</p>
 <h1 id="the-past-month-in-flink"> The Past Month in Flink <a class="anchor" href="#the-past-month-in-flink">#</a>
diff --git a/content/2020/06/09/stateful-functions-2.1.0-release-announcement/index.html b/content/2020/06/09/stateful-functions-2.1.0-release-announcement/index.html
index 910724594..aa4b8dcf2 100644
--- a/content/2020/06/09/stateful-functions-2.1.0-release-announcement/index.html
+++ b/content/2020/06/09/stateful-functions-2.1.0-release-announcement/index.html
@@ -924,7 +924,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.1.0! This release introduces new features around state expiration and performance improvements for co-located deployments, as well as other important changes that improve the stability and testability of the project. As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster [...]
 <p>The binary distribution and source artifacts are now available on the updated <a href="https://flink.apache.org/downloads.html">Downloads</a> page of the Flink website, and the most recent Python SDK distribution is available on <a href="https://pypi.org/project/apache-flink-statefun/">PyPI</a>. For more details, check the complete <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12347861">release changelog</a> and the <a href="//nightlies [...]
-<p>{% toc %}</p>
 <h2 id="new-features-and-improvements"> New Features and Improvements <a class="anchor" href="#new-features-and-improvements">#</a>
diff --git a/content/2020/06/10/flink-community-update-june20/index.html b/content/2020/06/10/flink-community-update-june20/index.html
index 8e32cecfc..a06d6a117 100644
--- a/content/2020/06/10/flink-community-update-june20/index.html
+++ b/content/2020/06/10/flink-community-update-june20/index.html
@@ -934,7 +934,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020.</p> <p>To top it off, a piece of good news: <a href="https://www.flink-forward.org/global-2020">Flink Forward</a> is back on October 19-22 as a free virtual event!</p>
-<p>{% toc %}</p>
 <h1 id="the-past-month-in-flink"> The Past Month in Flink <a class="anchor" href="#the-past-month-in-flink">#</a>
diff --git a/content/2020/07/06/apache-flink-1.11.0-release-announcement/index.html b/content/2020/07/06/apache-flink-1.11.0-release-announcement/index.html
index 032d492bd..058d8a418 100644
--- a/content/2020/07/06/apache-flink-1.11.0-release-announcement/index.html
+++ b/content/2020/07/06/apache-flink-1.11.0-release-announcement/index.html
@@ -950,7 +950,6 @@ https://github.com/alex-shpak/hugo-book
 </li> </ul> <p>Read on for all major new features and improvements, important changes to be aware of and what to expect moving forward!</p>
-<p>{% toc %}</p>
 <p>The binary distribution and source artifacts are now available on the updated <a href="/downloads.html">Downloads page</a> of the Flink website, and the most recent distribution of PyFlink is available on <a href="https://pypi.org/project/apache-flink/">PyPI</a>.
Please review the <a href="//nightlies.apache.org/flinkflink-docs-release-1.11/release-notes/flink-1.11.html">release notes</a> carefully, and check the complete <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa [...]
 <p>We encourage you to download the release and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a>.</p>
 <h2 id="new-features-and-improvements">
diff --git a/content/2020/07/29/flink-community-update-july20/index.html b/content/2020/07/29/flink-community-update-july20/index.html
index 8c78eb373..617c80c53 100644
--- a/content/2020/07/29/flink-community-update-july20/index.html
+++ b/content/2020/07/29/flink-community-update-july20/index.html
@@ -8,13 +8,11 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project. Also, events are starting to pick up again, so we’ve put together a list of some great ones you can (virtually) attend in August!
-{% toc %}
-The Past Month in Flink # Flink Releases # Flink 1.">
+The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1.">
 <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Flink Community Update - July'20" />
 <meta property="og:description" content="As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project. Also, events are starting to pick up again, so we’ve put together a list of some great ones you can (virtually) attend in August!
-{% toc %}
-The Past Month in Flink # Flink Releases # Flink 1." />
+The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1." />
 <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2020/07/29/flink-community-update-july20/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2020-07-29T08:00:00+00:00" />
@@ -939,7 +937,6 @@ https://github.com/alex-shpak/hugo-book
 <p><p>As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project.</p> <p>Also, events are starting to pick up again, so we’ve put together a list of some great ones you can (virtually) attend in August!</p>
-<p>{% toc %}</p>
 <h1 id="the-past-month-in-flink"> The Past Month in Flink <a class="anchor" href="#the-past-month-in-flink">#</a>
diff --git a/content/2020/09/04/flink-community-update-august20/index.html b/content/2020/09/04/flink-community-update-august20/index.html
index 7d5cc15ac..918bd404b 100644
--- a/content/2020/09/04/flink-community-update-august20/index.html
+++ b/content/2020/09/04/flink-community-update-august20/index.html
@@ -7,11 +7,9 @@
 <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019.
-{% toc %} The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Flink Community Update - August'20" /> <meta property="og:description" content="Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019. -{% toc %} The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2." /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2020/09/04/flink-community-update-august20/" /><meta property="article:section" content="posts" /> @@ -936,7 +934,6 @@ https://github.com/alex-shpak/hugo-book <p><p>Ah, so much for a quiet August month. 
This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming <a href="//nightlies.apache.org/flinkflink-statefun-docs-master/">Flink Stateful Functions</a> 2.2 release and a look into how far Flink has come in comparison to 2019.</p> -<p>{% toc %}</p> <h1 id="the-past-month-in-flink"> The Past Month in Flink <a class="anchor" href="#the-past-month-in-flink">#</a> diff --git a/content/2020/09/28/stateful-functions-2.2.0-release-announcement/index.html b/content/2020/09/28/stateful-functions-2.2.0-release-announcement/index.html index 8e863cf1f..1ad85674d 100644 --- a/content/2020/09/28/stateful-functions-2.2.0-release-announcement/index.html +++ b/content/2020/09/28/stateful-functions-2.2.0-release-announcement/index.html @@ -949,8 +949,7 @@ page of the Flink website, and the most recent Python SDK distribution is availa For more details, check the complete <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348350">release changelog</a> and the <a href="//nightlies.apache.org/flinkflink-statefun-docs-release-2.2/">updated documentation</a>. We encourage you to download the release and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> -or <a href="https://issues.apache.org/jira/browse/">JIRA</a>! 
-{% toc %}</p> +or <a href="https://issues.apache.org/jira/browse/">JIRA</a>!</p> <h2 id="new-features"> New Features <a class="anchor" href="#new-features">#</a> diff --git a/content/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/index.html b/content/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/index.html index 6e8b341fa..3cdb93042 100644 --- a/content/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/index.html +++ b/content/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/index.html @@ -934,7 +934,6 @@ Most significantly, in the demo, the stateful functions are deployed and service a popular FaaS platform among many others. The goal here is to allow readers to have a good grasp of the interaction between the StateFun runtime and the functions, how that works cohesively to provide a Stateful Serverless experience, and how they can apply what they’ve learnt to deploy their StateFun applications on other public cloud offerings such as GCP or Microsoft Azure.</p> -<p>{% toc %}</p> <h2 id="introducing-the-example-shopping-cart"> Introducing the example: Shopping Cart <a class="anchor" href="#introducing-the-example-shopping-cart">#</a> diff --git a/content/2020/12/02/improvements-in-task-scheduling-for-batch-workloads-in-apache-flink-1.12/index.html b/content/2020/12/02/improvements-in-task-scheduling-for-batch-workloads-in-apache-flink-1.12/index.html index 61b6fd63b..4c15cbb17 100644 --- a/content/2020/12/02/improvements-in-task-scheduling-for-batch-workloads-in-apache-flink-1.12/index.html +++ b/content/2020/12/02/improvements-in-task-scheduling-for-batch-workloads-in-apache-flink-1.12/index.html @@ -942,7 +942,6 @@ Achieving this involves touching a lot of different components of the Flink stac to low-level operator processes such as task scheduling. 
In this blogpost, we’ll take a closer look at how far the community has come in improving scheduling for batch workloads, why this matters and what you can expect in the Flink 1.12 release with the new <em>pipelined region scheduler</em>.</p> -<p>{% toc %}</p> <h1 id="towards-unified-scheduling"> Towards unified scheduling <a class="anchor" href="#towards-unified-scheduling">#</a> diff --git a/content/2020/12/10/apache-flink-1.12.0-release-announcement/index.html b/content/2020/12/10/apache-flink-1.12.0-release-announcement/index.html index 5b863ae4d..7e073db44 100644 --- a/content/2020/12/10/apache-flink-1.12.0-release-announcement/index.html +++ b/content/2020/12/10/apache-flink-1.12.0-release-announcement/index.html @@ -956,7 +956,6 @@ https://github.com/alex-shpak/hugo-book </li> </ul> <p>This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.</p> -<p>{% toc %}</p> <p>The binary distribution and source artifacts are now available on the updated <a href="/downloads.html">Downloads page</a> of the Flink website, and the most recent distribution of PyFlink is available on <a href="https://pypi.org/project/apache-flink/">PyPI</a>. Please review the <a href="//nightlies.apache.org/flinkflink-docs-release-1.12/release-notes/flink-1.12.html">release notes</a> carefully, and check the complete <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa [...] 
<p>We encourage you to download the release and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> or <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a>.</p> <h2 id="new-features-and-improvements"> diff --git a/content/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/index.html b/content/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/index.html index 37b79dc9f..39488403c 100644 --- a/content/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/index.html +++ b/content/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/index.html @@ -6,13 +6,11 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="{% toc %} -Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). -Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time."> +<meta name="description" content="Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). +Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. 
For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Exploring fine-grained recovery of bounded data sets on Flink" /> -<meta property="og:description" content="{% toc %} -Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). -Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time." /> +<meta property="og:description" content="Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). +Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery." /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2021-01-11T00:00:00+00:00" /> @@ -926,8 +924,7 @@ https://github.com/alex-shpak/hugo-book - <p><p>{% toc %}</p> -<p>Apache Flink is a very versatile tool for all kinds of data processing workloads. 
It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing).</p> + <p><p>Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing).</p> <p>Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery. In case of bounded data sets, having a reliable recovery mechanism is mission critical — as users do not want to potentially lose many hours of intermediate processing r [...] <p>Apache Flink 1.9 introduced <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures">fine-grained recovery</a> into its internal workload scheduler. 
The Flink APIs that are made for bounded workloads benefit from this change by individually recovering failed operators, re-using results from the previous processing step.</p> <p>In this blog post, we are going to give an overview over these changes, and we will experimentally validate their effectiveness.</p> diff --git a/content/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/index.html b/content/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/index.html index 8dcd09b49..26ad5a1ab 100644 --- a/content/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/index.html +++ b/content/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/index.html @@ -6,11 +6,9 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="{% toc %} -Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the firs [...] +<meta name="description" content="Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the [...] 
<meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="A Rundown of Batch Execution Mode in the DataStream API" /> -<meta property="og:description" content="{% toc %} -Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the firs [...] +<meta property="og:description" content="Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour [...] <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2021-03-11T00:00:00+00:00" /> @@ -923,8 +921,7 @@ https://github.com/alex-shpak/hugo-book - <p><p>{% toc %}</p> -<p>Flink has been following the mantra that <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">Batch is a Special Case of Streaming</a> since the very early days. 
As the project evolved to address specific uses cases, different core APIs ended up being implemented for <em>batch</em> (DataSet API) and <em>streaming</em> execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of <em>unification</em>. [...] + <p><p>Flink has been following the mantra that <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">Batch is a Special Case of Streaming</a> since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for <em>batch</em> (DataSet API) and <em>streaming</em> execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of <em>unification [...] <p>The idea behind making the DataStream API a unified abstraction for <em>batch</em> and <em>streaming</em> execution instead of maintaining separate APIs is two-fold:</p> <ul> <li> diff --git a/content/2021/04/15/stateful-functions-3.0.0-remote-functions-front-and-center/index.html b/content/2021/04/15/stateful-functions-3.0.0-remote-functions-front-and-center/index.html index e0d0ffba6..715d29f6b 100644 --- a/content/2021/04/15/stateful-functions-3.0.0-remote-functions-front-and-center/index.html +++ b/content/2021/04/15/stateful-functions-3.0.0-remote-functions-front-and-center/index.html @@ -938,7 +938,6 @@ to develop scalable, consistent, and elastic distributed applications.</p> separates the application logic from the StateFun cluster the default. It is now easier, more efficient, and more ergonomic to write applications that live in their own processes or containers. 
With the new Java SDK this is now also possible for all JVM languages, in addition to Python.</p> -<p>{% toc %}</p> <h2 id="background"> Background <a class="anchor" href="#background">#</a> diff --git a/content/2021/05/03/apache-flink-1.13.0-release-announcement/index.html b/content/2021/05/03/apache-flink-1.13.0-release-announcement/index.html index 6b7f00892..1c911e450 100644 --- a/content/2021/05/03/apache-flink-1.13.0-release-announcement/index.html +++ b/content/2021/05/03/apache-flink-1.13.0-release-announcement/index.html @@ -960,7 +960,6 @@ up.</p> some of which we discuss in this article. We hope you enjoy the new release and features. Towards the end of the article, we describe changes to be aware of when upgrading from earlier versions of Apache Flink.</p> -<p>{% toc %}</p> <p>We encourage you to <a href="https://flink.apache.org/downloads.html">download the release</a> and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> diff --git a/content/2021/05/06/scaling-flink-automatically-with-reactive-mode/index.html b/content/2021/05/06/scaling-flink-automatically-with-reactive-mode/index.html index c8bc25990..7d3a5d709 100644 --- a/content/2021/05/06/scaling-flink-automatically-with-reactive-mode/index.html +++ b/content/2021/05/06/scaling-flink-automatically-with-reactive-mode/index.html @@ -6,11 +6,9 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="{% toc %} -Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. 
Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensur [...] +<meta name="description" content="Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to [...] <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Scaling Flink automatically with Reactive Mode" /> -<meta property="og:description" content="{% toc %} -Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensur [...] +<meta property="og:description" content="Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that n [...] 
<meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/05/06/scaling-flink-automatically-with-reactive-mode/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2021-05-06T00:00:00+00:00" /> @@ -925,8 +923,7 @@ https://github.com/alex-shpak/hugo-book - <p><p>{% toc %}</p> -<h2 id="introduction"> + <p><h2 id="introduction"> Introduction <a class="anchor" href="#introduction">#</a> </h2> diff --git a/content/2021/07/07/how-to-identify-the-source-of-backpressure/index.html b/content/2021/07/07/how-to-identify-the-source-of-backpressure/index.html index 8d330c131..5805064b7 100644 --- a/content/2021/07/07/how-to-identify-the-source-of-backpressure/index.html +++ b/content/2021/07/07/how-to-identify-the-source-of-backpressure/index.html @@ -6,12 +6,10 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="{% toc %} -Backpressure monitoring in the web UI +<meta name="description" content="Backpressure monitoring in the web UI The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first…"> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="How to identify the source of backpressure?" /> -<meta property="og:description" content="{% toc %} -Backpressure monitoring in the web UI +<meta property="og:description" content="Backpressure monitoring in the web UI The backpressure topic was tackled from different angles over the last couple of years. 
However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first…" /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/07/07/how-to-identify-the-source-of-backpressure/" /><meta property="article:section" content="posts" /> @@ -930,8 +928,7 @@ https://github.com/alex-shpak/hugo-book - <p><p>{% toc %}</p> -<div class="row front-graphic"> + <p><div class="row front-graphic"> <img src="/img/blog/2021-07-07-backpressure/animated.png" alt="Backpressure monitoring in the web UI"/> <p class="align-center">Backpressure monitoring in the web UI</p> </div> diff --git a/content/2021/08/31/stateful-functions-3.1.0-release-announcement/index.html b/content/2021/08/31/stateful-functions-3.1.0-release-announcement/index.html index 39f59df5f..68613f973 100644 --- a/content/2021/08/31/stateful-functions-3.1.0-release-announcement/index.html +++ b/content/2021/08/31/stateful-functions-3.1.0-release-announcement/index.html @@ -946,7 +946,6 @@ You can also find official StateFun Docker images of the new version on <a href= and the <a href="//nightlies.apache.org/flinkflink-statefun-docs-release-3.0/">updated documentation</a>. 
We encourage you to download the release and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> or <a href="https://issues.apache.org/jira/browse/">JIRA</a>!</p> -<p>{% toc %}</p> <h2 id="new-features"> New Features <a class="anchor" href="#new-features">#</a> diff --git a/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/index.html b/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/index.html index 47fd1d0af..05a9116d5 100644 --- a/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/index.html +++ b/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/index.html @@ -7,11 +7,9 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client. -{% toc %} Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Implementing a Custom Source Connector for Table API and SQL - Part One " /> <meta property="og:description" content="Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client. 
-{% toc %} Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently." /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/" /><meta property="article:section" content="posts" /> @@ -916,7 +914,6 @@ https://github.com/alex-shpak/hugo-book <p><p>Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled <a href="https://docs.docker.com/compose/">docker-compose</a> setup that lets you easily run the connector. You can then try it out with Flink’s SQL client.</p> -<p>{% toc %}</p> <h1 id="introduction"> Introduction <a class="anchor" href="#introduction">#</a> diff --git a/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/index.html b/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/index.html index 5853ad053..c76b6ac5a 100644 --- a/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/index.html +++ b/content/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/index.html @@ -7,12 +7,10 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="In part one of this tutorial, you learned how to build a custom source connector for Flink. In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL. 
-{% toc %} Goals # Part two of the tutorial will teach you how to: integrate a source connector which connects to a mailbox using the IMAP protocol use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization You are encouraged to follow along with the code in this repository."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Implementing a custom source connector for Table API and SQL - Part Two " /> <meta property="og:description" content="In part one of this tutorial, you learned how to build a custom source connector for Flink. In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL. -{% toc %} Goals # Part two of the tutorial will teach you how to: integrate a source connector which connects to a mailbox using the IMAP protocol use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization You are encouraged to follow along with the code in this repository." /> <meta property="og:type" content="article" /> @@ -922,7 +920,6 @@ https://github.com/alex-shpak/hugo-book <p><p>In <a href="/2021/09/07/connector-table-sql-api-part1">part one</a> of this tutorial, you learned how to build a custom source connector for Flink. 
In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL.</p> -<p>{% toc %}</p> <h1 id="goals"> Goals <a class="anchor" href="#goals">#</a> diff --git a/content/2021/09/29/apache-flink-1.14.0-release-announcement/index.html b/content/2021/09/29/apache-flink-1.14.0-release-announcement/index.html index 503a3e208..0148b3ed3 100644 --- a/content/2021/09/29/apache-flink-1.14.0-release-announcement/index.html +++ b/content/2021/09/29/apache-flink-1.14.0-release-announcement/index.html @@ -972,7 +972,6 @@ most prominently we are <strong>removing the old SQL execution engine</strong> a <strong>removing the active integration with Apache Mesos</strong>.</p> <p>We hope you like the new release and we’d be eager to learn about your experience with it, which yet unsolved problems it solves, what new use-cases it unlocks for you.</p> -<p>{% toc %}</p> <h1 id="the-unified-batch-and-stream-processing-experience"> The Unified Batch and Stream Processing Experience <a class="anchor" href="#the-unified-batch-and-stream-processing-experience">#</a> diff --git a/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/index.html b/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/index.html index 1470c94e3..e25909b9e 100644 --- a/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/index.html +++ b/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/index.html @@ -7,11 +7,9 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. 
-{% toc %} How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Sort-Based Blocking Shuffle Implementation in Flink - Part One" /> <meta property="og:description" content="Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. -{% toc %} How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it." 
/> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/" /><meta property="article:section" content="posts" /> @@ -916,7 +914,6 @@ https://github.com/alex-shpak/hugo-book <p><p>Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature.</p> -<p>{% toc %}</p> <h1 id="how-data-gets-passed-around-between-operators"> How data gets passed around between operators <a class="anchor" href="#how-data-gets-passed-around-between-operators">#</a> diff --git a/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/index.html b/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/index.html index 3662bdcaa..fc4cd6bce 100644 --- a/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/index.html +++ b/content/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/index.html @@ -932,7 +932,6 @@ https://github.com/alex-shpak/hugo-book <p><p><a href="/2021/10/26/sort-shuffle-part1">Part one</a> of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature.</p> <p>Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. 
However, Flink’s implementation has some core differences, including the multiple data region file structure, the removal of file merge, and IO scheduling.</p> <p>In part two of this blog post, we will give you insight into some core design considerations and implementation details of the sort-based blocking shuffle in Flink and list several ideas for future improvement.</p> -<p>{% toc %}</p> <h1 id="design-considerations"> Design considerations <a class="anchor" href="#design-considerations">#</a> diff --git a/content/2021/11/03/flink-backward-the-apache-flink-retrospective/index.html b/content/2021/11/03/flink-backward-the-apache-flink-retrospective/index.html index 8afb784a3..24de90976 100644 --- a/content/2021/11/03/flink-backward-the-apache-flink-retrospective/index.html +++ b/content/2021/11/03/flink-backward-the-apache-flink-retrospective/index.html @@ -7,11 +7,9 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. -{% toc %} A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Flink Backward - The Apache Flink Retrospective" /> <meta property="og:description" content="It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. 
Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. -{% toc %} A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1." /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2021/11/03/flink-backward-the-apache-flink-retrospective/" /><meta property="article:section" content="posts" /> @@ -925,7 +923,6 @@ https://github.com/alex-shpak/hugo-book <p><p>It has now been a month since the community released <a href="https://flink.apache.org/downloads.html#apache-flink-1140">Apache Flink 1.14</a> into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all.</p> -<p>{% toc %}</p> <h1 id="a-retrospective-on-the-release-cycle"> A retrospective on the release cycle <a class="anchor" href="#a-retrospective-on-the-release-cycle">#</a> diff --git a/content/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/index.html b/content/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/index.html index 186fd3c29..c8cb7d4f8 100644 --- a/content/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/index.html +++ b/content/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/index.html @@ -7,12 +7,10 @@ <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content="Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling 
large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. -{% toc %} -Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks."> +Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="How We Improved Scheduler Performance for Large-scale Jobs - Part Two" /> <meta property="og:description" content="Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. -{% toc %} -Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks." /> +Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all." 
/> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2022-01-04T08:00:00+00:00" /> @@ -934,7 +932,6 @@ https://github.com/alex-shpak/hugo-book <p><p><a href="/2022/01/04/scheduler-performance-part-one">Part one</a> of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations.</p> -<p>{% toc %}</p> <h1 id="reducing-complexity-with-groups"> Reducing complexity with groups <a class="anchor" href="#reducing-complexity-with-groups">#</a> diff --git a/content/2022/01/07/apache-flink-ml-2.0.0-release-announcement/index.html b/content/2022/01/07/apache-flink-ml-2.0.0-release-announcement/index.html index 4dcceb6bf..f84df265a 100644 --- a/content/2022/01/07/apache-flink-ml-2.0.0-release-announcement/index.html +++ b/content/2022/01/07/apache-flink-ml-2.0.0-release-announcement/index.html @@ -950,7 +950,6 @@ learning scenarios.</p> the community through the Flink <a href="https://flink.apache.org/community.html#mailing-lists">mailing lists</a> or <a href="https://issues.apache.org/jira/browse/flink">JIRA</a>! 
We hope you like the new release and we’d be eager to learn about your experience with it.</p> -<p>{% toc %}</p> <h1 id="notable-features"> Notable Features <a class="anchor" href="#notable-features">#</a> diff --git a/content/2022/01/20/pravega-flink-connector-101/index.html b/content/2022/01/20/pravega-flink-connector-101/index.html index 753dcc624..059cb7ce5 100644 --- a/content/2022/01/20/pravega-flink-connector-101/index.html +++ b/content/2022/01/20/pravega-flink-connector-101/index.html @@ -938,7 +938,6 @@ https://github.com/alex-shpak/hugo-book </ul> <p>These key features make streaming pipeline applications easier to develop without worrying about performance and correctness which are the common pain points for many streaming use cases.</p> <p>In this blog post, we will discuss how to use this connector to read and write Pravega streams with the Flink DataStream API.</p> -<p>{% toc %}</p> <h1 id="basic-usages"> Basic usages <a class="anchor" href="#basic-usages">#</a> diff --git a/content/2022/01/31/stateful-functions-3.2.0-release-announcement/index.html b/content/2022/01/31/stateful-functions-3.2.0-release-announcement/index.html index b994a4f4a..c45820fea 100644 --- a/content/2022/01/31/stateful-functions-3.2.0-release-announcement/index.html +++ b/content/2022/01/31/stateful-functions-3.2.0-release-announcement/index.html @@ -939,7 +939,6 @@ You can also find official StateFun Docker images of the new version on <a href= and the <a href="//nightlies.apache.org/flinkflink-statefun-docs-release-3.2/">updated documentation</a>. 
We encourage you to download the release and share your feedback with the community through the <a href="https://flink.apache.org/community.html#mailing-lists">Flink mailing lists</a> or <a href="https://issues.apache.org/jira/browse/">JIRA</a>!</p> -<p>{% toc %}</p> <h2 id="new-features"> New Features <a class="anchor" href="#new-features">#</a> diff --git a/content/2022/02/22/scala-free-in-one-fifteen/index.html b/content/2022/02/22/scala-free-in-one-fifteen/index.html index a21d7c0c1..1df883d33 100644 --- a/content/2022/02/22/scala-free-in-one-fifteen/index.html +++ b/content/2022/02/22/scala-free-in-one-fifteen/index.html @@ -933,7 +933,6 @@ To remove Scala from the user-code classpath, remove this jar from the lib direc <p><br><br></p> <div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>rm flink-dist/lib/flink-scala*</code></pre></div> </div> -<p>{% toc %}</p> <h2 id="the-classpath-and-scala"> The Classpath and Scala <a class="anchor" href="#the-classpath-and-scala">#</a> diff --git a/content/2022/03/16/the-generic-asynchronous-base-sink/index.html b/content/2022/03/16/the-generic-asynchronous-base-sink/index.html index 215740ced..9f64b7911 100644 --- a/content/2022/03/16/the-generic-asynchronous-base-sink/index.html +++ b/content/2022/03/16/the-generic-asynchronous-base-sink/index.html @@ -925,7 +925,6 @@ https://github.com/alex-shpak/hugo-book <p>This common abstraction will reduce the effort required to maintain individual sinks that extend from this abstract sink, with bug fixes and improvements to the sink core benefiting all implementations that extend it. The design of <code>AsyncSinkBase</code> focuses on extensibility and a broad support of destinations. 
The core of the sink is kept generic and free of any connector-specific dependencies.</p> <p>The sink base is designed to participate in checkpointing to provide at-least-once semantics and can work directly with destinations that provide a client that supports asynchronous requests.</p> <p>In this post, we will go over the details of the AsyncSinkBase so that you can start using it to build your own concrete sink.</p> -<p>{% toc %}</p> <h1 id="adding-the-base-sink-as-a-dependency"> Adding the base sink as a dependency <a class="anchor" href="#adding-the-base-sink-as-a-dependency">#</a> diff --git a/content/2022/05/06/exploring-the-thread-mode-in-pyflink/index.html b/content/2022/05/06/exploring-the-thread-mode-in-pyflink/index.html index f2dc7c7a2..3ab8de7f3 100644 --- a/content/2022/05/06/exploring-the-thread-mode-in-pyflink/index.html +++ b/content/2022/05/06/exploring-the-thread-mode-in-pyflink/index.html @@ -947,7 +947,6 @@ which is unacceptable in scenarios where the latency is critical, e.g. quantitat <p>In Flink 1.15, we have introduced a new execution mode named ’thread’ mode (based on <a href="https://github.com/alibaba/pemja">PEMJA</a>) where the Python user-defined functions will be executed in the JVM as a thread instead of a separate Python process. 
In this article, we will dig into the details about this execution mode and also share some benchmark data to give users a basic understanding of how it works and which scenarios it’s applicable for.</p> -<p>{% toc %}</p> <h2 id="process-mode"> Process Mode <a class="anchor" href="#process-mode">#</a> diff --git a/content/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/index.html b/content/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/index.html index cfd1b9b66..d3bb5031a 100644 --- a/content/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/index.html +++ b/content/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/index.html @@ -934,7 +934,6 @@ towards improving stability and operational simplicity. In the last couple of re some known friction points, which includes improvements to the snapshotting process. Snapshotting takes a global, consistent image of the state of a Flink job and is integral to fault-tolerance and exacty-once processing. 
Snapshots include savepoints and checkpoints.</p> <p>This post will outline the journey of improving snapshotting in past releases and the upcoming improvements in Flink 1.15, which includes making it possible to take savepoints in the native state backend specific format as well as clarifying snapshots ownership.</p> -<p>{% toc %}</p> <h1 id="past-improvements-to-the-snapshotting-process"> Past improvements to the snapshotting process <a class="anchor" href="#past-improvements-to-the-snapshotting-process">#</a> diff --git a/content/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/index.html b/content/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/index.html index 7c1a45e04..0b5b2b7ca 100644 --- a/content/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/index.html +++ b/content/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/index.html @@ -6,13 +6,11 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="{% toc %} -Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. -To decide a proper parallelism, one needs to know how much data each operator needs to process."> +<meta name="description" content="Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. 
+To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs" /> -<meta property="og:description" content="{% toc %} -Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. -To decide a proper parallelism, one needs to know how much data each operator needs to process." /> +<meta property="og:description" content="Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. +To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday." 
/> <meta property="og:type" content="article" /> <meta property="og:url" content="https://flink.apache.org/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/" /><meta property="article:section" content="posts" /> <meta property="article:published_time" content="2022-06-17T00:00:00+00:00" /> @@ -941,8 +939,7 @@ https://github.com/alex-shpak/hugo-book - <p><p>{% toc %}</p> -<h1 id="introduction"> + <p><h1 id="introduction"> Introduction <a class="anchor" href="#introduction">#</a> </h1> diff --git a/content/2022/07/12/apache-flink-ml-2.1.0-release-announcement/index.html b/content/2022/07/12/apache-flink-ml-2.1.0-release-announcement/index.html index 3bae872e2..011e39617 100644 --- a/content/2022/07/12/apache-flink-ml-2.1.0-release-announcement/index.html +++ b/content/2022/07/12/apache-flink-ml-2.1.0-release-announcement/index.html @@ -940,7 +940,6 @@ and share your feedback with the community through the Flink <a href="https://flink.apache.org/community.html#mailing-lists">mailing lists</a> or <a href="https://issues.apache.org/jira/browse/flink">JIRA</a>! We hope you like the new release and we’d be eager to learn about your experience with it.</p> -<p>{% toc %}</p> <h1 id="notable-features"> Notable Features <a class="anchor" href="#notable-features">#</a> diff --git a/content/index.xml b/content/index.xml index d5c780362..c98e79ccd 100644 --- a/content/index.xml +++ b/content/index.xml @@ -621,9 +621,8 @@ Release Artifacts # Maven Dependencies # &lt;dependency&gt; &lt;grou <pubDate>Fri, 17 Jun 2022 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/</guid> - <description>{% toc %} -Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. 
While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. -To decide a proper parallelism, one needs to know how much data each operator needs to process.</description> + <description>Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. +To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday.</description> </item> <item> @@ -817,8 +816,7 @@ This release involves a major refactor of the earlier Flink ML library and intro <guid>https://flink.apache.org/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/</guid> <description>Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. -{% toc %} -Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks.</description> +Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. 
Currently, there are two distribution patterns in Flink: pointwise and all-to-all.</description> </item> <item> @@ -864,7 +862,6 @@ env.java.opts: -Dlog4j2.formatMsgNoLookups=true If you are already setting env.< <guid>https://flink.apache.org/2021/11/03/flink-backward-the-apache-flink-retrospective/</guid> <description>It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. -{% toc %} A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1.</description> </item> @@ -875,7 +872,6 @@ A retrospective on the release cycle # From the team, we collected emotions that <guid>https://flink.apache.org/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/</guid> <description>Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. -{% toc %} How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. 
In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it.</description> </item> @@ -919,7 +915,6 @@ This release brings many new features and improvements in areas such as the SQL <guid>https://flink.apache.org/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/</guid> <description>Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client. -{% toc %} Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently.</description> </item> @@ -930,7 +925,6 @@ Introduction # Apache Flink is a data processing engine that aims to keep state <guid>https://flink.apache.org/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/</guid> <description>In part one of this tutorial, you learned how to build a custom source connector for Flink. In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL. 
-{% toc %} Goals # Part two of the tutorial will teach you how to: integrate a source connector which connects to a mailbox using the IMAP protocol use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization You are encouraged to follow along with the code in this repository.</description> </item> @@ -1001,8 +995,7 @@ Updated Maven dependencies: <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/07/07/how-to-identify-the-source-of-backpressure/</guid> - <description>{% toc %} -Backpressure monitoring in the web UI + <description>Backpressure monitoring in the web UI The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first&hellip;</description> </item> @@ -1038,8 +1031,7 @@ Updated Maven dependencies: <pubDate>Thu, 06 May 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/05/06/scaling-flink-automatically-with-reactive-mode/</guid> - <description>{% toc %} -Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensur [...] 
+ <description>Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed i [...] </item> <item> @@ -1081,8 +1073,7 @@ This new release brings remote functions to the front and center of StateFun, ma <pubDate>Thu, 11 Mar 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/</guid> - <description>{% toc %} -Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the firs [...] + <description>Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API [...] 
</item> <item> @@ -1148,9 +1139,8 @@ Attention: Using unaligned checkpoints in Flink 1.12.0 combined with two/multipl <pubDate>Mon, 11 Jan 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/</guid> - <description>{% toc %} -Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). -Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time.</description> + <description>Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). +Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery.</description> </item> <item> @@ -1264,7 +1254,6 @@ Updated Maven dependencies: <guid>https://flink.apache.org/2020/09/04/flink-community-update-august20/</guid> <description>Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019. 
-{% toc %} The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2.</description> </item> @@ -1348,8 +1337,7 @@ Pic source: VanderPlas 2017, slide 52.</description> <guid>https://flink.apache.org/2020/07/29/flink-community-update-july20/</guid> <description>As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project. Also, events are starting to pick up again, so we&rsquo;ve put together a list of some great ones you can (virtually) attend in August! -{% toc %} -The Past Month in Flink # Flink Releases # Flink 1.</description> +The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1.</description> </item> <item> @@ -1540,7 +1528,6 @@ And since now it&rsquo;s more important than ever to keep up the spirits, we <guid>https://flink.apache.org/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/</guid> <description>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse. -{% toc %} Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve came up with some for you. 
Firstly, today’s business is shifting to a more real-time fashion, and thus demands abilities to process online streaming data with low latency for near-real-time or even real-time analytics.</description> @@ -2136,8 +2123,7 @@ Important Notice: A user reported a bug in the FlinkKafkaConsumer (FLINK-7143) t <pubDate>Tue, 04 Jul 2017 09:00:00 +0000</pubDate> <guid>https://flink.apache.org/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/</guid> - <description>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. {% toc %} -An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. + <description>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. 
In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past.</description> </item> diff --git a/content/posts/index.xml b/content/posts/index.xml index 409dea993..b9c67fd5d 100644 --- a/content/posts/index.xml +++ b/content/posts/index.xml @@ -257,9 +257,8 @@ Release Artifacts # Maven Dependencies # &lt;dependency&gt; &lt;grou <pubDate>Fri, 17 Jun 2022 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/</guid> - <description>{% toc %} -Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. -To decide a proper parallelism, one needs to know how much data each operator needs to process.</description> + <description>Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. +To decide a proper parallelism, one needs to know how much data each operator needs to process. 
However, It can be hard to predict data volume to be processed by a job because it can be different everyday.</description> </item> <item> @@ -453,8 +452,7 @@ This release involves a major refactor of the earlier Flink ML library and intro <guid>https://flink.apache.org/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/</guid> <description>Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. -{% toc %} -Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks.</description> +Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all.</description> </item> <item> @@ -500,7 +498,6 @@ env.java.opts: -Dlog4j2.formatMsgNoLookups=true If you are already setting env.< <guid>https://flink.apache.org/2021/11/03/flink-backward-the-apache-flink-retrospective/</guid> <description>It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. 
-{% toc %} A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1.</description> </item> @@ -511,7 +508,6 @@ A retrospective on the release cycle # From the team, we collected emotions that <guid>https://flink.apache.org/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/</guid> <description>Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. -{% toc %} How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it.</description> </item> @@ -555,7 +551,6 @@ This release brings many new features and improvements in areas such as the SQL <guid>https://flink.apache.org/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/</guid> <description>Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client. -{% toc %} Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently.</description> </item> @@ -566,7 +561,6 @@ Introduction # Apache Flink is a data processing engine that aims to keep state <guid>https://flink.apache.org/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/</guid> <description>In part one of this tutorial, you learned how to build a custom source connector for Flink. 
In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL. -{% toc %} Goals # Part two of the tutorial will teach you how to: integrate a source connector which connects to a mailbox using the IMAP protocol use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization You are encouraged to follow along with the code in this repository.</description> </item> @@ -637,8 +631,7 @@ Updated Maven dependencies: <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/07/07/how-to-identify-the-source-of-backpressure/</guid> - <description>{% toc %} -Backpressure monitoring in the web UI + <description>Backpressure monitoring in the web UI The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first&hellip;</description> </item> @@ -674,8 +667,7 @@ Updated Maven dependencies: <pubDate>Thu, 06 May 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/05/06/scaling-flink-automatically-with-reactive-mode/</guid> - <description>{% toc %} -Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. 
Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensur [...] + <description>Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed i [...] </item> <item> @@ -717,8 +709,7 @@ This new release brings remote functions to the front and center of StateFun, ma <pubDate>Thu, 11 Mar 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/</guid> - <description>{% toc %} -Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the firs [...] + <description>Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API [...] 
</item> <item> @@ -784,9 +775,8 @@ Attention: Using unaligned checkpoints in Flink 1.12.0 combined with two/multipl <pubDate>Mon, 11 Jan 2021 00:00:00 +0000</pubDate> <guid>https://flink.apache.org/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/</guid> - <description>{% toc %} -Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). -Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time.</description> + <description>Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). +Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery.</description> </item> <item> @@ -900,7 +890,6 @@ Updated Maven dependencies: <guid>https://flink.apache.org/2020/09/04/flink-community-update-august20/</guid> <description>Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019. 
-{% toc %} The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2.</description> </item> @@ -984,8 +973,7 @@ Pic source: VanderPlas 2017, slide 52.</description> <guid>https://flink.apache.org/2020/07/29/flink-community-update-july20/</guid> <description>As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project. Also, events are starting to pick up again, so we&rsquo;ve put together a list of some great ones you can (virtually) attend in August! -{% toc %} -The Past Month in Flink # Flink Releases # Flink 1.</description> +The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1.</description> </item> <item> @@ -1176,7 +1164,6 @@ And since now it&rsquo;s more important than ever to keep up the spirits, we <guid>https://flink.apache.org/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/</guid> <description>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse. -{% toc %} Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve came up with some for you. 
Firstly, today’s business is shifting to a more real-time fashion, and thus demands abilities to process online streaming data with low latency for near-real-time or even real-time analytics.</description> @@ -1772,8 +1759,7 @@ Important Notice: A user reported a bug in the FlinkKafkaConsumer (FLINK-7143) t <pubDate>Tue, 04 Jul 2017 09:00:00 +0000</pubDate> <guid>https://flink.apache.org/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/</guid> - <description>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. {% toc %} -An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. + <description>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past.</description> </item> diff --git a/content/posts/page/10/index.html b/content/posts/page/10/index.html index 718e8f6a5..d96360cc4 100644 --- a/content/posts/page/10/index.html +++ b/content/posts/page/10/index.html @@ -1762,8 +1762,7 @@ https://github.com/alex-shpak/hugo-book <p>As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project. 
Also, events are starting to pick up again, so we’ve put together a list of some great ones you can (virtually) attend in August! -{% toc %} -The Past Month in Flink # Flink Releases # Flink 1. +The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1. <a href="/2020/07/29/flink-community-update-july20/">...</a> </p> diff --git a/content/posts/page/11/index.html b/content/posts/page/11/index.html index 1a0ec1a56..d4f32917c 100644 --- a/content/posts/page/11/index.html +++ b/content/posts/page/11/index.html @@ -1979,7 +1979,6 @@ And since now it’s more important than ever to keep up the spirits, we’d <p>In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse. -{% toc %} Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve came up with some for you. Firstly, today’s business is shifting to a more real-time fashion, and thus demands abilities to process online streaming data with low latency for near-real-time or even real-time analytics. diff --git a/content/posts/page/17/index.html b/content/posts/page/17/index.html index 4ff4ef85b..e9e3f5cfc 100644 --- a/content/posts/page/17/index.html +++ b/content/posts/page/17/index.html @@ -1840,8 +1840,7 @@ Important Notice: A user reported a bug in the FlinkKafkaConsumer (FLINK-7143) t - <p>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. {% toc %} -An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. + <p>Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. 
This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past. <a href="/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/">...</a> diff --git a/content/posts/page/3/index.html b/content/posts/page/3/index.html index a9fc8db87..595c18ac1 100644 --- a/content/posts/page/3/index.html +++ b/content/posts/page/3/index.html @@ -1838,9 +1838,8 @@ Release Artifacts # Maven Dependencies # <dependency> <groupId>org. - <p>{% toc %} -Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. -To decide a proper parallelism, one needs to know how much data each operator needs to process. + <p>Introduction # Deciding proper parallelisms of operators is not an easy work for many users. For batch jobs, a small parallelism may result in long execution time and big failover regression. While an unnecessary large parallelism may result in resource waste and more overhead cost in task deployment and network shuffling. +To decide a proper parallelism, one needs to know how much data each operator needs to process. However, It can be hard to predict data volume to be processed by a job because it can be different everyday. 
<a href="/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/">...</a> </p> diff --git a/content/posts/page/5/index.html b/content/posts/page/5/index.html index 64c5178aa..964472e42 100644 --- a/content/posts/page/5/index.html +++ b/content/posts/page/5/index.html @@ -1827,8 +1827,7 @@ This release involves a major refactor of the earlier Flink ML library and intro <p>Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. -{% toc %} -Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. +Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all. <a href="/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/">...</a> </p> @@ -1923,7 +1922,6 @@ env.java.opts: -Dlog4j2.formatMsgNoLookups=true If you are already setting env. <p>It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. -{% toc %} A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1. 
<a href="/2021/11/03/flink-backward-the-apache-flink-retrospective/">...</a> @@ -1949,7 +1947,6 @@ A retrospective on the release cycle # From the team, we collected emotions that <p>Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. -{% toc %} How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it. <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/">...</a> diff --git a/content/posts/page/6/index.html b/content/posts/page/6/index.html index 4717bba64..242298348 100644 --- a/content/posts/page/6/index.html +++ b/content/posts/page/6/index.html @@ -1791,7 +1791,6 @@ This release brings many new features and improvements in areas such as the SQL <p>Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client. -{% toc %} Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently. <a href="/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/">...</a> @@ -1817,7 +1816,6 @@ Introduction # Apache Flink is a data processing engine that aims to keep state <p>In part one of this tutorial, you learned how to build a custom source connector for Flink. 
In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL. -{% toc %} Goals # Part two of the tutorial will teach you how to: integrate a source connector which connects to a mailbox using the IMAP protocol use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization You are encouraged to follow along with the code in this repository. <a href="/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/">...</a> @@ -1974,8 +1972,7 @@ Updated Maven dependencies: - <p>{% toc %} -Backpressure monitoring in the web UI + <p>Backpressure monitoring in the web UI The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first… <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">...</a> diff --git a/content/posts/page/7/index.html b/content/posts/page/7/index.html index cd108a5b9..22bccc02b 100644 --- a/content/posts/page/7/index.html +++ b/content/posts/page/7/index.html @@ -1785,8 +1785,7 @@ Updated Maven dependencies: - <p>{% toc %} -Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. 
Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensur [...] + <p>Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want t [...] <a href="/2021/05/06/scaling-flink-automatically-with-reactive-mode/">...</a> </p> @@ -1892,8 +1891,7 @@ This new release brings remote functions to the front and center of StateFun, ma - <p>{% toc %} -Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the firs [...] + <p>Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took t [...] 
<a href="/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/">...</a> </p> diff --git a/content/posts/page/8/index.html b/content/posts/page/8/index.html index bb9a1ef4c..e5dd82368 100644 --- a/content/posts/page/8/index.html +++ b/content/posts/page/8/index.html @@ -1781,9 +1781,8 @@ https://github.com/alex-shpak/hugo-book - <p>{% toc %} -Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). -Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. + <p>Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing). +Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery. <a href="/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/">...</a> </p> diff --git a/content/posts/page/9/index.html b/content/posts/page/9/index.html index fce3fe602..fc9464099 100644 --- a/content/posts/page/9/index.html +++ b/content/posts/page/9/index.html @@ -1816,7 +1816,6 @@ Updated Maven dependencies: <p>Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019. 
-{% toc %} The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2. <a href="/2020/09/04/flink-community-update-august20/">...</a> diff --git a/content/zh/how-to-contribute/contribute-code/index.html b/content/zh/how-to-contribute/contribute-code/index.html index f6de5ef87..72959f5d1 100644 --- a/content/zh/how-to-contribute/contribute-code/index.html +++ b/content/zh/how-to-contribute/contribute-code/index.html @@ -9,14 +9,12 @@ <meta name="description" content="贡献代码 # Apache Flink 是一个通过志愿者贡献的代码来维护、改进和扩展的项目。我们欢迎给 Flink 做贡献,但由于项目的规模大,以及为了保持高质量的代码库,我们要求贡献者遵循本文所阐述的贡献流程。 请随时提出任何问题! 可以发送邮件到 dev mailing list,也可以对正在处理的 Jira issue 发表评论。 重要提示:在开始准备代码贡献之前,请仔细阅读本文档。请遵循如下所述的流程和指南,为 Apache Flink 做贡献并不是从创建 pull request 开始的。我们希望贡献者先和我们联系,共同讨论整体方案。如果没有与 Flink committers 达成共识,那么贡献可能需要大量返工或不予审核通过。 -{% toc %} 寻找可贡献的内容 # 如果你已经有好的想法可以贡献,可以直接参考下面的 “代码贡献步骤”。 如果你在寻找可贡献的内容,可以通过 Flink 的问题跟踪列表 浏览处于 open 状态且未被分配的 Jira 工单,然后根据 “代码贡献步骤” 中的描述来参与贡献。 如果你是一个刚刚加入到 Flink 项目中的新人,并希望了解 Flink 及其贡献步骤,可以浏览 适合新手的工单列表 。 这个列表中的工单都带有 starter 标记,适合新手参与。 代码贡献步骤 # 注意:最近(2019 年 6 月),代码贡献步骤有改动。社区决定将原来直接提交 pull request 的方式转移到 Jira 上,要求贡献者在创建 pull request 之前需在 Jira 上达成共识(通过分配到的工单来体现),以减轻 PR review 的压力。 1讨论 在 Jira 上创建工单或邮件列表讨论并达成共识"> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="贡献代码" /> <meta property="og:description" content="贡献代码 # Apache Flink 是一个通过志愿者贡献的代码来维护、改进和扩展的项目。我们欢迎给 Flink 做贡献,但由于项目的规模大,以及为了保持高质量的代码库,我们要求贡献者遵循本文所阐述的贡献流程。 请随时提出任何问题! 
可以发送邮件到 dev mailing list,也可以对正在处理的 Jira issue 发表评论。 重要提示:在开始准备代码贡献之前,请仔细阅读本文档。请遵循如下所述的流程和指南,为 Apache Flink 做贡献并不是从创建 pull request 开始的。我们希望贡献者先和我们联系,共同讨论整体方案。如果没有与 Flink committers 达成共识,那么贡献可能需要大量返工或不予审核通过。 -{% toc %} 寻找可贡献的内容 # 如果你已经有好的想法可以贡献,可以直接参考下面的 “代码贡献步骤”。 如果你在寻找可贡献的内容,可以通过 Flink 的问题跟踪列表 浏览处于 open 状态且未被分配的 Jira 工单,然后根据 “代码贡献步骤” 中的描述来参与贡献。 如果你是一个刚刚加入到 Flink 项目中的新人,并希望了解 Flink 及其贡献步骤,可以浏览 适合新手的工单列表 。 这个列表中的工单都带有 starter 标记,适合新手参与。 代码贡献步骤 # 注意:最近(2019 年 6 月),代码贡献步骤有改动。社区决定将原来直接提交 pull request 的方式转移到 Jira 上,要求贡献者在创建 pull request 之前需在 Jira 上达成共识(通过分配到的工单来体现),以减轻 PR review 的压力。 1讨论 在 Jira 上创建工单或邮件列表讨论并达成共识" /> <meta property="og:type" content="article" /> @@ -927,7 +925,6 @@ https://github.com/alex-shpak/hugo-book <p>Apache Flink 是一个通过志愿者贡献的代码来维护、改进和扩展的项目。我们欢迎给 Flink 做贡献,但由于项目的规模大,以及为了保持高质量的代码库,我们要求贡献者遵循本文所阐述的贡献流程。</p> <p><strong>请随时提出任何问题!</strong> 可以发送邮件到 <a href="/zh/community.html#mailing-lists">dev mailing list</a>,也可以对正在处理的 Jira issue 发表评论。</p> <p><strong>重要提示</strong>:在开始准备代码贡献之前,请仔细阅读本文档。请遵循如下所述的流程和指南,为 Apache Flink 做贡献并不是从创建 pull request 开始的。我们希望贡献者先和我们联系,共同讨论整体方案。如果没有与 Flink committers 达成共识,那么贡献可能需要大量返工或不予审核通过。</p> -<p>{% toc %}</p> <h2 id="寻找可贡献的内容"> 寻找可贡献的内容 <a class="anchor" href="#%e5%af%bb%e6%89%be%e5%8f%af%e8%b4%a1%e7%8c%ae%e7%9a%84%e5%86%85%e5%ae%b9">#</a> diff --git a/content/zh/how-to-contribute/index.xml b/content/zh/how-to-contribute/index.xml index 8b52bab61..742820d14 100644 --- a/content/zh/how-to-contribute/index.xml +++ b/content/zh/how-to-contribute/index.xml @@ -26,7 +26,6 @@ <description>贡献代码 # Apache Flink 是一个通过志愿者贡献的代码来维护、改进和扩展的项目。我们欢迎给 Flink 做贡献,但由于项目的规模大,以及为了保持高质量的代码库,我们要求贡献者遵循本文所阐述的贡献流程。 请随时提出任何问题! 
可以发送邮件到 dev mailing list,也可以对正在处理的 Jira issue 发表评论。 重要提示:在开始准备代码贡献之前,请仔细阅读本文档。请遵循如下所述的流程和指南,为 Apache Flink 做贡献并不是从创建 pull request 开始的。我们希望贡献者先和我们联系,共同讨论整体方案。如果没有与 Flink committers 达成共识,那么贡献可能需要大量返工或不予审核通过。 -{% toc %} 寻找可贡献的内容 # 如果你已经有好的想法可以贡献,可以直接参考下面的 &ldquo;代码贡献步骤&rdquo;。 如果你在寻找可贡献的内容,可以通过 Flink 的问题跟踪列表 浏览处于 open 状态且未被分配的 Jira 工单,然后根据 &ldquo;代码贡献步骤&rdquo; 中的描述来参与贡献。 如果你是一个刚刚加入到 Flink 项目中的新人,并希望了解 Flink 及其贡献步骤,可以浏览 适合新手的工单列表 。 这个列表中的工单都带有 starter 标记,适合新手参与。 代码贡献步骤 # 注意:最近(2019 年 6 月),代码贡献步骤有改动。社区决定将原来直接提交 pull request 的方式转移到 Jira 上,要求贡献者在创建 pull request 之前需在 Jira 上达成共识(通过分配到的工单来体现),以减轻 PR review 的压力。 1讨论 在 Jira 上创建工单或邮件列表讨论并达成共识</description> </item> diff --git a/content/zh/index.xml b/content/zh/index.xml index 4daae9c34..a00d1b220 100644 --- a/content/zh/index.xml +++ b/content/zh/index.xml @@ -367,7 +367,6 @@ CVE ID Affected Flink versions Notes CVE-2020-1960 1.1.0 to 1.1.5, 1.2.0 to 1.2. <description>贡献代码 # Apache Flink 是一个通过志愿者贡献的代码来维护、改进和扩展的项目。我们欢迎给 Flink 做贡献,但由于项目的规模大,以及为了保持高质量的代码库,我们要求贡献者遵循本文所阐述的贡献流程。 请随时提出任何问题! 可以发送邮件到 dev mailing list,也可以对正在处理的 Jira issue 发表评论。 重要提示:在开始准备代码贡献之前,请仔细阅读本文档。请遵循如下所述的流程和指南,为 Apache Flink 做贡献并不是从创建 pull request 开始的。我们希望贡献者先和我们联系,共同讨论整体方案。如果没有与 Flink committers 达成共识,那么贡献可能需要大量返工或不予审核通过。 -{% toc %} 寻找可贡献的内容 # 如果你已经有好的想法可以贡献,可以直接参考下面的 &ldquo;代码贡献步骤&rdquo;。 如果你在寻找可贡献的内容,可以通过 Flink 的问题跟踪列表 浏览处于 open 状态且未被分配的 Jira 工单,然后根据 &ldquo;代码贡献步骤&rdquo; 中的描述来参与贡献。 如果你是一个刚刚加入到 Flink 项目中的新人,并希望了解 Flink 及其贡献步骤,可以浏览 适合新手的工单列表 。 这个列表中的工单都带有 starter 标记,适合新手参与。 代码贡献步骤 # 注意:最近(2019 年 6 月),代码贡献步骤有改动。社区决定将原来直接提交 pull request 的方式转移到 Jira 上,要求贡献者在创建 pull request 之前需在 Jira 上达成共识(通过分配到的工单来体现),以减轻 PR review 的压力。 1讨论 在 Jira 上创建工单或邮件列表讨论并达成共识</description> </item>