This is an automated email from the ASF dual-hosted git repository. echauchot pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 40cef395f2abab69992dc7ea57047e875f5675c1 Author: Etienne Chauchot <echauc...@apache.org> AuthorDate: Wed May 3 17:38:53 2023 +0200 Rebuild website --- .../index.html | 1278 ++++++++++++++++++++ content/en/sitemap.xml | 13 +- .../source_components.svg | 20 + .../source_reader.svg | 20 + content/index.html | 13 +- content/index.xml | 12 +- content/posts/index.html | 53 +- content/posts/index.xml | 12 +- content/posts/page/10/index.html | 61 +- content/posts/page/11/index.html | 56 +- content/posts/page/12/index.html | 57 +- content/posts/page/13/index.html | 57 +- content/posts/page/14/index.html | 55 +- content/posts/page/15/index.html | 53 +- content/posts/page/16/index.html | 59 +- content/posts/page/17/index.html | 56 +- content/posts/page/18/index.html | 47 +- content/posts/page/19/index.html | 46 +- content/posts/page/2/index.html | 53 +- content/posts/page/20/index.html | 44 +- content/posts/page/21/index.html | 47 +- content/posts/page/22/index.html | 29 + content/posts/page/3/index.html | 59 +- content/posts/page/4/index.html | 58 +- content/posts/page/5/index.html | 56 +- content/posts/page/6/index.html | 57 +- content/posts/page/7/index.html | 57 +- content/posts/page/8/index.html | 58 +- content/posts/page/9/index.html | 62 +- content/sitemap.xml | 2 +- content/zh/index.html | 15 +- 31 files changed, 2052 insertions(+), 513 deletions(-) diff --git a/content/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/index.html b/content/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/index.html new file mode 100644 index 000000000..d26be0cef --- /dev/null +++ b/content/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/index.html @@ -0,0 +1,1278 @@ + +<!DOCTYPE html> +<html lang="en" dir=> + +<head> + <meta name="generator" content="Hugo 0.111.3"> + <meta charset="UTF-8"> +<meta name="viewport" content="width=device-width, initial-scale=1.0"> +<meta name="description" content="Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! +Implementing the source components # The source architecture is depicted in the diagrams below:"> +<meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Howto create a batch source with the new Source framework" /> +<meta property="og:description" content="Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! 
+Implementing the source components # The source architecture is depicted in the diagrams below:" /> +<meta property="og:type" content="article" /> +<meta property="og:url" content="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/" /><meta property="article:section" content="posts" /> +<meta property="article:published_time" content="2023-05-03T08:00:00+00:00" /> +<meta property="article:modified_time" content="2023-05-03T08:00:00+00:00" /> +<title>Howto create a batch source with the new Source framework | Apache Flink</title> +<link rel="manifest" href="/manifest.json"> +<link rel="icon" href="/favicon.png" type="image/x-icon"> +<link rel="stylesheet" href="/book.min.e3b33391dbc1f4b2cc47778e2f4b86c744ded3ccc82fdfb6f08caf91d8607f9a.css" integrity="sha256-47MzkdvB9LLMR3eOL0uGx0Te08zIL9+28Iyvkdhgf5o="> +<script defer src="/en.search.min.8592fd2e43835d2ef6fab8eb9b8969ee6ad1bdb888a636e37e28032f8bd9887d.js" integrity="sha256-hZL9LkODXS72+rjrm4lp7mrRvbiIpjbjfigDL4vZiH0="></script> +<!-- +Made with Book Theme +https://github.com/alex-shpak/hugo-book +--> + + + +<link rel="stylesheet" type="text/css" href="/font-awesome/css/font-awesome.min.css"> +<script src="/js/anchor.min.js"></script> +<script src="/js/flink.js"></script> +<link rel="canonical" href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/"> + + + <script> + var _paq = window._paq = window._paq || []; + + + _paq.push(['disableCookies']); + + _paq.push(["setDomains", ["*.flink.apache.org","*.nightlies.apache.org/flink"]]); + _paq.push(['trackPageView']); + _paq.push(['enableLinkTracking']); + (function() { + var u="//analytics.apache.org/"; + _paq.push(['setTrackerUrl', u+'matomo.php']); + _paq.push(['setSiteId', '1']); + var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; + g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); + })(); + </script> + +</head> + +<body dir=> + <input type="checkbox" class="hidden toggle" id="menu-control" /> + <input type="checkbox" class="hidden toggle" id="toc-control" /> + <main class="container flex"> + <aside class="book-menu"> + + + +<nav> + + +<a id="logo" href="/"> + <img width="70%" src="/flink-header-logo.svg"> +</a> + +<div class="book-search"> + <input type="text" id="book-search-input" placeholder="Search" aria-label="Search" maxlength="64" data-hotkeys="s/" /> + <div class="book-search-spinner hidden"></div> + <ul id="book-search-results"></ul> +</div> + + + + + + + + + + + + + + + + + + + + + + + + <input type="checkbox" id="section-4117fb24454a2c30ee86e524839e77ec" class="toggle" /> + <label for="section-4117fb24454a2c30ee86e524839e77ec" class="flex justify-between flink-menu-item">What is Apache Flink?<span>▾</span> + </label> + + <ul> + + <li> + + + + + + <label for="section-ffd5922da551e96e0481423fab94c463" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/what-is-flink/flink-architecture/" class="">Architecture</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-fc28f08b67476edb77e00e03b6c7c2e0" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/what-is-flink/flink-applications/" class="">Applications</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-612df33a02d5d4ee78d718abaab5b5b4" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/what-is-flink/flink-operations/" class="">Operations</a> + </label> + + + </li> + + </ul> + + + + + + + + + + + 
+ + +<label for="section-f1ecec07350bd6810050d40158878749" class="flex justify-between flink-menu-item"> + <a href="https://nightlies.apache.org/flink/flink-statefun-docs-stable/" style="color:black" class="">What is Stateful Functions? <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + + + + + + + + + + + + + +<label for="section-4113a4c3072cb35f6fd7a0d4e098ee70" class="flex justify-between flink-menu-item"> + <a href="https://nightlies.apache.org/flink/flink-ml-docs-stable/" style="color:black" class="">What is Flink ML? <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + + + + + + + + + + + + + +<label for="section-b39c70259d0abbe2bf1d8d645425f84d" class="flex justify-between flink-menu-item"> + <a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/" style="color:black" class="">What is the Flink Kubernetes Operator? <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + + + + + + + + + + + + + +<label for="section-53e0b1afcb9ccaf779dc285aa272a014" class="flex justify-between flink-menu-item"> + <a href="https://nightlies.apache.org/flink/flink-table-store-docs-stable/" style="color:black" class="">What is Flink Table Store? <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + + + + + + + + + + + <label for="section-f4973f06a66f063045b4ebdacaf3127d" class="flex justify-between flink-menu-item"> + <a href="/use-cases/" class="">Use Cases</a> + </label> + + + + + + + + + + + + + <label for="section-0f1863835376e859ac438ae9529daff2" class="flex justify-between flink-menu-item"> + <a href="/powered-by/" class="">Powered By</a> + </label> + + + + + + <br/> + + + + + + + + + + + <label for="section-f383f23a96a43d8d0cc66aeb0237e26a" class="flex justify-between flink-menu-item"> + <a href="/downloads/" class="">Downloads</a> + </label> + + + + + + + + + + + + <input type="checkbox" id="section-c727fab97b4d77e5b28ce8c448fb9000" class="toggle" /> + <label for="section-c727fab97b4d77e5b28ce8c448fb9000" class="flex justify-between flink-menu-item">Getting Started<span>▾</span> + </label> + + <ul> + + <li> + + + + + + + + +<label for="section-f45abaa99ab076108b9a5b94edbc6647" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/try-flink/local_installation/" style="color:black" class="">With Flink <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-efe2166e9dce6f72e126dcc2396b4402" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-statefun-docs-stable/getting-started/project-setup.html" style="color:black" class="">With Flink Stateful Functions <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-7e268d0a469b1093bb33d71d093eb7b9" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-ml-docs-stable/docs/try-flink-ml/quick-start/" style="color:black" class="">With Flink ML <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-cc7147cd0441503127bfaf6f219d4fbb" class="flex justify-between flink-menu-item flink-menu-child"> + <a 
href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/docs/try-flink-kubernetes-operator/quick-start/" style="color:black" class="">With Flink Kubernetes Operator <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-660ca694e416d8ca9176dda52a60d637" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-table-store-docs-stable/docs/try-table-store/quick-start/" style="color:black" class="">With Flink Table Store <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-75db0b47bf4ae9c247aadbba5fbd720d" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/learn-flink/overview/" style="color:black" class="">Training Course <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + </ul> + + + + + + + + + + <input type="checkbox" id="section-6318075fef29529089951a49d413d083" class="toggle" /> + <label for="section-6318075fef29529089951a49d413d083" class="flex justify-between flink-menu-item">Documentation<span>▾</span> + </label> + + <ul> + + <li> + + + + + + + + +<label for="section-9a8122d8912450484d1c25394ad40229" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-docs-stable/" style="color:black" class="">Flink 1.17 (stable) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-8b2fd3efb702be3783ba98d650707e3c" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-docs-master/" style="color:black" class="">Flink Master (snapshot) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-5317a079cddb964c59763c27607f43d9" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-statefun-docs-stable/" style="color:black" class="">Stateful Functions 3.2 (stable) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-25b72f108b7156e94d91b04853d8813a" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-statefun-docs-master" style="color:black" class="">Stateful Functions Master (snapshot) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-13a02f969904a2455a39ed90e287593f" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-ml-docs-stable/" style="color:black" class="">ML 2.2 (stable) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-6d895ec5ad127a29a6a9ce101328ccdf" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-ml-docs-master" style="color:black" class="">ML Master (snapshot) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-c83ad0caf34e364bf3729badd233a350" 
class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/" style="color:black" class="">Kubernetes Operator 1.4 (latest) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-a2c75d90005425982ba8f26ae0e160a3" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main" style="color:black" class="">Kubernetes Operator Main (snapshot) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-07b85e4b2f61b1526bf202c64460abcd" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-table-store-docs-stable/" style="color:black" class="">Table Store 0.3 (stable) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + <li> + + + + + + + + +<label for="section-9b9a0032b1e858a34c125d828d1a0718" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="https://nightlies.apache.org/flink/flink-table-store-docs-master/" style="color:black" class="">Table Store Master (snapshot) <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + </li> + + </ul> + + + + + + + + + + + <label for="section-63d6a565d79aa2895f70806a46021c07" class="flex justify-between flink-menu-item"> + <a href="/getting-help/" class="">Getting Help</a> + </label> + + + + + + + + + + + + + + + +<label for="section-1d5066022b83f4732dc80f4e9eaa069a" class="flex justify-between flink-menu-item"> + <a href="https://flink-packages.org/" style="color:black" class="">flink-packages.org <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + + + + <br/> + + + + + + + + + + + <label for="section-7821b78a97db9e919426e86121a7be9c" class="flex justify-between flink-menu-item"> + <a href="/community/" class="">Community & Project Info</a> + </label> + + + + + + + + + + + + + <label for="section-8c042831df4e371c4ef9375f1df06f35" class="flex justify-between flink-menu-item"> + <a href="/roadmap/" class="">Roadmap</a> + </label> + + + + + + + + + + + + <input type="checkbox" id="section-73117efde5302fddcb193307d582b588" class="toggle" /> + <label for="section-73117efde5302fddcb193307d582b588" class="flex justify-between flink-menu-item">How to Contribute<span>▾</span> + </label> + + <ul> + + <li> + + + + + + <label for="section-6646b26b23a3e79b8de9c552ee76f6dd" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/overview/" class="">Overview</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-e6ab9538b82cd5f94103b971adb7c1a9" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/contribute-code/" class="">Contribute Code</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-1c09e1358485e82d9b3f5f689d4ced65" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/reviewing-prs/" class="">Review Pull Requests</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-ed01e0defd235498fa3c9a2a0b3302fb" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/code-style-and-quality-preamble/" class="">Code Style and Quality Guide</a> + </label> + + + </li> + + 
<li> + + + + + + <label for="section-4e8d5e9924cf15f397711b0d82e15650" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/contribute-documentation/" class="">Contribute Documentation</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-ddaa8307917e5ba7f60ba3316711e492" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/documentation-style-guide/" class="">Documentation Style Guide</a> + </label> + + + </li> + + <li> + + + + + + <label for="section-390a72c171cc82f180a308b95fc3aa72" class="flex justify-between flink-menu-item flink-menu-child"> + <a href="/how-to-contribute/improve-website/" class="">Contribute to the Website</a> + </label> + + + </li> + + </ul> + + + + + + + + + + + <label for="section-9d3ddfd487223d5a199ba301f25c88c6" class="flex justify-between flink-menu-item"> + <a href="/security/" class="">Security</a> + </label> + + + + + + <br/> + + + + + + + + + + <label for="section-a07783f405300745807d39eacf150420" class="flex justify-between flink-menu-item"> + <a href="/posts/" class="">Flink Blog</a> + </label> + + + + + + + + + + + + + + + + + + + + + + + + + +<br/> +<hr class="menu-break"> + + +<label for="section-f71a7070dbb7b669824a6441408ded70" class="flex justify-between flink-menu-item"> + <a href="https://github.com/apache/flink" style="color:black" class="">Flink on GitHub <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + +<label for="section-2ccaaab8c67f3105bbf7df75faca8027" class="flex justify-between flink-menu-item"> + <a href="https://twitter.com/apacheflink" style="color:black" class="">@ApacheFlink <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label> + + + +<hr class="menu-break"> +<table> + <tr> + <th colspan="2"> +<label for="section-78c2028200542d78f8c1a8f6b4cbb36b" class="flex justify-between flink-menu-item"> + <a href="https://www.apache.org/" style="color:black" class="">Apache Software Foundation <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label></th> + </tr> + <tr> + <td> +<label for="section-794df3791a8c800841516007427a2aa3" class="flex justify-between flink-menu-item"> + <a href="https://www.apache.org/licenses/" style="color:black" class="">License <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label></td> + <td> +<label for="section-2fae32629d4ef4fc6341f1751b405e45" class="flex justify-between flink-menu-item"> + <a href="https://www.apache.org/security/" style="color:black" class="">Security <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label></td> + </tr> + <tr> + <td> +<label for="section-0584e445d656b83b431227bb80ff0c30" class="flex justify-between flink-menu-item"> + <a href="https://www.apache.org/foundation/sponsorship.html" style="color:black" class="">Donate <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label></td> + <td> +<label for="section-00d06796e489999226fb5bb27fe1b3b2" class="flex justify-between flink-menu-item"> + <a href="https://www.apache.org/foundation/thanks.html" style="color:black" class="">Thanks <i class="link fa fa-external-link title" aria-hidden="true"></i></a> +</label></td> + </tr> +</table> + +<hr class="menu-break"> + + + + + + + + + + + + + +<a href="/zh/" class="flex align-center"> + <i class="fa fa-globe" aria-hidden="true"></i> + 中文版 +</a> + +<script src="/js/track-search-terms.js"></script> + + +</nav> + + + + + <script>(function(){var 
e=document.querySelector("aside.book-menu nav");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script> + + + + </aside> + + <div class="book-page"> + <header class="book-header"> + + <div class="flex align-center justify-between"> + <label for="menu-control"> + <img src="/svg/menu.svg" class="book-icon" alt="Menu" /> + </label> + + <strong>Howto create a batch source with the new Source framework</strong> + + <label for="toc-control"> + + <img src="/svg/toc.svg" class="book-icon" alt="Table of Contents" /> + + </label> +</div> + + + + <aside class="hidden clearfix"> + + + +<nav id="TableOfContents"><h3>On This Page <button class="toc" onclick="collapseToc()"><i class="fa fa-compress" aria-hidden="true"></i></button></h3> + <ul> + <li><a href="#introduction">Introduction</a></li> + <li><a href="#implementing-the-source-components">Implementing the source components</a> + <ul> + <li><a href="#source">Source</a></li> + <li><a href="#sourcereader">SourceReader</a></li> + <li><a href="#split-and-splitstate">Split and SplitState</a></li> + <li><a href="#splitenumerator-and-splitenumeratorstate">SplitEnumerator and SplitEnumeratorState</a></li> + <li><a href="#splitreader">SplitReader</a></li> + <li><a href="#recordemitter">RecordEmitter</a></li> + <li><a href="#serializers">Serializers</a></li> + </ul> + </li> + <li><a href="#testing-the-source">Testing the source</a></li> + <li><a href="#conclusion">Conclusion</a></li> + </ul> +</nav> + + + </aside> + + + </header> + + + + + + + +<article class="markdown"> + <h1> + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </h1> + + May 3, 2023 - + + + + Etienne Chauchot + + <a href="https://twitter.com/echauchot">(@echauchot)</a> + + + + + <p><h2 id="introduction"> + Introduction + <a class="anchor" href="#introduction">#</a> +</h2> +<p>The Flink community has +designed <a href="https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/sources/">a new Source framework</a> +based +on <a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface">FLIP-27</a> +lately. Some connectors have migrated to this new framework. This article is a how-to for creating a +batch +source using this new framework. It was built while implementing +the <a href="https://github.com/apache/flink-connector-cassandra/commit/72e3bef1fb9ee6042955b5e9871a9f70a8837cca">Flink batch source</a> +for <a href="https://cassandra.apache.org/_/index.html">Cassandra</a>. 
+If you are interested in contributing or migrating connectors, this blog post is for you!</p> +<h2 id="implementing-the-source-components"> + Implementing the source components + <a class="anchor" href="#implementing-the-source-components">#</a> +</h2> +<p>The source architecture is depicted in the diagrams below:</p> +<p><img src="/img/blog/2023-05-03-howto-create-batch-source/source_components.svg" alt="" /></p> +<p><img src="/img/blog/2023-05-03-howto-create-batch-source/source_reader.svg" alt="" /></p> +<h3 id="source"> + Source + <a class="anchor" href="#source">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/CassandraSource.java">Example Cassandra Source</a></p> +<p>The source interface only does the “glue” between all the other components. Its role is to +instantiate all of them and to define the +source <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/Boundedness.html">Boundedness</a> +. We also do the source configuration +here along with user configuration validation.</p> +<h3 id="sourcereader"> + SourceReader + <a class="anchor" href="#sourcereader">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/reader/CassandraSourceReader.java">Example Cassandra SourceReader</a></p> +<p>As shown in the graphic above, the instances of +the <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SourceReader.html">SourceReader</a> +(which we will call simply readers +in the continuation of this article) run in parallel in task managers to read the actual data which +is divided into <a href="#split-and-splitstate">Splits</a>. Readers request splits from +the <a href="#splitenumerator-and-splitenumeratorstate">SplitEnumerator</a> and the resulting splits are +assigned to them in return.</p> +<p>Flink provides +the <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html">SourceReaderBase</a> +implementation that takes care of all the threading. Flink also provides a useful extension to +this class for most +cases: <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SingleThreadMultiplexSourceReaderBase.html">SingleThreadMultiplexSourceReaderBase</a> +. This class has the threading model already configured: +each <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.html">SplitReader</a> +instance reads splits using one thread (but there are several SplitReader instances that live among +task +managers).</p> +<p>What we have left to do in the SourceReader class is:</p> +<ul> +<li>Provide a <a href="#splitreader">SplitReader</a> supplier</li> +<li>Create a <a href="#recordemitter">RecordEmitter</a></li> +<li>Create the shared resources for the SplitReaders (sessions, etc…). 
As the SplitReader supplier +is +created in the SourceReader constructor in a super() call, using a SourceReader factory to create +the shared resources and pass them to the supplier is a good idea.</li> +<li>Implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SourceReader.html#start--">start()</a>: +here we should ask the enumerator for our first split</li> +<li>Override <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html#close--">close()</a> +in the SourceReaderBase parent class to free up any created resources (the shared +resources for example)</li> +<li>Implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html#initializedState-SplitT-">initializedState()</a> +to create a mutable <a href="#split-and-splitstate">SplitState</a> from a Split</li> +<li>Implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html#toSplitType-java.lang.String-SplitStateT-">toSplitType()</a> +to create a Split from the mutable SplitState</li> +<li>Implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html#onSplitFinished-java.util.Map-">onSplitFinished()</a>: +here, as it is a batch source (finite data), we should ask the +Enumerator for the next split</li> +</ul> +<h3 id="split-and-splitstate"> + Split and SplitState + <a class="anchor" href="#split-and-splitstate">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/split/CassandraSplit.java">Example Cassandra Split</a></p> +<p>The <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SourceSplit.html">SourceSplit</a> +represents a partition of the source data. What defines a split depends on the +backend we are reading from. It could be a <em>(partition start, partition end)</em> tuple or an <em>(offset, +split size)</em> tuple for example.</p> +<p>In any case, the Split object should be seen as an immutable object: any update to it should be done +on the +associated <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.html">SplitState</a>. +The split state is the one that will be stored inside the Flink +<a href="https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/stateful-stream-processing/#checkpointing">checkpoints</a>. A checkpoint may happen between two fetches for one split. So, if we’re reading a split, we +must store in the split state the current state of the reading process. This current state needs to +be something serializable (because it will be part of a checkpoint) and something that the backend +source can resume from. That way, in case of failover, the reading can be resumed from where it +was left off.</p>
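+<p>To make this more concrete, here is a minimal sketch of such a pair: an immutable split plus a mutable state that tracks how many records have already been read. The classes and field names are illustrative (they are not the actual Cassandra ones), but they follow the pattern described above:</p>
+<pre><code class="language-java">import org.apache.flink.api.connector.source.SourceSplit;

/** Immutable description of a slice of the source data. */
public class RangeSplit implements SourceSplit {

    private final long start; // inclusive
    private final long end; // exclusive
    private final long alreadyReadRecords; // progress to resume from after failover

    public RangeSplit(long start, long end, long alreadyReadRecords) {
        this.start = start;
        this.end = end;
        this.alreadyReadRecords = alreadyReadRecords;
    }

    @Override
    public String splitId() {
        return start + "-" + end;
    }

    public long getStart() { return start; }

    public long getEnd() { return end; }

    public long getAlreadyReadRecords() { return alreadyReadRecords; }
}

/** Mutable counterpart, created in SourceReader#initializedState() and updated while reading. */
class RangeSplitState {

    private final RangeSplit split;
    private long alreadyReadRecords;

    RangeSplitState(RangeSplit split) {
        this.split = split;
        this.alreadyReadRecords = split.getAlreadyReadRecords();
    }

    /** Typically called for each emitted record, e.g. from RecordEmitter#emitRecord(). */
    void markRecordRead() {
        alreadyReadRecords++;
    }

    /** Converted back to an immutable split in SourceReader#toSplitType() for checkpointing. */
    RangeSplit toSplit() {
        return new RangeSplit(split.getStart(), split.getEnd(), alreadyReadRecords);
    }
}
</code></pre>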
+<p>Thus we ensure there will be no duplicates or lost data.<br> +For example, if the records +reading order is deterministic in the backend, then the split state can store the number <em>n</em> of +already read records to restart at <em>n+1</em> after failover.</p> +<h3 id="splitenumerator-and-splitenumeratorstate"> + SplitEnumerator and SplitEnumeratorState + <a class="anchor" href="#splitenumerator-and-splitenumeratorstate">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/enumerator/CassandraSplitEnumerator.java">Example Cassandra SplitEnumerator</a> +and <a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/enumerator/CassandraEnumeratorState.java">SplitEnumeratorState</a></p> +<p>The <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html">SplitEnumerator</a> +is responsible for creating the splits and serving them to the readers. Whenever +possible, it is preferable to generate the splits lazily, meaning that each time a reader asks the +enumerator for a split, the enumerator generates one on demand and assigns it to the reader. For +that we +implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#handleSplitRequest-int-java.lang.String-">SplitEnumerator#handleSplitRequest()</a>. Lazy split generation is preferable to +split discovery, in which we pre-generate all the splits and store them while waiting to assign them to +the readers. Indeed, in some situations, the number of splits can be enormous and consume a lot of +memory, which could be problematic in case of straggling readers. The framework offers the ability to +act upon reader registration by +implementing <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#addReader-int-">addReader()</a> +but, as we do lazy split generation, we +have nothing to do there. In some cases, generating a split is too costly, so we can pre-generate a +batch (not all) of splits to amortize this cost. The number and size of batched splits need to be taken +into account to avoid consuming too much memory.</p> +<p>Long story short, the tricky part of the source implementation is splitting the source data. The +right equilibrium to find is not to have too many splits (which could lead to too much memory +consumption) nor too few (which could lead to sub-optimal parallelism). One good way to meet this +equilibrium is to evaluate the size of the source data upfront and allow the user to specify the +maximum memory a split will take. That way they can configure this parameter according to the +memory +available on the task managers. This parameter is optional, so the source needs to provide a default +value. Also, the source needs to check that the user-provided max-split-size is not too small, which would +lead to too many splits. The general rule of thumb is to give the user some freedom but to protect them +from unwanted behavior. +For these safety measures, rigid thresholds +don’t work well, as the source may start to fail when the thresholds are suddenly exceeded.<br> +For example, if we enforce that the number of splits stays below twice the parallelism and +the job is regularly run on a growing table, at some point there will be +more and more splits of max-split-size and the threshold will be exceeded. Of course, the size of +the source data needs to be evaluated without +reading the actual data. For the Cassandra connector it was +done <a href="https://echauchot.blogspot.com/2023/03/cassandra-evaluate-table-size-without.html">like this</a>.</p>
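+<p>To illustrate the lazy approach, here is a skeleton of what such an enumerator could look like. It is only a sketch: RangeSplit comes from the earlier illustration, and RangeEnumeratorState with its helper methods is a hypothetical state object (the enumerator state itself is discussed right below):</p>
+<pre><code class="language-java">import java.util.List;

import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;

/** Skeleton of a lazy enumerator: splits are generated only when a reader asks for one. */
public class LazyRangeSplitEnumerator implements SplitEnumerator&lt;RangeSplit, RangeEnumeratorState&gt; {

    private final SplitEnumeratorContext&lt;RangeSplit&gt; context;
    private final RangeEnumeratorState state; // generation cursor + splits to reassign

    public LazyRangeSplitEnumerator(
            SplitEnumeratorContext&lt;RangeSplit&gt; context, RangeEnumeratorState state) {
        this.context = context;
        this.state = state;
    }

    @Override
    public void handleSplitRequest(int subtaskId, String requesterHostname) {
        // serve a split put back by addSplitsBack() first, otherwise generate one on demand
        RangeSplit split = state.pollSplitToReassign();
        if (split == null) {
            split = state.generateNextSplit(); // null once all the data has been assigned
        }
        if (split != null) {
            context.assignSplit(split, subtaskId);
        } else {
            // batch source: no more data, the requesting reader can finish
            context.signalNoMoreSplits(subtaskId);
        }
    }

    @Override
    public void addReader(int subtaskId) {
        // nothing to do: splits are only assigned on request
    }

    @Override
    public void addSplitsBack(List&lt;RangeSplit&gt; splits, int subtaskId) {
        // splits assigned to a failed reader after the last checkpoint come back here
        state.addSplitsToReassign(splits);
    }

    @Override
    public RangeEnumeratorState snapshotState(long checkpointId) {
        // in a real implementation, return an immutable snapshot of the current state
        return state;
    }

    @Override
    public void start() {
        // see the start() sketch further below
    }

    @Override
    public void close() {
        // dispose of backend resources here
    }
}
</code></pre>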
+<p>Another important topic is state. If the job manager fails, the split enumerator needs to recover. +For that, as for the split, we need to provide a state for the enumerator that will be part of a +checkpoint. Upon recovery, the enumerator is reconstructed and +receives <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html">an enumerator state</a> +for recovering its previous state. Upon checkpointing, the +enumerator returns its state when <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#snapshotState-long-">SplitEnumerator#snapshotState()</a> +is called. The state +must contain everything needed to resume where the enumerator was left off after failover. In a lazy +split generation scenario, the state will contain everything needed to generate the next split +whenever asked to. It can be, for example, the start offset of the next split, the split size, the number of +splits still to generate, etc. But the SplitEnumeratorState must also contain a list of splits: not +the list of discovered splits, but a list of splits to reassign. Indeed, whenever a reader fails, if +it was assigned splits after the last checkpoint, then the checkpoint will not contain those splits. +Consequently, upon restoration, the reader won’t have the splits assigned anymore. There is a +callback to deal with that +case: <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#addSplitsBack-java.util.List-int-">addSplitsBack()</a>. There, the splits that were assigned to the +failing reader can be put back into the enumerator state for later re-assignment to readers. There +is no memory size risk here as the number of splits to reassign is pretty low.</p> +<p>The above topics are the most important ones regarding splitting. There are two methods left to implement: +the +usual <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#start--">start()</a> +/<a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumerator.html#close--">close()</a> +methods for resource creation/disposal. Regarding implementing start(), +the Flink connector framework +provides the <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/connector/source/SplitEnumeratorContext.html#callAsync-java.util.concurrent.Callable-java.util.function.BiConsumer-long-long-">enumeratorContext#callAsync()</a> +utility to run long operations +asynchronously, such as split preparation or split discovery (if lazy split generation is +impossible). Indeed, the start() method runs in the source coordinator thread; +we don’t want to block it for a long time.</p>
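+<p>Continuing the enumerator sketch above, start() could trigger the potentially slow preparation work asynchronously. The evaluateTableSize() and initializeSplitGeneration() helpers and the userMaxSplitMemorySize parameter are purely illustrative; only the callAsync() utility is part of the framework:</p>
+<pre><code class="language-java">// Inside the enumerator sketched above; `context` is the SplitEnumeratorContext
// and `state` the enumerator state.
@Override
public void start() {
    context.callAsync(
            // runs in a worker thread, not in the source coordinator thread
            this::evaluateTableSize,
            // the handler is executed back in the coordinator thread
            (tableSize, error) -> {
                if (error != null) {
                    throw new RuntimeException("Could not evaluate the source data size", error);
                }
                state.initializeSplitGeneration(tableSize, userMaxSplitMemorySize);
            });
}
</code></pre>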
+<h3 id="splitreader"> + SplitReader + <a class="anchor" href="#splitreader">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/reader/CassandraSplitReader.java">Example Cassandra SplitReader</a></p> +<p>This class is responsible for reading the actual splits that it receives when the framework +calls <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.html#handleSplitsChanges-org.apache.flink.connector.base.source.reader.splitreader.SplitsChange-">handleSplitsChanges()</a>. The main part of the split reader is +the <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.html#fetch--">fetch()</a> +implementation, where we read all the splits received and return the read records as +a <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/RecordsBySplits.html">RecordsBySplits</a> +object. This object contains a map of the split ids to the records they contain and also the ids of the +finished splits. Important points need to be considered:</p> +<ul> +<li>The fetch call must be non-blocking. If any call in its code is synchronous and potentially long, +an +escape from the fetch() must be provided. When the framework +calls <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.html#wakeUp--">wakeUp()</a>, +we should interrupt the +fetch, for example by setting an AtomicBoolean.</li> +<li>The fetch call needs to be re-entrant: an already read split must not be re-read. We should remove it +from the list of splits to read and add its id to the finished splits (along with empty splits) in +the RecordsBySplits that we return.</li> +</ul> +<p>It is totally fine for the implementer to exit the fetch() method early. Also, a failure could +interrupt the fetch. In both cases, the framework will call fetch() again later on. In that case, the +fetch method must resume the reading from where it was left off using the split state already +discussed. If resuming the read of a split is impossible because of backend constraints, then the +only solution is to read splits atomically (either not read the split at all, or read it entirely). +That way, in case of an interrupted fetch, nothing will be output and the split could be read again +from the beginning at the next fetch call, leading to no duplicates. But if the split is read entirely, +there are points to consider:</p> +<ul> +<li>We should ensure that the total split content (records from the source) fits in memory, for example +by specifying a max split size in bytes +(see <a href="#splitenumerator-and-splitenumeratorstate">SplitEnumerator</a>)</li> +<li>The split state becomes useless; only a Split class is needed</li> +</ul>
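+<p>Putting these constraints together, a split reader that reads each split atomically could look like the sketch below. The BackendRow type and the queryBackend() call stand for the backend-specific parts and are illustrative; only the Flink interfaces are real:</p>
+<pre><code class="language-java">import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.flink.connector.base.source.reader.RecordsBySplits;
import org.apache.flink.connector.base.source.reader.RecordsWithSplitIds;
import org.apache.flink.connector.base.source.reader.splitreader.SplitReader;
import org.apache.flink.connector.base.source.reader.splitreader.SplitsChange;

/** Reads whole splits atomically: a split is either fully read or not read at all. */
public class RangeSplitReader implements SplitReader&lt;BackendRow, RangeSplit&gt; {

    private final Queue&lt;RangeSplit&gt; unprocessedSplits = new ConcurrentLinkedQueue&lt;&gt;();
    private final AtomicBoolean wakeup = new AtomicBoolean(false);

    @Override
    public RecordsWithSplitIds&lt;BackendRow&gt; fetch() throws IOException {
        Map&lt;String, Collection&lt;BackendRow&gt;&gt; recordsBySplit = new HashMap&lt;&gt;();
        Set&lt;String&gt; finishedSplits = new HashSet&lt;&gt;();

        RangeSplit split;
        while ((split = unprocessedSplits.poll()) != null) {
            if (wakeup.compareAndSet(true, false)) {
                // the framework asked us to stop: exit early; the remaining splits
                // stay in the queue and will be read at the next fetch() call
                break;
            }
            // illustrative blocking read of the whole split from the backend
            recordsBySplit.put(split.splitId(), new ArrayList&lt;&gt;(queryBackend(split)));
            // re-entrancy: the split was polled from the queue and is marked finished
            finishedSplits.add(split.splitId());
        }
        return new RecordsBySplits&lt;&gt;(recordsBySplit, finishedSplits);
    }

    @Override
    public void handleSplitsChanges(SplitsChange&lt;RangeSplit&gt; splitsChange) {
        unprocessedSplits.addAll(splitsChange.splits());
    }

    @Override
    public void wakeUp() {
        wakeup.set(true);
    }

    @Override
    public void close() throws Exception {
        // release backend resources here
    }

    private Collection&lt;BackendRow&gt; queryBackend(RangeSplit split) {
        // placeholder for the actual backend call
        throw new UnsupportedOperationException("backend-specific");
    }
}
</code></pre>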
+<h3 id="recordemitter"> + RecordEmitter + <a class="anchor" href="#recordemitter">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/reader/CassandraRecordEmitter.java">Example Cassandra RecordEmitter</a></p> +<p>The SplitReader reads records in the form +of <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.html">an intermediary record format</a> +that the implementer +provides for each record. It can be the raw format returned by the backend or any format that allows +extracting the actual record afterwards. This format is not the final output format expected by the +source. It contains anything needed to do the conversion to the record output format. We need to +implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/base/source/reader/RecordEmitter.html#emitRecord-E-org.apache.flink.api.connector.source.SourceOutput-SplitStateT-">RecordEmitter#emitRecord()</a> +to do this conversion. A good pattern here is to initialize the +RecordEmitter with a mapping Function. The implementation must be idempotent: indeed, the method +may be interrupted in the middle. In that case, the same set of records will be passed to the record +emitter again later.</p> +<h3 id="serializers"> + Serializers + <a class="anchor" href="#serializers">#</a> +</h3> +<p><a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/split/CassandraSplitSerializer.java">Example Cassandra SplitSerializer</a> +and <a href="https://github.com/apache/flink-connector-cassandra/blob/d92dc8d891098a9ca6a7de6062b4630079beaaef/flink-connector-cassandra/src/main/java/org/apache/flink/connector/cassandra/source/enumerator/CassandraEnumeratorStateSerializer.java">SplitEnumeratorStateSerializer</a></p> +<p>We need to provide singleton serializers for:</p> +<ul> +<li>Split: splits are serialized when sending them from enumerator to reader, and when checkpointing +the reader’s current state</li> +<li>SplitEnumeratorState: the serializer is used for the result of +SplitEnumerator#snapshotState()</li> +</ul> +<p>For both, we need to +implement <a href="https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/core/io/SimpleVersionedSerializer.html">SimpleVersionedSerializer</a>. Care needs to be taken at some important points:</p> +<ul> +<li>Using Java serialization +is <a href="https://flink.apache.org/contributing/code-style-and-quality-java.html#java-serialization">forbidden</a> +in Flink, mainly for migration concerns. We should rather manually write the fields of the objects +using ObjectOutputStream. When a class is not supported by the ObjectOutputStream (not String, +Integer, Long…), we should write the size of the object in bytes as an Integer and then write +the object converted to byte[]. A similar method is used to serialize collections: first write the +number of elements of the collection, then serialize all the contained objects. Of course, for +deserialization we do the exact same reads in the same order.</li> +<li>There can be a lot of splits, so we should cache the OutputStream used in the SplitSerializer. We can +do so by using:</li> +</ul> +<p><code> ThreadLocal<DataOutputSerializer> SERIALIZER_CACHE = ThreadLocal.withInitial(() -> new DataOutputSerializer(64));</code></p> +<p>The initial stream size depends on the size of a split.</p>
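+<p>As an illustration, here is what a split serializer following these rules could look like for the hypothetical RangeSplit used throughout this article. It writes the fields manually and reads them back in the exact same order:</p>
+<pre><code class="language-java">import java.io.IOException;

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;

/** Serializes the illustrative RangeSplit by writing its fields manually (no Java serialization). */
public class RangeSplitSerializer implements SimpleVersionedSerializer&lt;RangeSplit&gt; {

    public static final RangeSplitSerializer INSTANCE = new RangeSplitSerializer();
    public static final int CURRENT_VERSION = 0;

    // cache the output stream because there can be a lot of splits
    private static final ThreadLocal&lt;DataOutputSerializer&gt; SERIALIZER_CACHE =
            ThreadLocal.withInitial(() -> new DataOutputSerializer(64));

    @Override
    public int getVersion() {
        return CURRENT_VERSION;
    }

    @Override
    public byte[] serialize(RangeSplit split) throws IOException {
        final DataOutputSerializer out = SERIALIZER_CACHE.get();
        out.clear();
        out.writeLong(split.getStart());
        out.writeLong(split.getEnd());
        out.writeLong(split.getAlreadyReadRecords());
        return out.getCopyOfBuffer();
    }

    @Override
    public RangeSplit deserialize(int version, byte[] serialized) throws IOException {
        // only one version so far; read the fields back in the same order they were written
        final DataInputDeserializer in = new DataInputDeserializer(serialized);
        return new RangeSplit(in.readLong(), in.readLong(), in.readLong());
    }
}
</code></pre>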
+<h2 id="testing-the-source"> + Testing the source + <a class="anchor" href="#testing-the-source">#</a> +</h2> +<p>For the sake of concision of this article, testing the source will be the subject of the next +article. Stay tuned!</p> +<h2 id="conclusion"> + Conclusion + <a class="anchor" href="#conclusion">#</a> +</h2> +<p>This article, gathering field feedback from the implementation, was needed as the javadocs cannot cover all +the implementation details for high-performance and maintainable sources. I hope you enjoyed reading it +and that it gave you the desire to contribute a new connector to the Flink project!</p> +</p> +</article> + + + + <footer class="book-footer"> + + + + + + + +<a href="https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications">Want to contribute translation?</a> +<br><br> +<a href="//github.com/apache/flink-web/edit/asf-site/docs/content/posts/2023-05-03-howto-create-batch-source.md" style="color:black"><i class="fa fa-edit fa-fw"></i>Edit This Page</a> + + + + + </footer> + + + + <div class="book-comments"> + +</div> + + + + <label for="menu-control" class="hidden book-menu-overlay"></label> + </div> + + + <aside class="book-toc"> + + + +<nav id="TableOfContents"><h3>On This Page <button class="toc" onclick="collapseToc()"><i class="fa fa-compress" aria-hidden="true"></i></button></h3> + <ul> + <li><a href="#introduction">Introduction</a></li> + <li><a href="#implementing-the-source-components">Implementing the source components</a> + <ul> + <li><a href="#source">Source</a></li> + <li><a href="#sourcereader">SourceReader</a></li> + <li><a href="#split-and-splitstate">Split and SplitState</a></li> + <li><a href="#splitenumerator-and-splitenumeratorstate">SplitEnumerator and SplitEnumeratorState</a></li> + <li><a href="#splitreader">SplitReader</a></li> + <li><a href="#recordemitter">RecordEmitter</a></li> + <li><a href="#serializers">Serializers</a></li> + </ul> + </li> + <li><a href="#testing-the-source">Testing the source</a></li> + <li><a href="#conclusion">Conclusion</a></li> + </ul> +</nav> + + + </aside> + <aside class="expand-toc"> + <button class="toc" onclick="expandToc()"> + <i class="fa fa-expand" aria-hidden="true"></i> + </button> + </aside> + + </main> + + +</body> + +</html> + + + + + + + + + + + + diff --git a/content/en/sitemap.xml b/content/en/sitemap.xml index 005861731..04fad7ddc 100644 --- a/content/en/sitemap.xml +++ b/content/en/sitemap.xml @@ -363,7 +363,7 @@ /> </url><url> <loc>https://flink.apache.org/posts/</loc> - <lastmod>2023-04-19T08:00:00+00:00</lastmod> + <lastmod>2023-05-03T08:00:00+00:00</lastmod> </url><url> <loc>https://flink.apache.org/flink-packages/</loc> <xhtml:link @@ -508,12 +508,9 @@ hreflang="en" href="https://flink.apache.org/how-to-contribute/improve-website/" /> - </url><url> - <loc>https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/</loc> - <lastmod>2023-04-19T08:00:00+00:00</lastmod> </url><url> 
<loc>https://flink.apache.org/</loc> - <lastmod>2023-04-19T08:00:00+00:00</lastmod> + <lastmod>2023-05-03T08:00:00+00:00</lastmod> <xhtml:link rel="alternate" hreflang="zh" @@ -524,6 +521,12 @@ hreflang="en" href="https://flink.apache.org/" /> + </url><url> + <loc>https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/</loc> + <lastmod>2023-05-03T08:00:00+00:00</lastmod> + </url><url> + <loc>https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/</loc> + <lastmod>2023-04-19T08:00:00+00:00</lastmod> </url><url> <loc>https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/</loc> <lastmod>2023-03-23T08:00:00+00:00</lastmod> diff --git a/content/img/blog/2023-05-03-howto-create-batch-source/source_components.svg b/content/img/blog/2023-05-03-howto-create-batch-source/source_components.svg new file mode 100644 index 000000000..8c148fe34 --- /dev/null +++ b/content/img/blog/2023-05-03-howto-create-batch-source/source_components.svg @@ -0,0 +1,20 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +<svg width="855" height="487" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><path d="M26 15 881 15 881 502 26 502Z" fill-rule="evenodd" clip-rule="evenodd"/></clipPath></defs><g clip-path="url(#clip0)" transform="translate(-26 -15)"><path d="M26 158.936C26 151.239 32.2394 145 39.936 145L326.064 145C333.761 145 340 151.239 340 158.936L340 298.064C340 305.761 333.761 312 326.064 312L39.936 312C32.2394 312 26 305.7 [...] diff --git a/content/img/blog/2023-05-03-howto-create-batch-source/source_reader.svg b/content/img/blog/2023-05-03-howto-create-batch-source/source_reader.svg new file mode 100644 index 000000000..1d0f3635b --- /dev/null +++ b/content/img/blog/2023-05-03-howto-create-batch-source/source_reader.svg @@ -0,0 +1,20 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> +<svg width="1257" height="653" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><path d="M-11 41 1246 41 1246 694-11 694Z" fill-rule="evenodd" clip-rule="evenodd"/></clipPath><clipPath id="clip1"><path d="M1017 309 1130 309 1130 420 1017 420Z" fill-rule="evenodd" clip-rule="evenodd"/></clipPath><clipPath id="clip2"><path d="M1017 309 1130 309 1130 420 1017 420Z" fill-rule="evenodd" clip-rule="evenodd"/></clipPath>< [...] diff --git a/content/index.html b/content/index.html index 4cb8b2d9b..b527eb5ab 100644 --- a/content/index.html +++ b/content/index.html @@ -6,7 +6,7 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="Apache Flink® — Stateful Computations over Data Streams # All streaming use cases Event-driven Applications Stream & Batch Analytics Data Pipelines & ETL Learn more Guaranteed correctness Exactly-once state consistency Event-time processing Sophisticated late data handling Learn more Layered APIs SQL on Stream & Batch Data DataStream API & DataSet API ProcessFunction (Time & State) Learn more Operational Focus Flexible deployment High [...] +<meta name="description" content="Apache Flink® — Stateful Computations over Data Streams # All streaming use cases Event-driven Applications Stream & Batch Analytics Data Pipelines & ETL Learn more Guaranteed correctness Exactly-once state consistency Event-time processing Sophisticated late data handling Learn more Layered APIs SQL on Stream & Batch Data DataStream API & DataSet API ProcessFunction (Time & State) Learn more Operational Focus Flexible deployment High [...] <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Apache Flink® — Stateful Computations over Data Streams" /> <meta property="og:description" content="" /> <meta property="og:type" content="website" /> @@ -1039,6 +1039,11 @@ under the License. + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a><br /> + Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! +Implementing the source components # The source architecture is depicted in the diagrams below: + <br /><br /> + <a href="/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a><br /> The Apache Flink community is excited to announce the release of Flink ML 2.2.0! This release focuses on enriching Flink ML’s feature engineering algorithms. The library now includes 33 feature engineering algorithms, making it a more comprehensive library for feature engineering tasks. With the addition of these algorithms, we believe Flink ML library is ready for use in production jobs that require feature engineering capabilities, whose input can then be consumed by both offline and online machine learning tasks. @@ -1048,12 +1053,6 @@ With the addition of these algorithms, we believe Flink ML library is ready for The Apache Flink PMC is pleased to announce Apache Flink release 1.17.0. 
Apache Flink is the leading stream processing standard, and the concept of unified stream and batch data processing is being successfully adopted in more and more companies. Thanks to our excellent community and contributors, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache Software Foundation. Flink 1.17 had 172 contributors enthusiastically participat [...] <br /><br /> - <a href="/2023/03/15/apache-flink-1.15.4-release-announcement/">Apache Flink 1.15.4 Release Announcement</a><br /> - The Apache Flink Community is pleased to announce the fourth bug fix release of the Flink 1.15 series. -This release includes 53 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA. -We highly recommend all users upgrade to Flink 1.15.4. - <br /><br /> - diff --git a/content/index.xml b/content/index.xml index 4cebf5aee..8160ace9e 100644 --- a/content/index.xml +++ b/content/index.xml @@ -6,7 +6,7 @@ <description>Recent content in Apache Flink® — Stateful Computations over Data Streams on Apache Flink</description> <generator>Hugo -- gohugo.io</generator> <language>en-us</language> - <lastBuildDate>Wed, 19 Apr 2023 08:00:00 +0000</lastBuildDate><atom:link href="https://flink.apache.org/index.xml" rel="self" type="application/rss+xml" /> + <lastBuildDate>Wed, 03 May 2023 08:00:00 +0000</lastBuildDate><atom:link href="https://flink.apache.org/index.xml" rel="self" type="application/rss+xml" /> <item> <title>Architecture</title> <link>https://flink.apache.org/what-is-flink/flink-architecture/</link> @@ -371,6 +371,16 @@ Informing visitors about Apache Flink and its features. Encouraging visitors to Obtain the website sources # The website of Apache Flink is hosted in a dedicated git repository which is mirrored to GitHub at https://github.</description> </item> + <item> + <title>Howto create a batch source with the new Source framework</title> + <link>https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/</link> + <pubDate>Wed, 03 May 2023 08:00:00 +0000</pubDate> + + <guid>https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/</guid> + <description>Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! 
+Implementing the source components # The source architecture is depicted in the diagrams below:</description> + </item> + <item> <title>Apache Flink ML 2.2.0 Release Announcement</title> <link>https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/</link> diff --git a/content/posts/index.html b/content/posts/index.html index 0a501daed..cabf017b4 100644 --- a/content/posts/index.html +++ b/content/posts/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </h3> + + May 3, 2023 - + + + + Etienne Chauchot + + <a href="https://twitter.com/echauchot">(@echauchot)</a> + + + + + <p>Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! +Implementing the source components # The source architecture is depicted in the diagrams below: + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">...</a> + + </p> + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> @@ -1980,27 +2008,6 @@ Release Notes # Bug # [FLINK-30329] - flink-kubernetes-operator helm chart does <a href="/2022/12/14/apache-flink-kubernetes-operator-1.3.0-release-announcement/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">Optimising the throughput of async sinks using a custom RateLimitingStrategy</a> - </h3> - - November 25, 2022 - - - - - Hong Liang Teoh - - - - - <p>Introduction # When designing a Flink data processing job, one of the key concerns is maximising job throughput. Sink throughput is a crucial factor because it can determine the entire job’s throughput. We generally want the highest possible write rate in the sink without overloading the destination. However, since the factors impacting a destination’s performance are variable over the job’s lifetime, the sink needs to adjust its write rate dynamically. 
- <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">...</a> - - </p> - <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2058,6 +2065,10 @@ Release Notes # Bug # [FLINK-30329] - flink-kubernetes-operator helm chart does <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/index.xml b/content/posts/index.xml index a9c5eb918..a83a0a4d7 100644 --- a/content/posts/index.xml +++ b/content/posts/index.xml @@ -6,7 +6,17 @@ <description>Recent content in Flink Blog on Apache Flink</description> <generator>Hugo -- gohugo.io</generator> <language>en-us</language> - <lastBuildDate>Wed, 19 Apr 2023 08:00:00 +0000</lastBuildDate><atom:link href="https://flink.apache.org/posts/index.xml" rel="self" type="application/rss+xml" /> + <lastBuildDate>Wed, 03 May 2023 08:00:00 +0000</lastBuildDate><atom:link href="https://flink.apache.org/posts/index.xml" rel="self" type="application/rss+xml" /> + <item> + <title>Howto create a batch source with the new Source framework</title> + <link>https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/</link> + <pubDate>Wed, 03 May 2023 08:00:00 +0000</pubDate> + + <guid>https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/</guid> + <description>Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! 
+Implementing the source components # The source architecture is depicted in the diagrams below:</description> + </item> + <item> <title>Apache Flink ML 2.2.0 Release Announcement</title> <link>https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/</link> diff --git a/content/posts/page/10/index.html b/content/posts/page/10/index.html index d27e33d71..dec531a60 100644 --- a/content/posts/page/10/index.html +++ b/content/posts/page/10/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,35 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">PyFlink: The integration of Pandas into PyFlink</a> + </h3> + + August 4, 2020 - + + + + Jincheng Sun + + <a href="https://twitter.com/sunjincheng121">(@sunjincheng121)</a> + + + Markos Sfikas + + <a href="https://twitter.com/MarkSfik">(@MarkSfik)</a> + + + + + <p>Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities. +Pic source: VanderPlas 2017, slide 52. + <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">...</a> + + </p> + <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2020/07/30/advanced-flink-application-patterns-vol.3-custom-window-processing/">Advanced Flink Application Patterns Vol.3: Custom Window Processing</a> @@ -1965,30 +1998,6 @@ Streaming Data Visualization # With Zeppelin, you can build a real time streamin <a href="/2020/06/15/flink-on-zeppelin-notebooks-for-interactive-data-analysis-part-1/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2020/06/10/flink-community-update-june20/">Flink Community Update - June'20</a> - </h3> - - June 10, 2020 - - - - - Marta Paes - - <a href="https://twitter.com/morsapaes">(@morsapaes)</a> - - - - - <p>And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020. -To top it off, a piece of good news: Flink Forward is back on October 19-22 as a free virtual event! 
- <a href="/2020/06/10/flink-community-update-june20/">...</a> - - </p> - <a href="/2020/06/10/flink-community-update-june20/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2046,6 +2055,10 @@ To top it off, a piece of good news: Flink Forward is back on October 19-22 as a <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/11/index.html b/content/posts/page/11/index.html index 96e9cacca..d853567cb 100644 --- a/content/posts/page/11/index.html +++ b/content/posts/page/11/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2020/06/10/flink-community-update-june20/">Flink Community Update - June'20</a> + </h3> + + June 10, 2020 - + + + + Marta Paes + + <a href="https://twitter.com/morsapaes">(@morsapaes)</a> + + + + + <p>And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020. +To top it off, a piece of good news: Flink Forward is back on October 19-22 as a free virtual event! + <a href="/2020/06/10/flink-community-update-june20/">...</a> + + </p> + <a href="/2020/06/10/flink-community-update-june20/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2020/06/09/stateful-functions-2.1.0-release-announcement/">Stateful Functions 2.1.0 Release Announcement</a> @@ -1965,30 +1993,6 @@ Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approa <a href="/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2020/03/30/flink-community-update-april20/">Flink Community Update - April'20</a> - </h3> - - March 30, 2020 - - - - - Marta Paes - - <a href="https://twitter.com/morsapaes">(@morsapaes)</a> - - - - - <p>While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog. -And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the Flink Forward Virtual Conference, on April 22-24 (see Upcoming Events). 
- <a href="/2020/03/30/flink-community-update-april20/">...</a> - - </p> - <a href="/2020/03/30/flink-community-update-april20/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2046,6 +2050,10 @@ And since now it’s more important than ever to keep up the spirits, we’d <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/12/index.html b/content/posts/page/12/index.html index 6052b71f3..b2cd4b494 100644 --- a/content/posts/page/12/index.html +++ b/content/posts/page/12/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2020/03/30/flink-community-update-april20/">Flink Community Update - April'20</a> + </h3> + + March 30, 2020 - + + + + Marta Paes + + <a href="https://twitter.com/morsapaes">(@morsapaes)</a> + + + + + <p>While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog. +And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the Flink Forward Virtual Conference, on April 22-24 (see Upcoming Events). + <a href="/2020/03/30/flink-community-update-april20/">...</a> + + </p> + <a href="/2020/03/30/flink-community-update-april20/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a> @@ -1973,31 +2001,6 @@ Dynamic updates of application logic allow Flink jobs to change at runtime, with <a href="/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2019/12/11/apache-flink-1.8.3-released/">Apache Flink 1.8.3 Released</a> - </h3> - - December 11, 2019 - - - - - Hequn Cheng - - - - - <p>The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series. -This release includes 45 fixes and minor improvements for Flink 1.8.2. The list below includes a detailed list of all fixes and improvements. -We highly recommend all users to upgrade to Flink 1.8.3. 
-Updated Maven dependencies: -<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.8.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.8.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <versio [...] - <a href="/2019/12/11/apache-flink-1.8.3-released/">...</a> - - </p> - <a href="/2019/12/11/apache-flink-1.8.3-released/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2055,6 +2058,10 @@ Updated Maven dependencies: <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/13/index.html b/content/posts/page/13/index.html index 3a669c717..df68b954e 100644 --- a/content/posts/page/13/index.html +++ b/content/posts/page/13/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,31 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2019/12/11/apache-flink-1.8.3-released/">Apache Flink 1.8.3 Released</a> + </h3> + + December 11, 2019 - + + + + Hequn Cheng + + + + + <p>The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series. +This release includes 45 fixes and minor improvements for Flink 1.8.2. The list below includes a detailed list of all fixes and improvements. +We highly recommend all users to upgrade to Flink 1.8.3. +Updated Maven dependencies: +<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.8.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.8.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <versio [...] + <a href="/2019/12/11/apache-flink-1.8.3-released/">...</a> + + </p> + <a href="/2019/12/11/apache-flink-1.8.3-released/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2019/11/25/how-to-query-pulsar-streams-using-apache-flink/">How to query Pulsar Streams using Apache Flink</a> @@ -1975,30 +2004,6 @@ Updated Maven dependencies: <a href="/2019/07/02/apache-flink-1.8.1-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">A Practical Guide to Broadcast State in Apache Flink</a> - </h3> - - June 26, 2019 - - - - - Fabian Hueske - - <a href="https://twitter.com/fhueske">(@fhueske)</a> - - - - - <p>Since version 1.5.0, Apache Flink features a new type of state which is called Broadcast State. 
In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream. We walk you through the processing steps and the source code to implement this application in practice. -What is Broadcast State? # The Broadcast State can be used to combine and jointly process two streams of events in a specific way. - <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">...</a> - - </p> - <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2056,6 +2061,10 @@ What is Broadcast State? # The Broadcast State can be used to combine and jointl <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/14/index.html b/content/posts/page/14/index.html index 61ba44723..ce220b497 100644 --- a/content/posts/page/14/index.html +++ b/content/posts/page/14/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">A Practical Guide to Broadcast State in Apache Flink</a> + </h3> + + June 26, 2019 - + + + + Fabian Hueske + + <a href="https://twitter.com/fhueske">(@fhueske)</a> + + + + + <p>Since version 1.5.0, Apache Flink features a new type of state which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream. We walk you through the processing steps and the source code to implement this application in practice. +What is Broadcast State? # The Broadcast State can be used to combine and jointly process two streams of events in a specific way. + <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">...</a> + + </p> + <a href="/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2019/06/05/a-deep-dive-into-flinks-network-stack/">A Deep-Dive into Flink's Network Stack</a> @@ -1957,29 +1985,6 @@ Updated Maven dependencies: <a href="/2019/02/25/apache-flink-1.6.4-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2019/02/21/monitoring-apache-flink-applications-101/">Monitoring Apache Flink Applications 101</a> - </h3> - - February 21, 2019 - - - - - Konstantin Knauf - - <a href="https://twitter.com/snntrable">(@snntrable)</a> - - - - - <p>This blog post provides an introduction to Apache Flink’s built-in monitoring and metrics system, that allows developers to effectively monitor their Flink jobs. 
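The Broadcast State teaser reproduced in the page/13 and page/14 hunks above describes combining a keyed event stream with a broadcast rule stream. A minimal sketch of that pattern, using illustrative String events and String rules rather than the types from the post, might look like the following; the stream contents and names are placeholders.

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastStateSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> events = env.fromElements("click", "view", "click");
            DataStream<String> rules = env.fromElements("click");

            final MapStateDescriptor<String, String> rulesDescriptor =
                    new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

            BroadcastStream<String> broadcastRules = rules.broadcast(rulesDescriptor);

            events.keyBy(event -> event)
                  .connect(broadcastRules)
                  .process(new KeyedBroadcastProcessFunction<String, String, String, String>() {
                      @Override
                      public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                          // The keyed side can only read the broadcast state.
                          if (ctx.getBroadcastState(rulesDescriptor).contains(event)) {
                              out.collect("matched: " + event);
                          }
                      }

                      @Override
                      public void processBroadcastElement(String rule, Context ctx, Collector<String> out) throws Exception {
                          // Every parallel instance receives each rule and stores it.
                          ctx.getBroadcastState(rulesDescriptor).put(rule, rule);
                      }
                  })
                  .print();

            env.execute("broadcast-state-sketch");
        }
    }

The broadcast side updates the shared map state on every parallel instance, while the keyed side only reads it, which is the "specific way" of combining two streams that the excerpt refers to.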
Oftentimes, the task of picking the relevant metrics to monitor a Flink application can be overwhelming for a DevOps team that is just starting with stream processing and Apache Flink. Having worked with many organizations that deploy Flink at scale, I would like to share my experience and some best practice with the community. - <a href="/2019/02/21/monitoring-apache-flink-applications-101/">...</a> - - </p> - <a href="/2019/02/21/monitoring-apache-flink-applications-101/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2037,6 +2042,10 @@ Updated Maven dependencies: <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/15/index.html b/content/posts/page/15/index.html index 84cc18300..a64c9f08e 100644 --- a/content/posts/page/15/index.html +++ b/content/posts/page/15/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,29 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2019/02/21/monitoring-apache-flink-applications-101/">Monitoring Apache Flink Applications 101</a> + </h3> + + February 21, 2019 - + + + + Konstantin Knauf + + <a href="https://twitter.com/snntrable">(@snntrable)</a> + + + + + <p>This blog post provides an introduction to Apache Flink’s built-in monitoring and metrics system, that allows developers to effectively monitor their Flink jobs. Oftentimes, the task of picking the relevant metrics to monitor a Flink application can be overwhelming for a DevOps team that is just starting with stream processing and Apache Flink. Having worked with many organizations that deploy Flink at scale, I would like to share my experience and some best practice with the community. + <a href="/2019/02/21/monitoring-apache-flink-applications-101/">...</a> + + </p> + <a href="/2019/02/21/monitoring-apache-flink-applications-101/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2019/02/15/apache-flink-1.7.2-released/">Apache Flink 1.7.2 Released</a> @@ -1960,28 +1987,6 @@ Updated Maven dependencies: <a href="/2018/09/20/apache-flink-1.5.4-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2018/09/20/apache-flink-1.6.1-released/">Apache Flink 1.6.1 Released</a> - </h3> - - September 20, 2018 - - - - - - - <p>The Apache Flink community released the first bugfix version of the Apache Flink 1.6 series. -This release includes 60 fixes and minor improvements for Flink 1.6.1. The list below includes a detailed list of all fixes. -We highly recommend all users to upgrade to Flink 1.6.1. 
-Updated Maven dependencies: -<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.6.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.6.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <versio [...] - <a href="/2018/09/20/apache-flink-1.6.1-released/">...</a> - - </p> - <a href="/2018/09/20/apache-flink-1.6.1-released/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2039,6 +2044,10 @@ Updated Maven dependencies: <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/16/index.html b/content/posts/page/16/index.html index 07f26b2b2..04e8283e5 100644 --- a/content/posts/page/16/index.html +++ b/content/posts/page/16/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,28 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2018/09/20/apache-flink-1.6.1-released/">Apache Flink 1.6.1 Released</a> + </h3> + + September 20, 2018 - + + + + + + <p>The Apache Flink community released the first bugfix version of the Apache Flink 1.6 series. +This release includes 60 fixes and minor improvements for Flink 1.6.1. The list below includes a detailed list of all fixes. +We highly recommend all users to upgrade to Flink 1.6.1. +Updated Maven dependencies: +<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.6.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.6.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <versio [...] + <a href="/2018/09/20/apache-flink-1.6.1-released/">...</a> + + </p> + <a href="/2018/09/20/apache-flink-1.6.1-released/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2018/08/21/apache-flink-1.5.3-released/">Apache Flink 1.5.3 Released</a> @@ -1959,35 +1985,6 @@ Updated Maven dependencies: <a href="/2018/02/15/apache-flink-1.4.1-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</a> - </h3> - - January 30, 2018 - - - - - Stefan Ricther - - <a href="https://twitter.com/StefanRRicther">(@StefanRRicther)</a> - - - Chris Ward - - <a href="https://twitter.com/chrischinch">(@chrischinch)</a> - - - - - <p>Apache Flink was purpose-built for stateful stream processing. However, what is state in a stream processing application? 
I defined state and stateful stream processing in a previous blog post, and in case you need a refresher, state is defined as memory in an application’s operators that stores information about previously-seen events that you can use to influence the processing of future events. -State is a fundamental, enabling concept in stream processing required for a majority of complex use cases. - <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">...</a> - - </p> - <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2045,6 +2042,10 @@ State is a fundamental, enabling concept in stream processing required for a maj <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/17/index.html b/content/posts/page/17/index.html index 21317ac01..bcf040114 100644 --- a/content/posts/page/17/index.html +++ b/content/posts/page/17/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,35 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</a> + </h3> + + January 30, 2018 - + + + + Stefan Ricther + + <a href="https://twitter.com/StefanRRicther">(@StefanRRicther)</a> + + + Chris Ward + + <a href="https://twitter.com/chrischinch">(@chrischinch)</a> + + + + + <p>Apache Flink was purpose-built for stateful stream processing. However, what is state in a stream processing application? I defined state and stateful stream processing in a previous blog post, and in case you need a refresher, state is defined as memory in an application’s operators that stores information about previously-seen events that you can use to influence the processing of future events. +State is a fundamental, enabling concept in stream processing required for a majority of complex use cases. 
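As a short companion to the incremental-checkpointing teaser above: the excerpt explains what state is, and the switch that turns the feature on is a single constructor flag of the RocksDB state backend. The sketch below uses the current EmbeddedRocksDBStateBackend class (from the flink-statebackend-rocksdb dependency); at the time of the 2018 post the equivalent flag lived on RocksDBStateBackend. The interval and storage path are placeholders.

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class IncrementalCheckpointSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60 seconds (placeholder interval).
            env.enableCheckpointing(60_000L);

            // "true" enables incremental checkpoints: only the RocksDB state that
            // changed since the previous checkpoint is uploaded.
            env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

            // Placeholder durable location for the checkpoint files.
            env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");
        }
    }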
+ <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">...</a> + + </p> + <a href="/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2017/12/21/apache-flink-in-2017-year-in-review/">Apache Flink in 2017: Year in Review</a> @@ -1962,25 +1995,6 @@ FLINK-6353 Restoring using CheckpointedRestoring does not work from 1.2 to 1.2 F <a href="/2017/04/26/apache-flink-1.2.1-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">Continuous Queries on Dynamic Tables</a> - </h3> - - March 30, 2017 - - - - - - - <p>Analyzing Data Streams with SQL # More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices. -Apache Flink is very well suited to power streaming analytics applications because it provides support for event-time semantics, stateful exactly-once processing, and achieves high throughput and low latency at the same time. - <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">...</a> - - </p> - <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2038,6 +2052,10 @@ Apache Flink is very well suited to power streaming analytics applications becau <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/18/index.html b/content/posts/page/18/index.html index 18ea8cc71..1f0003677 100644 --- a/content/posts/page/18/index.html +++ b/content/posts/page/18/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,25 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">Continuous Queries on Dynamic Tables</a> + </h3> + + March 30, 2017 - + + + + + + <p>Analyzing Data Streams with SQL # More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices. 
+Apache Flink is very well suited to power streaming analytics applications because it provides support for event-time semantics, stateful exactly-once processing, and achieves high throughput and low latency at the same time. + <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">...</a> + + </p> + <a href="/2017/03/30/continuous-queries-on-dynamic-tables/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2017/03/29/from-streams-to-tables-and-back-again-an-update-on-flinks-table-sql-api/">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a> @@ -1927,26 +1950,6 @@ This release is the first major release in the 1.X.X series of releases, which m <a href="/2016/08/04/announcing-apache-flink-1.1.0/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2016/08/04/flink-1.1.1-released/">Flink 1.1.1 Released</a> - </h3> - - August 4, 2016 - - - - - - - <p>Today, the Flink community released Flink version 1.1.1. -The Maven artifacts published on Maven central for 1.1.0 had a Hadoop dependency issue: No Hadoop 1 specific version (with version 1.1.0-hadoop1) was deployed and 1.1.0 artifacts have a dependency on Hadoop 1 instead of Hadoop 2. -This was fixed with this release and we highly recommend all users to use this version of Flink by bumping your Flink dependencies to version 1. - <a href="/2016/08/04/flink-1.1.1-released/">...</a> - - </p> - <a href="/2016/08/04/flink-1.1.1-released/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2004,6 +2007,10 @@ This was fixed with this release and we highly recommend all users to use this v <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/19/index.html b/content/posts/page/19/index.html index 8b75d4c5f..1fa21f411 100644 --- a/content/posts/page/19/index.html +++ b/content/posts/page/19/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,26 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2016/08/04/flink-1.1.1-released/">Flink 1.1.1 Released</a> + </h3> + + August 4, 2016 - + + + + + + <p>Today, the Flink community released Flink version 1.1.1. +The Maven artifacts published on Maven central for 1.1.0 had a Hadoop dependency issue: No Hadoop 1 specific version (with version 1.1.0-hadoop1) was deployed and 1.1.0 artifacts have a dependency on Hadoop 1 instead of Hadoop 2. +This was fixed with this release and we highly recommend all users to use this version of Flink by bumping your Flink dependencies to version 1. 
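The Continuous Queries on Dynamic Tables teaser in the page/18 hunk above describes SQL that is evaluated continuously over a stream. A small sketch of such a continuous query, assuming the Table API bridge and the built-in datagen connector, with illustrative table and column names:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class ContinuousQuerySketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // An unbounded source table backed by the built-in datagen connector.
            tEnv.executeSql(
                "CREATE TABLE clicks (" +
                "  user_name STRING," +
                "  url STRING" +
                ") WITH ('connector' = 'datagen')");

            // A continuous query over the dynamic table: the per-user count is
            // re-emitted whenever a new click row arrives.
            Table counts = tEnv.sqlQuery(
                "SELECT user_name, COUNT(url) AS cnt FROM clicks GROUP BY user_name");

            counts.execute().print();
        }
    }

The result is itself a dynamic table: as rows keep arriving in clicks, earlier per-user counts are retracted and updated, which is the core idea the excerpt introduces.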
+ <a href="/2016/08/04/flink-1.1.1-released/">...</a> + + </p> + <a href="/2016/08/04/flink-1.1.1-released/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2016/05/24/stream-processing-for-everyone-with-sql-and-apache-flink/">Stream Processing for Everyone with SQL and Apache Flink</a> @@ -1921,24 +1945,6 @@ Overall, we have seen Flink grow in terms of functionality from an engine to one <a href="/2015/12/18/flink-2015-a-year-in-review-and-a-lookout-to-2016/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a> - </h3> - - December 11, 2015 - - - - - - - <p>Apache Storm was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics. Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly. Only shortly afterwards, Twitter acquired Backtype. Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industr [...] - <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">...</a> - - </p> - <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -1996,6 +2002,10 @@ Overall, we have seen Flink grow in terms of functionality from an engine to one <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/2/index.html b/content/posts/page/2/index.html index 231fa209f..b437dd388 100644 --- a/content/posts/page/2/index.html +++ b/content/posts/page/2/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,27 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">Optimising the throughput of async sinks using a custom RateLimitingStrategy</a> + </h3> + + November 25, 2022 - + + + + Hong Liang Teoh + + + + + <p>Introduction # When designing a Flink data processing job, one of the key concerns is maximising job throughput. Sink throughput is a crucial factor because it can determine the entire job’s throughput. We generally want the highest possible write rate in the sink without overloading the destination. However, since the factors impacting a destination’s performance are variable over the job’s lifetime, the sink needs to adjust its write rate dynamically. 
+ <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">...</a> + + </p> + <a href="/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2022/11/10/apache-flink-1.15.3-release-announcement/">Apache Flink 1.15.3 Release Announcement</a> @@ -1965,30 +1990,6 @@ Release Highlights # A non-exhaustive list of some of the more exciting features <a href="/2022/07/25/apache-flink-kubernetes-operator-1.1.0-release-announcement/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">Apache Flink ML 2.1.0 Release Announcement</a> - </h3> - - July 12, 2022 - - - - - Zhipeng Zhang - - - Dong Lin - - - - - <p>The Apache Flink community is excited to announce the release of Flink ML 2.1.0! This release focuses on improving Flink ML’s infrastructure, such as Python SDK, memory management, and benchmark framework, to facilitate the development of performant, memory-safe, and easy-to-use algorithm libraries. We validated the enhanced infrastructure by implementing, benchmarking, and optimizing 10 new algorithms in Flink ML, and confirmed that Flink ML can meet or exceed the performan [...] - <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">...</a> - - </p> - <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2046,6 +2047,10 @@ Release Highlights # A non-exhaustive list of some of the more exciting features <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/20/index.html b/content/posts/page/20/index.html index 213f757d5..2cbfb79c6 100644 --- a/content/posts/page/20/index.html +++ b/content/posts/page/20/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,24 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a> + </h3> + + December 11, 2015 - + + + + + + <p>Apache Storm was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics. Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly. Only shortly afterwards, Twitter acquired Backtype. Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industr [...] 
+ <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">...</a> + + </p> + <a href="/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2015/12/04/introducing-stream-windows-in-apache-flink/">Introducing Stream Windows in Apache Flink</a> @@ -1921,24 +1943,6 @@ Flink 0. <a href="/2015/05/14/april-2015-in-the-flink-community/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2015/05/11/juggling-with-bits-and-bytes/">Juggling with Bits and Bytes</a> - </h3> - - May 11, 2015 - - - - - - - <p>How Apache Flink operates on binary data # Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sor [...] - <a href="/2015/05/11/juggling-with-bits-and-bytes/">...</a> - - </p> - <a href="/2015/05/11/juggling-with-bits-and-bytes/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -1996,6 +2000,10 @@ Flink 0. <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/21/index.html b/content/posts/page/21/index.html index cc9f85043..433c19a4f 100644 --- a/content/posts/page/21/index.html +++ b/content/posts/page/21/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,24 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2015/05/11/juggling-with-bits-and-bytes/">Juggling with Bits and Bytes</a> + </h3> + + May 11, 2015 - + + + + + + <p>How Apache Flink operates on binary data # Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sor [...] 
+ <a href="/2015/05/11/juggling-with-bits-and-bytes/">...</a> + + </p> + <a href="/2015/05/11/juggling-with-bits-and-bytes/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2015/04/13/announcing-flink-0.9.0-milestone1-preview-release/">Announcing Flink 0.9.0-milestone1 preview release</a> @@ -1923,27 +1945,6 @@ Flink graduation # The biggest news is that the Apache board approved Flink as a <a href="/2014/11/18/hadoop-compatibility-in-flink/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2014/11/04/apache-flink-0.7.0-available/">Apache Flink 0.7.0 available</a> - </h3> - - November 4, 2014 - - - - - - - <p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them! -Download Flink 0.7.0 here -See the release changelog here -Overview of major new features # Flink Streaming: The gem of the 0. - <a href="/2014/11/04/apache-flink-0.7.0-available/">...</a> - - </p> - <a href="/2014/11/04/apache-flink-0.7.0-available/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2001,6 +2002,10 @@ Overview of major new features # Flink Streaming: The gem of the 0. <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/22/index.html b/content/posts/page/22/index.html index 7eaa18440..078fda517 100644 --- a/content/posts/page/22/index.html +++ b/content/posts/page/22/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,27 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2014/11/04/apache-flink-0.7.0-available/">Apache Flink 0.7.0 available</a> + </h3> + + November 4, 2014 - + + + + + + <p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them! +Download Flink 0.7.0 here +See the release changelog here +Overview of major new features # Flink Streaming: The gem of the 0. + <a href="/2014/11/04/apache-flink-0.7.0-available/">...</a> + + </p> + <a href="/2014/11/04/apache-flink-0.7.0-available/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2014/10/03/upcoming-events/">Upcoming Events</a> @@ -1860,6 +1885,10 @@ What is Flink? 
# Apache Flink is a general-purpose data processing engine for cl <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/3/index.html b/content/posts/page/3/index.html index c4cdce199..e069a7f04 100644 --- a/content/posts/page/3/index.html +++ b/content/posts/page/3/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">Apache Flink ML 2.1.0 Release Announcement</a> + </h3> + + July 12, 2022 - + + + + Zhipeng Zhang + + + Dong Lin + + + + + <p>The Apache Flink community is excited to announce the release of Flink ML 2.1.0! This release focuses on improving Flink ML’s infrastructure, such as Python SDK, memory management, and benchmark framework, to facilitate the development of performant, memory-safe, and easy-to-use algorithm libraries. We validated the enhanced infrastructure by implementing, benchmarking, and optimizing 10 new algorithms in Flink ML, and confirmed that Flink ML can meet or exceed the performan [...] + <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">...</a> + + </p> + <a href="/2022/07/12/apache-flink-ml-2.1.0-release-announcement/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2022/07/11/flip-147-support-checkpoints-after-tasks-finished-part-one/">FLIP-147: Support Checkpoints After Tasks Finished - Part One</a> @@ -1978,33 +2006,6 @@ In this multi-part series, we will present a collection of low-latency technique <a href="/2022/05/18/getting-into-low-latency-gears-with-apache-flink-part-one/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">Apache Flink Table Store 0.1.0 Release Announcement</a> - </h3> - - May 11, 2022 - - - - - Jingsong Lee - - - Jiangjie (Becket) Qin - - - - - <p>The Apache Flink community is pleased to announce the preview release of the Apache Flink Table Store (0.1.0). -Please check out the full documentation for detailed information and user guides. -Note: Flink Table Store is still in beta status and undergoing rapid development. We do not recommend that you use it directly in a production environment. -What is Flink Table Store # In the past years, thanks to our numerous contributors and users, Apache Flink has established itself as one of the best distributed computing engines, especially for stateful stream processing at large scale. 
- <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">...</a> - - </p> - <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2062,6 +2063,10 @@ What is Flink Table Store # In the past years, thanks to our numerous contributo <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/4/index.html b/content/posts/page/4/index.html index 2584c2863..266f4b7a2 100644 --- a/content/posts/page/4/index.html +++ b/content/posts/page/4/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,33 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">Apache Flink Table Store 0.1.0 Release Announcement</a> + </h3> + + May 11, 2022 - + + + + Jingsong Lee + + + Jiangjie (Becket) Qin + + + + + <p>The Apache Flink community is pleased to announce the preview release of the Apache Flink Table Store (0.1.0). +Please check out the full documentation for detailed information and user guides. +Note: Flink Table Store is still in beta status and undergoing rapid development. We do not recommend that you use it directly in a production environment. +What is Flink Table Store # In the past years, thanks to our numerous contributors and users, Apache Flink has established itself as one of the best distributed computing engines, especially for stateful stream processing at large scale. + <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">...</a> + + </p> + <a href="/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2022/05/06/exploring-the-thread-mode-in-pyflink/">Exploring the thread mode in PyFlink</a> @@ -1981,29 +2012,6 @@ The binary distribution and source artifacts are now available on the updated Do <a href="/2022/01/31/stateful-functions-3.2.0-release-announcement/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2022/01/20/pravega-flink-connector-101/">Pravega Flink Connector 101</a> - </h3> - - January 20, 2022 - - - - - Yumin Zhou (Brian) - - <a href="https://twitter.com/crazy__zhou">(@crazy__zhou)</a> - - - - - <p>Pravega, which is now a CNCF sandbox project, is a cloud-native storage system based on abstractions for both batch and streaming data consumption. Pravega streams (a new storage abstraction) are durable, consistent, and elastic, while natively supporting long-term data retention. In comparison, Apache Flink is a popular real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency computation, as well as support for comp [...] 
- <a href="/2022/01/20/pravega-flink-connector-101/">...</a> - - </p> - <a href="/2022/01/20/pravega-flink-connector-101/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2061,6 +2069,10 @@ The binary distribution and source artifacts are now available on the updated Do <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/5/index.html b/content/posts/page/5/index.html index 3ecfafcb7..f7e544eba 100644 --- a/content/posts/page/5/index.html +++ b/content/posts/page/5/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,29 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2022/01/20/pravega-flink-connector-101/">Pravega Flink Connector 101</a> + </h3> + + January 20, 2022 - + + + + Yumin Zhou (Brian) + + <a href="https://twitter.com/crazy__zhou">(@crazy__zhou)</a> + + + + + <p>Pravega, which is now a CNCF sandbox project, is a cloud-native storage system based on abstractions for both batch and streaming data consumption. Pravega streams (a new storage abstraction) are durable, consistent, and elastic, while natively supporting long-term data retention. In comparison, Apache Flink is a popular real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency computation, as well as support for comp [...] + <a href="/2022/01/20/pravega-flink-connector-101/">...</a> + + </p> + <a href="/2022/01/20/pravega-flink-connector-101/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2022/01/17/apache-flink-1.14.3-release-announcement/">Apache Flink 1.14.3 Release Announcement</a> @@ -1987,31 +2014,6 @@ How data gets passed around between operators # Data shuffling is an important s <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">Sort-Based Blocking Shuffle Implementation in Flink - Part Two</a> - </h3> - - October 26, 2021 - - - - - Yingjie Cao (Kevin) - - - Daisy Tsang - - - - - <p>Part one of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature. -Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. 
- <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">...</a> - - </p> - <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2069,6 +2071,10 @@ Like sort-merge shuffle implemented by other distributed data processing framewo <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/6/index.html b/content/posts/page/6/index.html index 4893e67cc..9fb11b233 100644 --- a/content/posts/page/6/index.html +++ b/content/posts/page/6/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,31 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">Sort-Based Blocking Shuffle Implementation in Flink - Part Two</a> + </h3> + + October 26, 2021 - + + + + Yingjie Cao (Kevin) + + + Daisy Tsang + + + + + <p>Part one of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature. +Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. + <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">...</a> + + </p> + <a href="/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2021/10/19/apache-flink-1.13.3-released/">Apache Flink 1.13.3 Released</a> @@ -1985,30 +2014,6 @@ Updated Maven dependencies: <a href="/2021/08/06/apache-flink-1.13.2-released/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">How to identify the source of backpressure?</a> - </h3> - - July 7, 2021 - - - - - Piotr Nowojski - - <a href="https://twitter.com/PiotrNowojski">(@PiotrNowojski)</a> - - - - - <p>Backpressure monitoring in the web UI -The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). 
This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first… - <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">...</a> - - </p> - <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2066,6 +2071,10 @@ The backpressure topic was tackled from different angles over the last couple of <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/7/index.html b/content/posts/page/7/index.html index 6a7acf062..bb37343e4 100644 --- a/content/posts/page/7/index.html +++ b/content/posts/page/7/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,30 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">How to identify the source of backpressure?</a> + </h3> + + July 7, 2021 - + + + + Piotr Nowojski + + <a href="https://twitter.com/PiotrNowojski">(@PiotrNowojski)</a> + + + + + <p>Backpressure monitoring in the web UI +The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first… + <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">...</a> + + </p> + <a href="/2021/07/07/how-to-identify-the-source-of-backpressure/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2021/05/28/apache-flink-1.13.1-released/">Apache Flink 1.13.1 Released</a> @@ -1979,31 +2007,6 @@ From release to release, the Flink community has made significant progress in in <a href="/2021/02/10/how-to-natively-deploy-flink-on-kubernetes-with-high-availability-ha/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2021/01/29/apache-flink-1.10.3-released/">Apache Flink 1.10.3 Released</a> - </h3> - - January 29, 2021 - - - - - Xintong Song - - - - - <p>The Apache Flink community released the third bugfix version of the Apache Flink 1.10 series. -This release includes 36 fixes and minor improvements for Flink 1.10.2. The list below includes a detailed list of all fixes and improvements. -We highly recommend all users to upgrade to Flink 1.10.3. 
-Updated Maven dependencies: -<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.10.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.10.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <vers [...] - <a href="/2021/01/29/apache-flink-1.10.3-released/">...</a> - - </p> - <a href="/2021/01/29/apache-flink-1.10.3-released/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2061,6 +2064,10 @@ Updated Maven dependencies: <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/8/index.html b/content/posts/page/8/index.html index c5c8e1a6e..9621fce4f 100644 --- a/content/posts/page/8/index.html +++ b/content/posts/page/8/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,31 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2021/01/29/apache-flink-1.10.3-released/">Apache Flink 1.10.3 Released</a> + </h3> + + January 29, 2021 - + + + + Xintong Song + + + + + <p>The Apache Flink community released the third bugfix version of the Apache Flink 1.10 series. +This release includes 36 fixes and minor improvements for Flink 1.10.2. The list below includes a detailed list of all fixes and improvements. +We highly recommend all users to upgrade to Flink 1.10.3. +Updated Maven dependencies: +<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.10.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.10.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <vers [...] + <a href="/2021/01/29/apache-flink-1.10.3-released/">...</a> + + </p> + <a href="/2021/01/29/apache-flink-1.10.3-released/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2021/01/19/apache-flink-1.12.1-released/">Apache Flink 1.12.1 Released</a> @@ -1970,31 +1999,6 @@ We strongly recommend all users to upgrade to 2. <a href="/2020/11/11/stateful-functions-2.2.1-release-announcement/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure</a> - </h3> - - October 15, 2020 - - - - - Arvid Heise - - - Stephan Ewen - - - - - <p>Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. 
Because of that design, Flink unifies batch and stream processing, can easily scale to both very small and extremely large scenarios and provides support for many operational features like stateful upgrades with state evolution or roll-backs and time-travel. -Despite all these great properties, Flink’s checkpointing method has an Achilles Heel: the speed of a completed checkpoint is determined by the speed at which data flows through the application. - <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">...</a> - - </p> - <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2052,6 +2056,10 @@ Despite all these great properties, Flink’s checkpointing method has an Ac <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/posts/page/9/index.html b/content/posts/page/9/index.html index 68ed8a430..a27f974a8 100644 --- a/content/posts/page/9/index.html +++ b/content/posts/page/9/index.html @@ -880,6 +880,10 @@ https://github.com/alex-shpak/hugo-book <nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> @@ -1748,6 +1752,31 @@ https://github.com/alex-shpak/hugo-book + <article class="markdown book-post"> + <h3> + <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure</a> + </h3> + + October 15, 2020 - + + + + Arvid Heise + + + Stephan Ewen + + + + + <p>Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. Because of that design, Flink unifies batch and stream processing, can easily scale to both very small and extremely large scenarios and provides support for many operational features like stateful upgrades with state evolution or roll-backs and time-travel. +Despite all these great properties, Flink’s checkpointing method has an Achilles Heel: the speed of a completed checkpoint is determined by the speed at which data flows through the application. 
+ <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">...</a> + + </p> + <a href="/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/">Continue reading »</a> + </article> + <article class="markdown book-post"> <h3> <a href="/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/">Stateful Functions Internals: Behind the scenes of Stateful Serverless</a> @@ -1969,35 +1998,6 @@ IoT networks are composed of many individual, but interconnected components, whi <a href="/2020/08/06/accelerating-your-workload-with-gpu-and-other-external-resources/">Continue reading »</a> </article> - <article class="markdown book-post"> - <h3> - <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">PyFlink: The integration of Pandas into PyFlink</a> - </h3> - - August 4, 2020 - - - - - Jincheng Sun - - <a href="https://twitter.com/sunjincheng121">(@sunjincheng121)</a> - - - Markos Sfikas - - <a href="https://twitter.com/MarkSfik">(@MarkSfik)</a> - - - - - <p>Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities. -Pic source: VanderPlas 2017, slide 52. - <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">...</a> - - </p> - <a href="/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/">Continue reading »</a> - </article> - <ul class="pagination pagination-default"> @@ -2055,6 +2055,10 @@ Pic source: VanderPlas 2017, slide 52. 
<nav> <ul> + <li> + <a href="https://flink.apache.org/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a> + </li> + <li> <a href="https://flink.apache.org/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a> </li> diff --git a/content/sitemap.xml b/content/sitemap.xml index b8f5dca77..edf5c3947 100644 --- a/content/sitemap.xml +++ b/content/sitemap.xml @@ -4,7 +4,7 @@ <sitemap> <loc>https://flink.apache.org/en/sitemap.xml</loc> - <lastmod>2023-04-19T08:00:00+00:00</lastmod> + <lastmod>2023-05-03T08:00:00+00:00</lastmod> </sitemap> diff --git a/content/zh/index.html b/content/zh/index.html index 487aa5644..e2893b198 100644 --- a/content/zh/index.html +++ b/content/zh/index.html @@ -6,8 +6,8 @@ <meta name="generator" content="Hugo 0.111.3"> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> -<meta name="description" content="Apache Flink® - 数据流上的有状态计算 # 所有流式场景 事件驱动应用 流批分析 数据管道 & ETL 了解更多 正确性保证 Exactly-once 状态一致性 事件时间处理 成熟的迟到数据处理 了解更多 分层 API SQL on Stream & Batch Data DataStream API & DataSet API ProcessFunction (Time & State) 了解更多 聚焦运维 灵活部署 高可用 保存点 了解更多 大规模计算 水平扩展架构 支持超大状态 增量检查点机制 了解更多 性能卓越 低延迟 高吞吐 内存计算 了解更多 最新博客列表 # Apache Flink ML 2.2.0 Release Announcement -The Apache Flink community is excited to announce the release of Flink ML 2."> +<meta name="description" content="Apache Flink® - 数据流上的有状态计算 # 所有流式场景 事件驱动应用 流批分析 数据管道 & ETL 了解更多 正确性保证 Exactly-once 状态一致性 事件时间处理 成熟的迟到数据处理 了解更多 分层 API SQL on Stream & Batch Data DataStream API & DataSet API ProcessFunction (Time & State) 了解更多 聚焦运维 灵活部署 高可用 保存点 了解更多 大规模计算 水平扩展架构 支持超大状态 增量检查点机制 了解更多 性能卓越 低延迟 高吞吐 内存计算 了解更多 最新博客列表 # Howto create a batch source with the new Source framework +Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately."> <meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="Apache Flink Documentation" /> <meta property="og:description" content="" /> <meta property="og:type" content="website" /> @@ -1040,6 +1040,11 @@ under the License. + <a href="/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/">Howto create a batch source with the new Source framework</a><br /> + Introduction # The Flink community has designed a new Source framework based on FLIP-27 lately. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you! +Implementing the source components # The source architecture is depicted in the diagrams below: + <br /><br /> + <a href="/2023/04/19/apache-flink-ml-2.2.0-release-announcement/">Apache Flink ML 2.2.0 Release Announcement</a><br /> The Apache Flink community is excited to announce the release of Flink ML 2.2.0! This release focuses on enriching Flink ML’s feature engineering algorithms. The library now includes 33 feature engineering algorithms, making it a more comprehensive library for feature engineering tasks. With the addition of these algorithms, we believe Flink ML library is ready for use in production jobs that require feature engineering capabilities, whose input can then be consumed by both offline and online machine learning tasks. 
@@ -1049,12 +1054,6 @@ With the addition of these algorithms, we believe Flink ML library is ready for The Apache Flink PMC is pleased to announce Apache Flink release 1.17.0. Apache Flink is the leading stream processing standard, and the concept of unified stream and batch data processing is being successfully adopted in more and more companies. Thanks to our excellent community and contributors, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache Software Foundation. Flink 1.17 had 172 contributors enthusiastically participat [...] <br /><br /> - <a href="/2023/03/15/apache-flink-1.15.4-release-announcement/">Apache Flink 1.15.4 Release Announcement</a><br /> - The Apache Flink Community is pleased to announce the fourth bug fix release of the Flink 1.15 series. -This release includes 53 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA. -We highly recommend all users upgrade to Flink 1.15.4. - <br /><br /> -
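The new post indexed throughout this rebuild walks through implementing the source components of a batch source on the FLIP-27 framework. As a rough illustration of the shape of such a source — a hypothetical sketch, not code taken from the post or from the Cassandra connector — the skeleton below implements the FLIP-27 Source interface for a bounded source. Only the org.apache.flink.api.connector.source and org.apache.flink.core.io interfaces are actual Flink API; WordSource, WordSplit, WordEnumeratorState, and the omitted method bodies are assumptions made for the example.

// Hypothetical skeleton of a bounded (batch) FLIP-27 source.
// Only the imported Flink interfaces are real; everything named Word* is illustrative.
import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

/** A split is the unit of work the enumerator hands out to the readers. */
class WordSplit implements SourceSplit {
    private final String id;
    WordSplit(String id) { this.id = id; }
    @Override public String splitId() { return id; }
}

/** Enumerator checkpoint state, e.g. the splits that have not been assigned yet. */
class WordEnumeratorState implements java.io.Serializable {}

public class WordSource implements Source<String, WordSplit, WordEnumeratorState> {

    @Override
    public Boundedness getBoundedness() {
        // A batch source declares itself BOUNDED so Flink can run it with batch scheduling.
        return Boundedness.BOUNDED;
    }

    @Override
    public SourceReader<String, WordSplit> createReader(SourceReaderContext readerContext) {
        // Real connectors typically build the reader on the provided reader base classes.
        throw new UnsupportedOperationException("reader omitted in this sketch");
    }

    @Override
    public SplitEnumerator<WordSplit, WordEnumeratorState> createEnumerator(
            SplitEnumeratorContext<WordSplit> enumContext) {
        // The enumerator discovers splits and assigns them to readers on request.
        throw new UnsupportedOperationException("enumerator omitted in this sketch");
    }

    @Override
    public SplitEnumerator<WordSplit, WordEnumeratorState> restoreEnumerator(
            SplitEnumeratorContext<WordSplit> enumContext, WordEnumeratorState checkpoint) {
        // Rebuild the enumerator from its checkpointed state after a failover.
        throw new UnsupportedOperationException("enumerator omitted in this sketch");
    }

    @Override
    public SimpleVersionedSerializer<WordSplit> getSplitSerializer() {
        throw new UnsupportedOperationException("serializer omitted in this sketch");
    }

    @Override
    public SimpleVersionedSerializer<WordEnumeratorState> getEnumeratorCheckpointSerializer() {
        throw new UnsupportedOperationException("serializer omitted in this sketch");
    }
}

For context only: such a source would usually be wired into a pipeline with env.fromSource(new WordSource(), WatermarkStrategy.noWatermarks(), "words"), with env.setRuntimeMode(RuntimeExecutionMode.BATCH) selecting batch execution; both are existing DataStream API calls, while the argument values here are part of the same hypothetical example.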