This is an automated email from the ASF dual-hosted git repository. chesnay pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 80ae871143ff8ad817d564ad8dc2f5433771bd3d Author: Dian Fu <fudian...@alibaba-inc.com> AuthorDate: Thu Dec 5 10:10:25 2019 +0800 Rebuild website --- content/blog/feed.xml | 419 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 419 insertions(+) diff --git a/content/blog/feed.xml b/content/blog/feed.xml index 1410119..be37a00 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -7,6 +7,425 @@ <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> <item> +<title>How to query Pulsar Streams using Apache Flink</title> +<description><p>In a previous <a href="https://flink.apache.org/2019/05/03/pulsar-flink.html">story</a> on the Flink blog, we explained the different ways that <a href="https://flink.apache.org/">Apache Flink</a> and <a href="https://pulsar.apache.org/">Apache Pulsar</a> can integrate to provide elastic data processing at large scale. This blog post discusses the new developments and integrations between the two fra [...] + +<h1 id="a-short-intro-to-apache-pulsar">A short intro to Apache Pulsar</h1> + +<p>Apache Pulsar is a flexible pub/sub messaging system, backed by durable log storage. Some of the framework’s highlights include multi-tenancy, a unified message model, structured event streams and a cloud-native architecture that make it a perfect fit for a wide set of use cases, ranging from billing, payments and trading services all the way to the unification of the different messaging architectures in an organization. If you are interested in finding out more about Pulsar, yo [...] + +<h1 id="existing-pulsar--flink-integration-apache-flink-16">Existing Pulsar &amp; Flink integration (Apache Flink 1.6+)</h1> + +<p>The existing integration between Pulsar and Flink exploits Pulsar as a message queue in a Flink application. 
Flink developers can utilize Pulsar as a streaming source and streaming sink for their Flink applications by selecting a specific Pulsar source and connecting to their desired Pulsar cluster and topic:</p> + +<div class="highlight"><pre><code class="language-java"><span class="c1">// create and configure Pulsar consumer</span> +<span class="n">PulsarSourceBuilder</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="n">builder</span> <span class="o">=</span> <span class="n">PulsarSourceBuilder</span> +   <span class="o">.</span><span class="na">builder</span><span class="o">(</span><span class="k">new</span> <span class="nf">SimpleStringSchema</span><span class="o">())</span> +   <span class="o">.</span><span class="na">serviceUrl</span><span class="o">(</span><span class="n">serviceUrl</span><span class="o">)</span> +   <span class="o">.</span><span class="na">topic</span><span class="o">(</span><span class="n">inputTopic</span><span class="o">)</span> +   <span class="o">.</span><span class="na">subscriptionName</span><span class="o">(</span><span class="n">subscription</span><span class="o">);</span> +<span class="n">SourceFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">src</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o"&g [...] +<span class="c1">// ingest DataStream with Pulsar consumer</span> +<span class="n">DataStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o"> [...]
+ +<p>Pulsar streams can then be connected to the Flink processing logic…</p> + +<div class="highlight"><pre><code class="language-java"><span class="c1">// perform computation on DataStream (here a simple WordCount)</span> +<span class="n">DataStream</span><span class="o">&lt;</span><span class="n">WordWithCount</span><span class="o">&gt;</span> <span class="n">wc</span> <span class="o">=</span> <span class="n">words</span> +  <span class="o">.</span><span class="na">flatMap</span><span class="o">((</span><span class="n">FlatMapFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">WordWithCount</span><span class="o">&gt;)</span> <span class=" [...] +  <span class="n">collector</span><span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="k">new</span> <span class="nf">WordWithCount</span><span class="o">(</span><span class="n">word</span><span class="o">,</span> <span class="mi">1</ [...] +  <span class="o">})</span> + +  <span class="o">.</span><span class="na">returns</span><span class="o">(</span><span class="n">WordWithCount</span><span class="o">.</span><span class="na">class</span><span class="o">)</span> +  <span class="o">.</span><span class="na">keyBy</span><span class="o">(</span><span class="s">&quot;word&quot;</span><span class="o">)</span> +  <span class="o">.</span><span class="na">timeWindow</span><span class="o">(</span><span class="n">Time</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">5</span><span class="o">))</span> +  <span class="o">.</span><span class="na">reduce</span><span class="o">((</span><span class="n">ReduceFunction</span><span class="o">&lt;</span><span class="n">WordWithCount</span><span class="o">&gt;)</span> <span class="o">(</span><span class="n">c1</span><span class="o" [...]
+            <span class="k">new</span> <span class="nf">WordWithCount</span><span class="o">(</span><span class="n">c1</span><span class="o">.</span><span class="na">word</span><span class="o">,</span> <span class="n">c1</span><span class="o">.</span><span class="na">count</span> [...] + +<p>…and then be emitted back to Pulsar (now used as a sink), sending the computation results downstream, back to a Pulsar topic:</p> + +<div class="highlight"><pre><code class="language-java"><span class="c1">// emit result via Pulsar producer </span> +<span class="n">wc</span><span class="o">.</span><span class="na">addSink</span><span class="o">(</span><span class="k">new</span> <span class="n">FlinkPulsarProducer</span><span class="o">&lt;&gt;(</span> +  <span class="n">serviceUrl</span><span class="o">,</span> +  <span class="n">outputTopic</span><span class="o">,</span> +  <span class="k">new</span> <span class="nf">AuthenticationDisabled</span><span class="o">(),</span> +  <span class="n">wordWithCount</span> <span class="o">-&gt;</span> <span class="n">wordWithCount</span><span class="o">.</span><span class="na">toString</span><span class="o">().</span><span class="na">getBytes</span><span class="o">(</span><span class="n">UTF_8</span><span class=" [...] +  <span class="n">wordWithCount</span> <span class="o">-&gt;</span> <span class="n">wordWithCount</span><span class="o">.</span><span class="na">word</span><span class="o">)</span> +<span class="o">);</span></code></pre></div> + +<p>Although this is a great first integration step, the existing design does not leverage the full power of Pulsar. 
In the Flink 1.6.0 integration, Pulsar is neither utilized as durable storage nor integrated with Flink’s schema handling, so users have to describe an application’s schema manually.</p> + +<h1 id="pulsars-integration-with-flink-19-using-pulsar-as-a-flink-catalog">Pulsar’s integration with Flink 1.9: Using Pulsar as a Flink catalog</h1> + +<p>The latest integration between <a href="https://flink.apache.org/downloads.html#apache-flink-191">Flink 1.9.0</a> and Pulsar addresses most of the previously mentioned shortcomings. The <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">contribution of Alibaba’s Blink to the Flink repository</a> adds many enhancements and new features to the processing framework that make the integration with Pulsar s [...] + +<h1 id="leveraging-the-flink--pulsar-schema-integration">Leveraging the Flink &lt;&gt; Pulsar Schema Integration</h1> + +<p>Before delving into the integration details and how you can use Pulsar schema with Flink, let us describe how schema in Pulsar works. Schema in Apache Pulsar lives on the broker side of the framework and serves as the representation of the data, which makes an external schema registry obsolete. Additionally, the data schema in Pulsar is associated with each topic so both producers and consumers send data with predefined schema information, while th [...] + +<p>Below you can find an example of Pulsar’s schema on both the producer and consumer side. On the producer side, you specify which schema you want to use and Pulsar then sends the POJO class without the need to perform any serialization/deserialization. Similarly, on the consumer end, you can also specify the data schema and, upon receiving the data, Pulsar will automatically validate the schema information, fetch the schema of the given version and then deserialize the data back [...]
+ +<div class="highlight"><pre><code class="language-java"><span class="c1">// Create producer with Struct schema and send messages</span> +<span class="n">Producer</span><span class="o">&lt;</span><span class="n">User</span><span class="o">&gt;</span> <span class="n">producer</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">newProducer</span><span class="o" [...] +<span class="n">producer</span><span class="o">.</span><span class="na">newMessage</span><span class="o">()</span> +  <span class="o">.</span><span class="na">value</span><span class="o">(</span><span class="n">User</span><span class="o">.</span><span class="na">builder</span><span class="o">()</span> +  <span class="o">.</span><span class="na">userName</span><span class="o">(</span><span class="s">&quot;pulsar-user&quot;</span><span class="o">)</span> +  <span class="o">.</span><span class="na">userId</span><span class="o">(</span><span class="mi">1L</span><span class="o">)</span> +  <span class="o">.</span><span class="na">build</span><span class="o">())</span> +  <span class="o">.</span><span class="na">send</span><span class="o">();</span></code></pre></div> + +<div class="highlight"><pre><code class="language-java"><span class="c1">// Create consumer with Struct schema and receive messages</span> +<span class="n">Consumer</span><span class="o">&lt;</span><span class="n">User</span><span class="o">&gt;</span> <span class="n">consumer</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">newConsumer</span><span class="o" [...] +<span class="n">consumer</span><span class="o">.</span><span class="na">receive</span><span class="o">();</span></code></pre></div> + +<p>Let’s assume we have an application that specifies a schema to the producer and/or consumer. 
Upon receiving the schema information, the producer (or consumer), which is connected to the broker, will transfer that information so that the broker can perform schema registration, validation and schema compatibility checks before returning or rejecting the schema, as illustrated in the diagram below:</p> + +<center> +<img src="/img/blog/flink-pulsar-sql-blog-post-visual.png" width="600px" alt="Pulsar Schema" /> +</center> +<p><br /></p> + +<p>Not only is Pulsar able to handle and store the schema information, but it is additionally able to handle any schema evolution where necessary. Pulsar will effectively manage any schema evolution in the broker, keeping track of all the different versions of your schema while performing any necessary compatibility checks.</p> + +<p>Moreover, when messages are published on the producer side, Pulsar will tag each message with the schema version as part of each message’s metadata. On the consumer side, when the message is received and the metadata is deserialized, Pulsar will check the schema version associated with this message and will fetch the corresponding schema information from the broker. As a result, when Pulsar integrates with a Flink application it uses the pre-existing schema information and maps [...] + +<p>For the cases when Flink users do not interact with schema directly or make use of primitive schema (for example, using a topic to store a string or long number), Pulsar will either convert the message payload into a Flink row, called ‘value’, or, for structured schema types like JSON and AVRO, extract the individual fields from the schema information and map the fields to Flink’s type system. Finally, all metadata information associated with eac [...]
+ +<center> +<img src="/img/blog/flink-pulsar-sql-blog-post-visual-primitive-avro-schema.png" width="600px" alt="Primitive and AVRO Schema" /> +</center> +<p><br /></p> + +<p>Once all the schema information is mapped to Flink’s type system, you can start building a Pulsar source, sink or catalog in Flink based on the specified schema information as illustrated below:</p> + +<h1 id="flink--pulsar-read-data-from-pulsar">Flink &amp; Pulsar: Read data from Pulsar</h1> + +<ul> + <li>Create a Pulsar source for streaming queries</li> +</ul> + +<div class="highlight"><pre><code class="language-java"><span class="n">val</span> <span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span> +<span class="n">val</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Properties</span><span class="o">()</span> +<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;service.url&quot;</span><span class="o">,</span> <span class="s">&quot;pulsar://...&quot;</span><span class="o">)</span> +<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;admin.url&quot;</span><span class="o">,</span> <span class="s">&quot;http://...&quot;</span><span class="o">)</span> +<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;partitionDiscoveryIntervalMillis&quot;</span><span class="o">,</span> <span class="s">&quot;5000&quot;</span><span class="o">)</span> +<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;startingOffsets&quot;</span><span class="o">,</span> <span class="s">&quot;earliest&quot;</span><span class="o">)</span> 
+<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;topic&quot;</span><span class="o">,</span> <span class="s">&quot;test-source-topic&quot;</span><span class="o">)</span> +<span class="n">val</span> <span class="n">source</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">FlinkPulsarSource</span><span class="o">(</span><span class="n">props</span><span class="o">)</span> +<span class="c1">// you don&#39;t need to provide a type information to addSource since FlinkPulsarSource is ResultTypeQueryable</span> +<span class="n">val</span> <span class="n">dataStream</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="n">source</span><span class="o">)(</span><span class="kc">null</span& [...] + +<span class="c1">// chain operations on dataStream of Row and sink the output</span> +<span class="c1">// end method chaining</span> + +<span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">()</span></code></pre></div> + +<ul> + <li>Register topics in Pulsar as streaming tables</li> +</ul> + +<div class="highlight"><pre><code class="language-java"><span class="n">val</span> <span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span> +<span class="n">val</span> <span class="n">tEnv</span> <span class="o">=</span> <span class="n">StreamTableEnvironment</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">env</span><span class="o">)</span> + +<span class="n">val</span> <span class="n">prop</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Properties</span><span class="o">()</span> +<span class="n">prop</span><span class="o">.</span><span 
class="na">setProperty</span><span class="o">(</span><span class="s">&quot;service.url&quot;</span><span class="o">,</span> <span class="n">serviceUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;admin.url&quot;</span><span class="o">,</span> <span class="n">adminUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;flushOnCheckpoint&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;failOnWrite&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;topic&quot;</span><span class="o">,</span> <span class="s">&quot;test-sink-topic&quot;</span><span class="o">)</span> + +<span class="n">tEnv</span> +  <span class="o">.</span><span class="na">connect</span><span class="o">(</span><span class="k">new</span> <span class="nf">Pulsar</span><span class="o">().</span><span class="na">properties</span><span class="o">(</span><span class="n">prop</span><span class="o">))</span> +  <span class="o">.</span><span class="na">inAppendMode</span><span class="o">()</span> +  <span class="o">.</span><span class="na">registerTableSource</span><span class="o">(</span><span class="s">&quot;sink-table&quot;</span><span class="o">)</span> + +<span class="n">val</span> <span class="n">sql</span> <span class="o">=</span> <span class="s">&quot;INSERT INTO sink-table .....&quot;</span> +<span class="n">tEnv</span><span class="o">.</span><span class="na">sqlUpdate</span><span class="o">(</span><span 
class="n">sql</span><span class="o">)</span> +<span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">()</span></code></pre></div> + +<h1 id="flink--pulsar-write-data-to-pulsar">Flink &amp; Pulsar: Write data to Pulsar</h1> + +<ul> +  <li>Create a Pulsar sink for streaming queries</li> +</ul> + +<div class="highlight"><pre><code class="language-java"><span class="n">val</span> <span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span> +<span class="n">val</span> <span class="n">stream</span> <span class="o">=</span> <span class="o">.....</span> + +<span class="n">val</span> <span class="n">prop</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Properties</span><span class="o">()</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;service.url&quot;</span><span class="o">,</span> <span class="n">serviceUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;admin.url&quot;</span><span class="o">,</span> <span class="n">adminUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;flushOnCheckpoint&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;failOnWrite&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span 
class="s">&quot;topic&quot;</span><span class="o">,</span> <span class="s">&quot;test-sink-topic&quot;</span><span class="o">)</span> + +<span class="n">stream</span><span class="o">.</span><span class="na">addSink</span><span class="o">(</span><span class="k">new</span> <span class="nf">FlinkPulsarSink</span><span class="o">(</span><span class="n">prop</span><span class="o">,</span> <span class="n">DummyTopicKe [...] +<span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">()</span></code></pre></div> + +<ul> + <li>Write a streaming table to Pulsar</li> +</ul> + +<div class="highlight"><pre><code class="language-java"><span class="n">val</span> <span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="na">getExecutionEnvironment</span> +<span class="n">val</span> <span class="n">tEnv</span> <span class="o">=</span> <span class="n">StreamTableEnvironment</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">env</span><span class="o">)</span> + +<span class="n">val</span> <span class="n">prop</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Properties</span><span class="o">()</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;service.url&quot;</span><span class="o">,</span> <span class="n">serviceUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;admin.url&quot;</span><span class="o">,</span> <span class="n">adminUrl</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;flushOnCheckpoint&quot;</span><span class="o">,</span> <span 
class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">prop</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;failOnWrite&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span> +<span class="n">props</span><span class="o">.</span><span class="na">setProperty</span><span class="o">(</span><span class="s">&quot;topic&quot;</span><span class="o">,</span> <span class="s">&quot;test-sink-topic&quot;</span><span class="o">)</span> + +<span class="n">tEnv</span> + <span class="o">.</span><span class="na">connect</span><span class="o">(</span><span class="k">new</span> <span class="nf">Pulsar</span><span class="o">().</span><span class="na">properties</span><span class="o">(</span><span class="n">props</span><span class="o">))</span> + <span class="o">.</span><span class="na">inAppendMode</span><span class="o">()</span> + <span class="o">.</span><span class="na">registerTableSource</span><span class="o">(</span><span class="s">&quot;sink-table&quot;</span><span class="o">)</span> + +<span class="n">val</span> <span class="n">sql</span> <span class="o">=</span> <span class="s">&quot;INSERT INTO sink-table .....&quot;</span> +<span class="n">tEnv</span><span class="o">.</span><span class="na">sqlUpdate</span><span class="o">(</span><span class="n">sql</span><span class="o">)</span> +<span class="n">env</span><span class="o">.</span><span class="na">execute</span><span class="o">()</span></code></pre></div> + +<p>In every instance, Flink developers only need to specify the properties of how Flink will connect to a Pulsar cluster without worrying about any schema registry, or serialization/deserialization actions and register the Pulsar cluster as a source, sink or streaming table in Flink. 
Once all three elements are put together, Pulsar can then be registered as a catalog in Flink, which drastically simplifies how you process and query data, for example, writing a program [...] + +<h1 id="next-steps--future-integration">Next Steps &amp; Future Integration</h1> + +<p>The goal of the integration between Pulsar and Flink is to simplify how developers use the two frameworks to build a unified data processing stack. As we progress from classical Lambda architectures, where an online speed layer is combined with an offline batch layer to run data computations, Flink and Pulsar present a great combination in providing a truly unified data processing stack. We see Flink as a unified computation engine, handling both online (streaming) and [...] + +<p>There is still a lot of ongoing work and effort from both communities to make the integration even better, such as a new source API (<a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface">FLIP-27</a>) that will allow the <a href="http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-Flink-Pulsar-Connector-td22019.html">contribution of the Pulsar connectors to the Flink communit [...] 
+ +<p>You can find a more detailed overview of the integration work between the two communities in this <a href="https://youtu.be/3sBXXfgl5vs">video recording</a> from Flink Forward Europe 2019 or subscribe to the <a href="https://flink.apache.org/community.html#mailing-lists">Flink dev mailing list</a> for the latest contribution and integration efforts between Flink and Pulsar.</p> +</description> +<pubDate>Mon, 25 Nov 2019 13:00:00 +0100</pubDate> +<link>https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</link> +<guid isPermaLink="true">/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</guid> +</item> + +<item> +<title>Apache Flink 1.9.1 Released</title> +<description><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.9 series.</p> + +<p>This release includes 96 fixes and minor improvements for Flink 1.9.0. The list below provides a detailed overview of all fixes and improvements.</p> + +<p>We highly recommend that all users upgrade to Flink 1.9.1.</p> + +<p>Updated Maven dependencies:</p> + +<div class="highlight"><pre><code class="language-xml"><span class="nt">&lt;dependency&gt;</span> +  <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> +  <span class="nt">&lt;artifactId&gt;</span>flink-java<span class="nt">&lt;/artifactId&gt;</span> +  <span class="nt">&lt;version&gt;</span>1.9.1<span class="nt">&lt;/version&gt;</span> +<span class="nt">&lt;/dependency&gt;</span> +<span class="nt">&lt;dependency&gt;</span> +  <span class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> +  <span class="nt">&lt;artifactId&gt;</span>flink-streaming-java_2.11<span class="nt">&lt;/artifactId&gt;</span> +  <span class="nt">&lt;version&gt;</span>1.9.1<span class="nt">&lt;/version&gt;</span> +<span class="nt">&lt;/dependency&gt;</span> +<span class="nt">&lt;dependency&gt;</span> +  <span 
class="nt">&lt;groupId&gt;</span>org.apache.flink<span class="nt">&lt;/groupId&gt;</span> + <span class="nt">&lt;artifactId&gt;</span>flink-clients_2.11<span class="nt">&lt;/artifactId&gt;</span> + <span class="nt">&lt;version&gt;</span>1.9.1<span class="nt">&lt;/version&gt;</span> +<span class="nt">&lt;/dependency&gt;</span></code></pre></div> + +<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a>.</p> + +<p>List of resolved issues:</p> + +<h2> Bug +</h2> +<ul> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-11630">FLINK-11630</a>] - TaskExecutor does not wait for Task termination when terminating itself +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13490">FLINK-13490</a>] - Fix if one column value is null when reading JDBC, the following values are all null +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13941">FLINK-13941</a>] - Prevent data-loss by not cleaning up small part files from S3. +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12501">FLINK-12501</a>] - AvroTypeSerializer does not work with types generated by avrohugger +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13386">FLINK-13386</a>] - Fix some frictions in the new default Web UI +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13526">FLINK-13526</a>] - Switching to a non existing catalog or database crashes sql-client +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13568">FLINK-13568</a>] - DDL create table doesn&#39;t allow STRING data type +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13805">FLINK-13805</a>] - Bad Error Message when TaskManager is lost +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13806">FLINK-13806</a>] - Metric Fetcher floods the JM log with errors when TM is lost +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14010">FLINK-14010</a>] - Dispatcher &amp; 
JobManagers don&#39;t give up leadership when AM is shut down +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14145">FLINK-14145</a>] - CompletedCheckpointStore#getLatestCheckpoint(true) returns wrong checkpoint +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13059">FLINK-13059</a>] - Cassandra Connector leaks Semaphore on Exception and hangs on close +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13534">FLINK-13534</a>] - Unable to query Hive table with decimal column +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13562">FLINK-13562</a>] - Throws exception when FlinkRelMdColumnInterval meets two stage stream group aggregate +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13563">FLINK-13563</a>] - TumblingGroupWindow should implement toString method +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13564">FLINK-13564</a>] - Throw exception if constant with YEAR TO MONTH resolution was used for group windows +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13588">FLINK-13588</a>] - StreamTask.handleAsyncException throws away the exception cause +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13653">FLINK-13653</a>] - ResultStore should avoid using RowTypeInfo when creating a result +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13711">FLINK-13711</a>] - Hive array values not properly displayed in SQL CLI +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13737">FLINK-13737</a>] - flink-dist should add provided dependency on flink-examples-table +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13738">FLINK-13738</a>] - Fix NegativeArraySizeException in LongHybridHashTable +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13742">FLINK-13742</a>] - Fix code generation when aggregation contains both distinct aggregate with and without filter +</li> +<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-13760">FLINK-13760</a>] - Fix hardcode Scala version dependency in hive connector +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13761">FLINK-13761</a>] - `SplitStream` should be deprecated because `SplitJavaStream` is deprecated +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13789">FLINK-13789</a>] - Transactional Id Generation fails due to user code impacting formatting string +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13823">FLINK-13823</a>] - Incorrect debug log in CompileUtils +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13825">FLINK-13825</a>] - The original plugins dir is not restored after e2e test run +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13831">FLINK-13831</a>] - Free Slots / All Slots display error +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13887">FLINK-13887</a>] - Ensure defaultInputDependencyConstraint to be non-null when setting it in ExecutionConfig +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13897">FLINK-13897</a>] - OSS FS NOTICE file is placed in wrong directory +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13933">FLINK-13933</a>] - Hive Generic UDTF can not be used in table API both stream and batch mode +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13936">FLINK-13936</a>] - NOTICE-binary is outdated +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13966">FLINK-13966</a>] - Jar sorting in collect_license_files.sh is locale dependent +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14009">FLINK-14009</a>] - Cron jobs broken due to verifying incorrect NOTICE-binary file +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14049">FLINK-14049</a>] - Update error message for failed partition updates to include task name +</li> +<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-14076">FLINK-14076</a>] - &#39;ClassNotFoundException: KafkaException&#39; on Flink v1.9 w/ checkpointing +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14107">FLINK-14107</a>] - Kinesis consumer record emitter deadlock under event time alignment +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14119">FLINK-14119</a>] - Clean idle state for RetractableTopNFunction +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14139">FLINK-14139</a>] - Fix potential memory leak of rest server when using session/standalone cluster +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14140">FLINK-14140</a>] - The Flink Logo Displayed in Flink Python Shell is Broken +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14150">FLINK-14150</a>] - Unnecessary __pycache__ directories appears in pyflink.zip +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14288">FLINK-14288</a>] - Add Py4j NOTICE for source release +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13892">FLINK-13892</a>] - HistoryServerTest failed on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14043">FLINK-14043</a>] - SavepointMigrationTestBase is super slow +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-12164">FLINK-12164</a>] - JobMasterTest.testJobFailureWhenTaskExecutorHeartbeatTimeout is unstable +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-9900">FLINK-9900</a>] - Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13484">FLINK-13484</a>] - ConnectedComponents end-to-end test instable with NoResourceAvailableException +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13489">FLINK-13489</a>] - Heavy deployment end-to-end test fails on Travis with TM heartbeat timeout 
+</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13514">FLINK-13514</a>] - StreamTaskTest.testAsyncCheckpointingConcurrentCloseAfterAcknowledge unstable +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13530">FLINK-13530</a>] - AbstractServerTest failed on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13585">FLINK-13585</a>] - Fix sporadic deadlock in TaskAsyncCallTest#testSetsUserCodeClassLoader() +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13599">FLINK-13599</a>] - Kinesis end-to-end test failed on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13663">FLINK-13663</a>] - SQL Client end-to-end test for modern Kafka failed on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13688">FLINK-13688</a>] - HiveCatalogUseBlinkITCase.testBlinkUdf constantly failed +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13739">FLINK-13739</a>] - BinaryRowTest.testWriteString() fails in some environments +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13746">FLINK-13746</a>] - Elasticsearch (v2.3.5) sink end-to-end test fails on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13769">FLINK-13769</a>] - BatchFineGrainedRecoveryITCase.testProgram failed on Travis +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13807">FLINK-13807</a>] - Flink-avro unit tests fail if the character encoding in the environment does not default to UTF-8 +</li> +</ul> + +<h2> Improvement +</h2> +<ul> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13965">FLINK-13965</a>] - Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark them with @Deprecated annotation +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-9941">FLINK-9941</a>] - Flush in ScalaCsvOutputFormat before close method +</li> +<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-13336">FLINK-13336</a>] - Remove the legacy batch fault tolerance page and redirect it to the new task failure recovery page +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13380">FLINK-13380</a>] - Improve the usability of Flink session cluster on Kubernetes +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13819">FLINK-13819</a>] - Introduce RpcEndpoint State +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13845">FLINK-13845</a>] - Drop all the content of removed &quot;Checkpointed&quot; interface +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13957">FLINK-13957</a>] - Log dynamic properties on job submission +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13967">FLINK-13967</a>] - Generate full binary licensing via collect_license_files.sh +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13968">FLINK-13968</a>] - Add travis check for the correctness of the binary licensing +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13449">FLINK-13449</a>] - Add ARM architecture to MemoryArchitecture +</li> +</ul> + +<h2> Documentation +</h2> +<ul> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13105">FLINK-13105</a>] - Add documentation for blink planner&#39;s built-in functions +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13277">FLINK-13277</a>] - add documentation of Hive source/sink +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13354">FLINK-13354</a>] - Add documentation for how to use blink planner +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13355">FLINK-13355</a>] - Add documentation for Temporal Table Join in blink planner +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13356">FLINK-13356</a>] - Add documentation for TopN and Deduplication in blink planner +</li> +<li>[<a 
href="https://issues.apache.org/jira/browse/FLINK-13359">FLINK-13359</a>] - Add documentation for DDL introduction +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13362">FLINK-13362</a>] - Add documentation for Kafka &amp; ES &amp; FileSystem DDL +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13363">FLINK-13363</a>] - Add documentation for streaming aggregate performance tuning +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13706">FLINK-13706</a>] - Add documentation of how to use Hive functions in Flink +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13942">FLINK-13942</a>] - Add Overview page for Getting Started section +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13863">FLINK-13863</a>] - Update Operations Playground to Flink 1.9.0 +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13937">FLINK-13937</a>] - Fix wrong hive dependency version in documentation +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13830">FLINK-13830</a>] - The documentation about clusters on YARN has some problems +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-14160">FLINK-14160</a>] - Extend Operations Playground with --backpressure option +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13388">FLINK-13388</a>] - Update UI screenshots in the documentation to the new default Web Frontend +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13415">FLINK-13415</a>] - Document how to use hive connector in scala shell +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13517">FLINK-13517</a>] - Restructure Hive Catalog documentation +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13643">FLINK-13643</a>] - Document the workaround for users with a different minor Hive version +</li> +<li>[<a href="https://issues.apache.org/jira/browse/FLINK-13757">FLINK-13757</a>] - Fix wrong description of 
"IS NOT TRUE" function documentation +</li> +</ul> + +</description> +<pubDate>Fri, 18 Oct 2019 14:00:00 +0200</pubDate> +<link>https://flink.apache.org/news/2019/10/18/release-1.9.1.html</link> +<guid isPermaLink="true">/news/2019/10/18/release-1.9.1.html</guid> +</item> + +<item> <title>The State Processor API: How to Read, write and modify the state of Flink applications</title> <description><p>Whether you are running Apache Flink<sup>Ⓡ</sup> in production or evaluated Flink as a computation framework in the past, you’ve probably found yourself asking the question: How can I access, write or update state in a Flink savepoint? Ask no more! <a href="https://flink.apache.org/news/2019/08/22/release-1.9.0.html">Apache Flink 1.9.0</a> introduces the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.9/de [...]