This is an automated email from the ASF dual-hosted git repository. fhueske pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit e381e99627fbe619d5745a94f5dddb23264daf3e Author: Fabian Hueske <fhue...@apache.org> AuthorDate: Thu Feb 14 09:37:59 2019 +0100 Rebuild website --- content/news/2019/02/13/unified-batch-streaming-blink.html | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/news/2019/02/13/unified-batch-streaming-blink.html b/content/news/2019/02/13/unified-batch-streaming-blink.html index 1b8446e..a32106a 100644 --- a/content/news/2019/02/13/unified-batch-streaming-blink.html +++ b/content/news/2019/02/13/unified-batch-streaming-blink.html @@ -176,7 +176,7 @@ <p>Pure stream processing systems are very slow at batch processing workloads. No one would consider it a good idea to use a stream processor that shuffles through message queues to analyze large amounts of available data.</p> </li> <li> - <p>Unified APIs like <a href="https://beam.apache.org">Apache Beam</a> often delegate to different runtimes depending on whether the data is continuous/unbounded of fix/bounded. For example, the implementations of the batch and streaming runtime of Google Cloud Dataflow are different, to get the desired performance and resilience in each case.</p> + <p>Unified APIs like <a href="https://beam.apache.org">Apache Beam</a> often delegate to different runtimes depending on whether the data is continuous/unbounded or fix/bounded. 
For example, the implementations of the batch and streaming runtime of Google Cloud Dataflow are different, to get the desired performance and resilience in each case.</p> </li> <li> <p><em>Apache Flink</em> has a streaming API that can do bounded/unbounded use cases, but still offers a separate DataSet API and runtime stack that is faster for batch use cases.</p> @@ -213,7 +213,7 @@ For data larger than memory, the batch join can partition both data sets into su <h2 id="what-is-still-missing">What is still missing?</h2> -<p>To conclude the approach and makes Flink’s experience on bounded data (batch) state-of-the-art, we need to add a few more enhancements. We believe that these features are key to realizing our vision:</p> +<p>To conclude the approach and make Flink’s experience on bounded data (batch) state-of-the-art, we need to add a few more enhancements. We believe that these features are key to realizing our vision:</p> <p><strong>(1) A truly unified runtime operator stack</strong>: Currently the bounded and unbounded operators have a different network and threading model and don’t mix and match. The original reason was that batch operators followed a “pull model” (easier for batch algorithms), while streaming operators followed a “push model” (better latency/throughput characteristics). In a unified stack, continuous streaming operators are the foundation. When operating on bounded data without latency [...] @@ -231,7 +231,7 @@ For data larger than memory, the batch join can partition both data sets into su <p><strong>Unified Stream Operators:</strong> Blink extends the Flink streaming runtime operator model to support selectively reading from different inputs, while keeping the push model for very low latency. This control over the inputs helps to now support algorithms like hybrid hash-joins on the same operator and threading model as continuous symmetric joins through RocksDB. 
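The "selective input reading" behavior described above can be sketched in miniature. The following Python sketch is purely illustrative, with invented class and method names (it is not Flink's actual operator API): a two-input join operator tells a toy runtime which input it wants next, drains the bounded build side into a hash table, then switches its selection to the probe side, all while staying in a push model.

```python
# Illustrative sketch of selective input reading (invented names, not
# Flink's actual operator API). The operator keeps the push model: the
# runtime pushes records, but only from the input the operator selects.

class HybridHashJoinOperator:
    BUILD, PROBE = 1, 2

    def __init__(self):
        self.table = {}              # key -> list of build-side values
        self.selected = self.BUILD   # input the operator wants to read next
        self.output = []

    def process(self, input_id, record):
        key, value = record
        if input_id == self.BUILD:
            # Build phase: collect the bounded side into a hash table.
            self.table.setdefault(key, []).append(value)
        else:
            # Probe phase: stream the other side against the table.
            for build_value in self.table.get(key, []):
                self.output.append((key, build_value, value))

    def end_input(self, input_id):
        # The bounded build input is exhausted: switch selection to probe.
        if input_id == self.BUILD:
            self.selected = self.PROBE


def run_join(operator, build_records, probe_records):
    # Toy runtime that honors the operator's current input selection.
    for record in build_records:
        operator.process(operator.selected, record)
    operator.end_input(HybridHashJoinOperator.BUILD)
    for record in probe_records:
        operator.process(operator.selected, record)
    return operator.output
```

For example, joining a bounded build side `[("x", 1)]` against probe records `[("x", 9), ("y", 2)]` yields `[("x", 1, 9)]`. A continuous symmetric join would instead maintain state for both inputs; the point made above is that both variants can share one operator and threading model.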
These operators also form the basis for future features like <a href="https://cwiki.apache.org/confluence/displa [...] -<p><strong>Table API & SQL Query Processor:</strong> The SQL query processor is the component that evolved the changed most compared to the latest Flink master branch:</p> +<p><strong>Table API & SQL Query Processor:</strong> The SQL query processor is the component that evolved and changed most compared to the latest Flink master branch:</p> <ul> <li> @@ -274,9 +274,9 @@ The performance improvement is in average 10x.<br /> <h2 id="how-do-we-plan-to-merge-blink-and-flink">How do we plan to merge Blink and Flink?</h2> -<p>Blink’s code is currently available as a <a href="https://github.com/apache/flink/tree/blink">branch</a> in the Apache Flink repository. It is a challenge to merge such a big amount of changes, while making the merge process as non-disruptive as possible and keeping public APIs as stable as possible.</p> +<p>Blink’s code is currently available as a <a href="https://github.com/apache/flink/tree/blink">branch</a> in the Apache Flink repository. 
It is a challenge to merge such a big amount of changes, while making the merge process as non-disruptive as possible and keeping public APIs as stable as possible.</p> -<p>The community’s <a href="https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E">merge plan</a> focuses initially on the bounded/batch processing features mentioned above and follows the following approach to ensure a smooth integration:</p> +<p>The community’s <a href="https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E">merge plan</a> focuses initially on the bounded/batch processing features mentioned above and proposes the following approach to ensure a smooth integration:</p> <ul> <li> @@ -285,7 +285,7 @@ Following some restructuring of the Table/SQL module (<a href="https://cwiki.apa Initially, users will be able to select which query processor to use. After a transition period in which the new query processor will be developed to subsume the current query processor, the current processor will most likely be deprecated and eventually dropped. Given that SQL is such a well defined interface, we anticipate that this transition has little friction for users. 
Mostly a pleasant surprise to have broader SQL feature coverage and a boost in performance.</p> </li> <li> - <p>To support the merge of Blink’s <em>enhancements to scheduling and recovery</em> for jobs on bounded data, the Flink community is already working on refactoring its current schedule and adding support for <a href="https://issues.apache.org/jira/browse/FLINK-10429">pluggable scheduling and fail-over strategies</a>.<br /> + <p>To support the merge of Blink’s <em>enhancements to scheduling and recovery</em> for jobs on bounded data, the Flink community is already working on refactoring its current scheduler and adding support for <a href="https://issues.apache.org/jira/browse/FLINK-10429">pluggable scheduling and fail-over strategies</a>.<br /> Once this effort is finished, we can add Blink’s scheduling and recovery strategies as a new scheduling strategy that is used by the new query processor. Eventually, we plan to use the new scheduling strategy also for bounded DataStream programs.</p> </li> <li>
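The pluggable scheduling idea referenced above (FLINK-10429) can be sketched as follows. This is a hypothetical Python illustration, not Flink's scheduler API, and all names are invented: the scheduler delegates the decision of which tasks to start to an interchangeable strategy object, here an eager streaming-style strategy versus a lazy batch-style strategy that waits for upstream results.

```python
# Hypothetical sketch of pluggable scheduling strategies (invented names,
# not Flink's actual interfaces).

class EagerStrategy:
    """Streaming-style: schedule all unfinished tasks immediately."""
    def tasks_to_schedule(self, tasks, finished):
        return [t for t in tasks if t not in finished]


class LazyFromSourcesStrategy:
    """Batch-style: schedule a task only once its inputs have finished."""
    def __init__(self, dependencies):
        self.dependencies = dependencies  # task -> set of upstream tasks

    def tasks_to_schedule(self, tasks, finished):
        return [t for t in tasks
                if t not in finished
                and self.dependencies.get(t, set()) <= finished]


def schedule_round(strategy, tasks, finished):
    # The scheduler itself is strategy-agnostic: swapping the strategy
    # object changes scheduling behavior without touching the scheduler.
    return strategy.tasks_to_schedule(tasks, finished)
```

With tasks `["source", "map", "sink"]` and dependencies `{"map": {"source"}, "sink": {"map"}}`, the eager strategy schedules all three tasks at once, while the lazy strategy first schedules only `"source"`, then `"map"` after `"source"` finishes, and so on, which matches the batch-style recovery and scheduling behavior the merge plan wants to slot in as one strategy among several.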