Rebuild website for 'Dynamic Tables' blog post
Project: http://git-wip-us.apache.org/repos/asf/flink-web/repo Commit: http://git-wip-us.apache.org/repos/asf/flink-web/commit/f6ac337f Tree: http://git-wip-us.apache.org/repos/asf/flink-web/tree/f6ac337f Diff: http://git-wip-us.apache.org/repos/asf/flink-web/diff/f6ac337f Branch: refs/heads/asf-site Commit: f6ac337f010ee819a650f0683524f08b21dfa14b Parents: c0574c3 Author: Fabian Hueske <fhue...@gmail.com> Authored: Tue Apr 4 21:30:36 2017 +0200 Committer: Fabian Hueske <fhue...@gmail.com> Committed: Tue Apr 4 21:30:36 2017 +0200 ---------------------------------------------------------------------- content/blog/feed.xml | 185 +++++++++ content/blog/index.html | 35 +- content/blog/page2/index.html | 34 +- content/blog/page3/index.html | 33 +- content/blog/page4/index.html | 35 +- content/blog/page5/index.html | 23 ++ content/img/blog/dynamic-tables/append-mode.png | Bin 0 -> 28566 bytes .../blog/dynamic-tables/query-append-out.png | Bin 0 -> 189962 bytes .../blog/dynamic-tables/query-groupBy-cnt.png | Bin 0 -> 125589 bytes .../dynamic-tables/query-groupBy-window-cnt.png | Bin 0 -> 154035 bytes .../blog/dynamic-tables/query-update-out.png | Bin 0 -> 302496 bytes content/img/blog/dynamic-tables/redo-mode.png | Bin 0 -> 81266 bytes .../img/blog/dynamic-tables/replace-mode.png | Bin 0 -> 23172 bytes .../blog/dynamic-tables/stream-query-stream.png | Bin 0 -> 177708 bytes content/img/blog/dynamic-tables/streams.png | Bin 0 -> 409340 bytes .../img/blog/dynamic-tables/undo-redo-mode.png | Bin 0 -> 72870 bytes content/index.html | 7 +- content/news/2017/04/04/dynamic-tables.html | 372 +++++++++++++++++++ 18 files changed, 672 insertions(+), 52 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/feed.xml ---------------------------------------------------------------------- diff --git a/content/blog/feed.xml b/content/blog/feed.xml index b677b23..81e8a39 100644 --- a/content/blog/feed.xml +++ b/content/blog/feed.xml @@ -7,6 +7,191 @@ <atom:link href="http://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" /> <item> +<title>Continuous Queries on Dynamic Tables</title> +<description><h4 id="analyzing-data-streams-with-sql">Analyzing Data Streams with SQL</h4> + +<p>More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices.</p> + +<center> +<img src="/img/blog/dynamic-tables/streams.png" style="width:45%;margin:10px" /> +</center> + +<p>Apache Flink is very well suited to power streaming analytics applications because it provides support for event-time semantics, stateful exactly-once processing, and achieves high throughput and low latency at the same time. Due to these features, Flink is able to compute exact and deterministic results from high-volume input streams in near real-time while providing exactly-once semantics in case of failures.</p> + +<p>Flinkâs core API for stream processing, the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html">DataStream API</a>, is very expressive and provides primitives for many common operations. Among other features, it offers highly customizable windowing logic, different state primitives with varying performance characteristics, hooks to register and react on timers, and tooling for efficient asynchronous requests to external systems. On the other hand, many stream analytics applications follow similar patterns and do not require the level of expressiveness as provided by the DataStream API. They could be expressed in a more natural and concise way using a domain specific language. As we all know, SQL is the de-facto standard for data analytics. For streaming analytics, SQL would enable a larger pool of people to specify applications on data streams in less time. However, no open source stream processor offers dece nt SQL support yet.</p> + +<h2 id="why-is-sql-on-streams-a-big-deal">Why is SQL on Streams a Big Deal?</h2> + +<p>SQL is the most widely used language for data analytics for many good reasons:</p> + +<ul> + <li>SQL is declarative: You specify what you want but not how to compute it.</li> + <li>SQL can be effectively optimized: An optimizer figures out an efficient plan to compute your result.</li> + <li>SQL can be efficiently evaluated: The processing engine knows exactly what to compute and how to do so efficiently.</li> + <li>And finally, everybody knows and many tools speak SQL.</li> +</ul> + +<p>So being able to process and analyze data streams with SQL makes stream processing technology available to many more users. Moreover, it significantly reduces the time and effort to define efficient stream analytics applications due to the SQLâs declarative nature and potential to be automatically optimized.</p> + +<p>However, SQL (and the relational data model and algebra) were not designed with streaming data in mind. Relations are (multi-)sets and not infinite sequences of tuples. When executing a SQL query, conventional database systems and query engines read and process a data set, which is completely available, and produce a fixed sized result. In contrast, data streams continuously provide new records such that data arrives over time. Hence, streaming queries have to continuously process the arriving data and never âcompleteâ.</p> + +<p>That being said, processing streams with SQL is not impossible. Some relational database systems feature eager maintenance of materialized views, which is similar to evaluating SQL queries on streams of data. A materialized view is defined as a SQL query just like a regular (virtual) view. However, the result of the query is actually stored (or materialized) in memory or on disk such that the view does not need to be computed on-the-fly when it is queried. In order to prevent that a materialized view becomes stale, the database system needs to update the view whenever its base relations (the tables referenced in its definition query) are modified. If we consider the changes on the viewâs base relations as a stream of modifications (or as a changelog stream) it becomes obvious that materialized view maintenance and SQL on streams are somehow related.</p> + +<h2 id="flinks-relational-apis-table-api-and-sql">Flinkâs Relational APIs: Table API and SQL</h2> + +<p>Since version 1.1.0 (released in August 2016), Flink features two semantically equivalent relational APIs, the language-embedded Table API (for Java and Scala) and standard SQL. Both APIs are designed as unified APIs for online streaming and historic batch data. This means that,</p> + +<p><strong><em>a query produces exactly the same result regardless whether its input is static batch data or streaming data.</em></strong></p> + +<p>Unified APIs for stream and batch processing are important for several reasons. First of all, users only need to learn a single API to process static and streaming data. Moreover, the same query can be used to analyze batch and streaming data, which allows to jointly analyze historic and live data in the same query. At the current state we havenât achieved complete unification of batch and streaming semantics yet, but the community is making very good progress towards this goal.</p> + +<p>The following code snippet shows two equivalent Table API and SQL queries that compute a simple windowed aggregate on a stream of temperature sensor measurements. The syntax of the SQL query is based on <a href="https://calcite.apache.org">Apache Calciteâs</a> syntax for <a href="https://calcite.apache.org/docs/reference.html#grouped-window-functions">grouped window functions</a> and will be supported in version 1.3.0 of Flink.</p> + +<div class="highlight"><pre><code class="language-scala"><span class="k">val</span> <span class="n">env</span> <span class="k">=</span> <span class="nc">StreamExecutionEnvironment</span><span class="o">.</span><span class="n">getExecutionEnvironment</span> +<span class="n">env</span><span class="o">.</span><span class="n">setStreamTimeCharacteristic</span><span class="o">(</span><span class="nc">TimeCharacteristic</span><span class="o">.</span><span class="nc">EventTime</span><span class="o">)</span> + +<span class="k">val</span> <span class="n">tEnv</span> <span class="k">=</span> <span class="nc">TableEnvironment</span><span class="o">.</span><span class="n">getTableEnvironment</span><span class="o">(</span><span class="n">env</span><span class="o">)</span> + +<span class="c1">// define a table source to read sensor data (sensorId, time, room, temp)</span> +<span class="k">val</span> <span class="n">sensorTable</span> <span class="k">=</span> <span class="o">???</span> <span class="c1">// can be a CSV file, Kafka topic, database, or ...</span> +<span class="c1">// register the table source</span> +<span class="n">tEnv</span><span class="o">.</span><span class="n">registerTableSource</span><span class="o">(</span><span class="s">&quot;sensors&quot;</span><span class="o">,</span> <span class="n">sensorTable</span><span class="o">)</span> + +<span class="c1">// Table API</span> +<span class="k">val</span> <span class="n">tapiResult</span><span class="k">:</span> <span class="kt">Table</span> <span class="o">=</span> <span class="n">tEnv</span><span class="o">.</span><span class="n">scan</span><span class="o">(</span><span class="s">&quot;sensors&quot;</span><span class="o">)</span> <span class="c1">// scan sensors table</span> + <span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="nc">Tumble</span> <span class="n">over</span> <span class="mf">1.</span><span class="n">hour</span> <span class="n">on</span> <span class="-Symbol">&#39;rowtime</span> <span class="n">as</span> <span class="-Symbol">&#39;w</span><span class="o">)</span> <span class="c1">// define 1-hour window</span> + <span class="o">.</span><span class="n">groupBy</span><span class="o">(</span><span class="-Symbol">&#39;w</span><span class="o">,</span> <span class="-Symbol">&#39;room</span><span class="o">)</span> <span class="c1">// group by window and room</span> + <span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="-Symbol">&#39;room</span><span class="o">,</span> <span class="-Symbol">&#39;w</span><span class="o">.</span><span class="n">end</span><span class="o">,</span> <span class="-Symbol">&#39;temp</span><span class="o">.</span><span class="n">avg</span> <span class="n">as</span> <span class="-Symbol">&#39;avgTemp</span><span class="o">)</span> <span class="c1">// compute average temperature</span> + +<span class="c1">// SQL</span> +<span class="k">val</span> <span class="n">sqlResult</span><span class="k">:</span> <span class="kt">Table</span> <span class="o">=</span> <span class="n">tEnv</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;&quot;&quot;</span> +<span class="s"> |SELECT room, TUMBLE_END(rowtime, INTERVAL &#39;1&#39; HOUR), AVG(temp) AS avgTemp</span> +<span class="s"> |FROM sensors</span> +<span class="s"> |GROUP BY TUMBLE(rowtime, INTERVAL &#39;1&#39; HOUR), room</span> +<span class="s"> |&quot;&quot;&quot;</span><span class="o">.</span><span class="n">stripMargin</span><span class="o">)</span></code></pre></div> + +<p>As you can see, both APIs are tightly integrated with each other and Flinkâs primary <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html">DataStream</a> and <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html">DataSet</a> APIs. A <code>Table</code> can be generated from and converted to a <code>DataSet</code> or <code>DataStream</code>. Hence, it is easily possible to scan an external table source such as a database or <a href="https://parquet.apache.org">Parquet</a> file, do some preprocessing with a Table API query, convert the result into a <code>DataSet</code> and run a <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/gelly/index.html">Gelly</a> graph algorithm on it. The queries defined in the example above can also be used to process batch data by changing the execution environment.</p> + +<p>Internally, both APIs are translated into the same logical representation, optimized by Apache Calcite, and compiled into DataStream or DataSet programs. In fact, the optimization and translation process does not know whether a query was defined using the Table API or SQL. If you are curious about the details of the optimization process, have a look at <a href="http://flink.apache.org/news/2016/05/24/stream-sql.html">a blog post</a> that we published last year. Since the Table API and SQL are equivalent in terms of semantics and only differ in syntax, we always refer to both APIs when we talk about SQL in this post.</p> + +<p>In its current state (version 1.2.0), Flinkâs relational APIs support a limited set of relational operators on data streams, including projections, filters, and windowed aggregates. All supported operators have in common that they never update result records which have been emitted. This is clearly not an issue for record-at-a-time operators such as projection and filter. However, it affects operators that collect and process multiple records as for instance windowed aggregates. Since emitted results cannot be updated, input records, which arrive after a result has been emitted, have to be discarded in Flink 1.2.0.</p> + +<p>The limitations of the current version are acceptable for applications that emit data to storage systems such as Kafka topics, message queues, or files which only support append operations and no updates or deletes. Common use cases that follow this pattern are for example continuous ETL and stream archiving applications that persist streams to an archive or prepare data for further online (streaming) analysis or later offline analysis. Since it is not possible to update previously emitted results, these kinds of applications have to make sure that the emitted results are correct and will not need to be corrected in the future. The following figure illustrates such applications.</p> + +<center> +<img src="/img/blog/dynamic-tables/query-append-out.png" style="width:60%;margin:10px" /> +</center> + +<p>While queries that only support appends are useful for some kinds of applications and certain types of storage systems, there are many streaming analytics use cases that need to update results. This includes streaming applications that cannot discard late arriving records, need early results for (long-running) windowed aggregates, or require non-windowed aggregates. In each of these cases, previously emitted result records need to be updated. Result-updating queries often materialize their result to an external database or key-value store in order to make it accessible and queryable for external applications. Applications that implement this pattern are dashboards, reporting applications, or <a href="http://2016.flink-forward.org/kb_sessions/joining-infinity-windowless-stream-processing-with-flink/">other applications</a>, which require timely access to continuously updated results. The following figure illustrates these kind of applications.</p> + +<center> +<img src="/img/blog/dynamic-tables/query-update-out.png" style="width:60%;margin:10px" /> +</center> + +<h2 id="continuous-queries-on-dynamic-tables">Continuous Queries on Dynamic Tables</h2> + +<p>Support for queries that update previously emitted results is the next big step for Flinkâs relational APIs. This feature is so important because it vastly increases the scope of the APIs and the range of supported use cases. Moreover, many of the newly supported use cases can be challenging to implement using the DataStream API.</p> + +<p>So when adding support for result-updating queries, we must of course preserve the unified semantics for stream and batch inputs. We achieve this by the concept of <em>Dynamic Tables</em>. A dynamic table is a table that is continuously updated and can be queried like a regular, static table. However, in contrast to a query on a batch table which terminates and returns a static table as result, a query on a dynamic table runs continuously and produces a table that is continuously updated depending on the modification on the input table. Hence, the resulting table is a dynamic table as well. This concept is very similar to materialized view maintenance as we discussed before.</p> + +<p>Assuming we can run queries on dynamic tables which produce new dynamic tables, the next question is, How do streams and dynamic tables relate to each other? The answer is that streams can be converted into dynamic tables and dynamic tables can be converted into streams. The following figure shows the conceptual model of processing a relational query on a stream.</p> + +<center> +<img src="/img/blog/dynamic-tables/stream-query-stream.png" style="width:70%;margin:10px" /> +</center> + +<p>First, the stream is converted into a dynamic table. The dynamic table is queried with a continuous query, which produces a new dynamic table. Finally, the resulting table is converted back into a stream. It is important to note that this is only the logical model and does not imply how the query is actually executed. In fact, a continuous query is internally translated into a conventional DataStream program.</p> + +<p>In the following, we describe the different steps of this model:</p> + +<ol> + <li>Defining a dynamic table on a stream,</li> + <li>Querying a dynamic table, and</li> + <li>Emitting a dynamic table.</li> +</ol> + +<h2 id="defining-a-dynamic-table-on-a-stream">Defining a Dynamic Table on a Stream</h2> + +<p>The first step of evaluating a SQL query on a dynamic table is to define a dynamic table on a stream. This means we have to specify how the records of a stream modify the dynamic table. The stream must carry records with a schema that is mapped to the relational schema of the table. There are two modes to define a dynamic table on a stream: <em>Append Mode</em> and <em>Update Mode</em>.</p> + +<p>In append mode each stream record is an insert modification to the dynamic table. Hence, all records of a stream are appended to the dynamic table such that it is ever-growing and infinite in size. The following figure illustrates the append mode.</p> + +<center> +<img src="/img/blog/dynamic-tables/append-mode.png" style="width:70%;margin:10px" /> +</center> + +<p>In update mode a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode). When defining a dynamic table on a stream via update mode, we can specify a unique key attribute on the table. In that case, update and delete operations are performed with respect to the key attribute. The update mode is visualized in the following figure.</p> + +<center> +<img src="/img/blog/dynamic-tables/replace-mode.png" style="width:70%;margin:10px" /> +</center> + +<h2 id="querying-a-dynamic-table">Querying a Dynamic Table</h2> + +<p>Once we have defined a dynamic table, we can run a query on it. Since dynamic tables change over time, we have to define what it means to query a dynamic table. Letâs imagine we take a snapshot of a dynamic table at a specific point in time. This snapshot can be treated as a regular static batch table. We denote a snapshot of a dynamic table <em>A</em> at a point <em>t</em> as <em>A[t]</em>. The snapshot can be queried with any SQL query. The query produces a regular static table as result. We denote the result of a query <em>q</em> on a dynamic table <em>A</em> at time <em>t</em> as <em>q(A[t])</em>. If we repeatedly compute the result of a query on snapshots of a dynamic table for progressing points in time, we obtain many static result tables which are changing over time and effectively constitute a dynamic table. We define the semantics of a query on a dynamic table as follows.</p&g t; + +<p>A query <em>q</em> on a dynamic table <em>A</em> produces a dynamic table <em>R</em>, which is at each point in time <em>t</em> equivalent to the result of applying <em>q</em> on <em>A[t]</em>, i.e., <em>R[t] = q(A[t])</em>. This definition implies that running the same query on <em>q</em> on a batch table and on a streaming table produces the same result. In the following, we show two examples to illustrate the semantics of queries on dynamic tables.</p> + +<p>In the figure below, we see a dynamic input table <em>A</em> on the left side, which is defined in append mode. At time <em>t = 8</em>, <em>A</em> consists of six rows (colored in blue). At time <em>t = 9</em> and <em>t = 12</em>, one row is appended to <em>A</em> (visualized in green and orange, respectively). We run a simple query on table <em>A</em> which is shown in the center of the figure. The query groups by attribute <em>k</em> and counts the records per group. On the right hand side we see the result of query <em>q</em> at time <em>t = 8</em> (blue), <em>t = 9</em> (green), and <em>t = 12</em> (orange). At each point in time t, the result table is equivalent to a batch query on the dynamic table <em>A</em> at time <em>t</em>.</p> + +<center> +<img src="/img/blog/dynamic-tables/query-groupBy-cnt.png" style="width:70%;margin:10px" /> +</center> + +<p>The query in this example is a simple grouped (but not windowed) aggregation query. Hence, the size of the result table depends on the number of distinct grouping keys of the input table. Moreover, it is worth noticing that the query continuously updates result rows that it had previously emitted instead of merely adding new rows.</p> + +<p>The second example shows a similar query which differs in one important aspect. In addition to grouping on the key attribute <em>k</em>, the query also groups records into tumbling windows of five seconds, which means that it computes a count for each value of <em>k</em> every five seconds. Again, we use Calciteâs <a href="https://calcite.apache.org/docs/reference.html#grouped-window-functions">group window functions</a> to specify this query. On the left side of the figure we see the input table <em>A</em> and how it changes over time in append mode. On the right we see the result table and how it evolves over time.</p> + +<center> +<img src="/img/blog/dynamic-tables/query-groupBy-window-cnt.png" style="width:80%;margin:10px" /> +</center> + +<p>In contrast to the result of the first example, the resulting table grows relative to the time, i.e., every five seconds new result rows are computed (given that the input table received more records in the last five seconds). While the non-windowed query (mostly) updates rows of the result table, the windowed aggregation query only appends new rows to the result table.</p> + +<p>Although this blog post focuses on the semantics of SQL queries on dynamic tables and not on how to efficiently process such a query, weâd like to point out that it is not possible to compute the complete result of a query from scratch whenever an input table is updated. Instead, the query is compiled into a streaming program which continuously updates its result based on the changes on its input. This implies that not all valid SQL queries are supported but only those that can be continuously, incrementally, and efficiently computed. We plan discuss details about the evaluation of SQL queries on dynamic tables in a follow up blog post.</p> + +<h2 id="emitting-a-dynamic-table">Emitting a Dynamic Table</h2> + +<p>Querying a dynamic table yields another dynamic table, which represents the queryâs results. Depending on the query and its input tables, the result table is continuously modified by insert, update, and delete changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without update modifications, or anything in between.</p> + +<p>Traditional database systems use logs to rebuild tables in case of failures and for replication. There are different logging techniques, such as UNDO, REDO, and UNDO/REDO logging. In a nutshell, UNDO logs record the previous value of a modified element to revert incomplete transactions, REDO logs record the new value of a modified element to redo lost changes of completed transactions, and UNDO/REDO logs record the old and the new value of a changed element to undo incomplete transactions and redo lost changes of completed transactions. Based on the principles of these logging techniques, a dynamic table can be converted into two types of changelog streams, a <em>REDO Stream</em> and a <em>REDO+UNDO Stream</em>.</p> + +<p>A dynamic table is converted into a redo+undo stream by converting the modifications on the table into stream messages. An insert modification is emitted as an insert message with the new row, a delete modification is emitted as a delete message with the old row, and an update modification is emitted as a delete message with the old row and an insert message with the new row. This behavior is illustrated in the following figure.</p> + +<center> +<img src="/img/blog/dynamic-tables/undo-redo-mode.png" style="width:70%;margin:10px" /> +</center> + +<p>The left shows a dynamic table which is maintained in append mode and serves as input to the query in the center. The result of the query converted into a redo+undo stream which is shown at the bottom. The first record <em>(1, A)</em> of the input table results in a new record in the result table and hence in an insert message <em>+(A, 1)</em> to the stream. The second input record with <em>k = âAâ</em> <em>(4, A)</em> produces an update of the <em>(A, 1)</em> record in the result table and hence yields a delete message <em>-(A, 1)</em> and an insert message for <em>+(A, 2)</em>. All downstream operators or data sinks need to be able to correctly handle both types of messages.</p> + +<p>A dynamic table can be converted into a redo stream in two cases: either it is an append-only table (i.e., it only has insert modifications) or it has a unique key attribute. Each insert modification on the dynamic table results in an insert message with the new row to the redo stream. Due to the restriction of redo streams, only tables with unique keys can have update and delete modifications. If a key is removed from the keyed dynamic table, either because a row is deleted or because the key attribute of a row was modified, a delete message with the removed key is emitted to the redo stream. An update modification yields an update message with the updating, i.e., new row. Since delete and update modifications are defined with respect to the unique key, the downstream operators need to be able to access previous values by key. The figure below shows how the result table of the same query as above is converted into a redo stream.</p> + +<center> +<img src="/img/blog/dynamic-tables/redo-mode.png" style="width:70%;margin:10px" /> +</center> + +<p>The row <em>(1, A)</em> which yields an insert into the dynamic table results in the <em>+(A, 1)</em> insert message. The row <em>(4, A)</em> which produces an update yields the <em>*(A, 2)</em> update message.</p> + +<p>Common use cases for redo streams are to write the result of a query to an append-only storage system, like rolling files or a Kafka topic, or to a data store with keyed access, such as Cassandra, a relational DBMS, or a compacted Kafka topic. It is also possible to materialize a dynamic table as keyed state inside of the streaming application that evaluates the continuous query and make it queryable from external systems. With this design Flink itself maintains the result of a continuous SQL query on a stream and serves key lookups on the result table, for instance from a dashboard application.</p> + +<h2 id="what-will-change-when-switching-to-dynamic-tables">What will Change When Switching to Dynamic Tables?</h2> + +<p>In version 1.2, all streaming operators of Flinkâs relational APIs, like filter, project, and group window aggregates, only emit new rows and are not capable of updating previously emitted results. In contrast, dynamic table are able to handle update and delete modifications. Now you might ask yourself, How does the processing model of the current version relate to the new dynamic table model? Will the semantics of the APIs completely change and do we need to reimplement the APIs from scratch to achieve the desired semantics?</p> + +<p>The answer to all these questions is simple. The current processing model is a subset of the dynamic table model. Using the terminology we introduced in this post, the current model converts a stream into a dynamic table in append mode, i.e., an infinitely growing table. Since all operators only accept insert changes and produce insert changes on their result table (i.e., emit new rows), all supported queries result in dynamic append tables, which are converted back into DataStreams using the redo model for append-only tables. Consequently, the semantics of the current model are completely covered and preserved by the new dynamic table model.</p> + +<h2 id="conclusion-and-outlook">Conclusion and Outlook</h2> + +<p>Flinkâs relational APIs are great to implement stream analytics applications in no time and used in several production settings. In this blog post we discussed the future of the Table API and SQL. This effort will make Flink and stream processing accessible to more people. Moreover, the unified semantics for querying historic and real-time data as well as the concept of querying and maintaining dynamic tables will enable and significantly ease the implementation of many exciting use cases and applications. As this post was focusing on the semantics of relational queries on streams and dynamic tables, we did not discuss the details of how a query will be executed, which includes the internal implementation of retractions, handling of late events, support for early results, and bounding space requirements. We plan to publish a follow up blog post on this topic at a later point in time.</p> + +<p>In recent months, many members of the Flink community have been discussing and contributing to the relational APIs. We made great progress so far. While most work has focused on processing streams in append mode, the next steps on the agenda are to work on dynamic tables to support queries that update their results. If you are excited about the idea of processing streams with SQL and would like to contribute to this effort, please give feedback, join the discussions on the mailing list, or grab a JIRA issue to work on.</p> +</description> +<pubDate>Tue, 04 Apr 2017 14:00:00 +0200</pubDate> +<link>http://flink.apache.org/news/2017/04/04/dynamic-tables.html</link> +<guid isPermaLink="true">/news/2017/04/04/dynamic-tables.html</guid> +</item> + +<item> <title>From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</title> <description><p>Stream processing can deliver a lot of value. Many organizations have recognized the benefit of managing large volumes of data in real-time, reacting quickly to trends, and providing customers with live services at scale. Streaming applications with well-defined business logic can deliver a competitive advantage.</p> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/index.html ---------------------------------------------------------------------- diff --git a/content/blog/index.html b/content/blog/index.html index e9f15ae..968b3e0 100644 --- a/content/blog/index.html +++ b/content/blog/index.html @@ -142,6 +142,18 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></h2> + <p>04 Apr 2017 by Fabian Hueske, Shaoxuan Wang, and Xiaowei Jiang</p> + + <p><p>Flink's relational APIs, the Table API and SQL, are unified APIs for stream and batch processing, meaning that a query produces the same result when being evaluated on streaming or static data.</p> +<p>In this blog post we discuss the future of these APIs and introduce the concept of Dynamic Tables. Dynamic tables will significantly expand the scope of the Table API and SQL on streams and enable many more advanced use cases. We discuss how streams and dynamic tables relate to each other and explain the semantics of continuously evaluating queries on dynamic tables.</p></p> + + <p><a href="/news/2017/04/04/dynamic-tables.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2> <p>29 Mar 2017 by Timo Walther (<a href="https://twitter.com/twalthr">@twalthr</a>)</p> @@ -252,19 +264,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2016/08/08/release-1.1.0.html">Announcing Apache Flink 1.1.0</a></h2> - <p>08 Aug 2016</p> - - <p><div class="alert alert-success"><strong>Important</strong>: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use <strong>1.1.1</strong> or <strong>1.1.1-hadoop1</strong> as the Flink version.</div> - -</p> - - <p><a href="/news/2016/08/08/release-1.1.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -297,6 +296,16 @@ <ul id="markdown-toc"> + <li><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></li> + + + + + + + + + <li><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/page2/index.html ---------------------------------------------------------------------- diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html index c39e2ba..9725477 100644 --- a/content/blog/page2/index.html +++ b/content/blog/page2/index.html @@ -142,6 +142,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2016/08/08/release-1.1.0.html">Announcing Apache Flink 1.1.0</a></h2> + <p>08 Aug 2016</p> + + <p><div class="alert alert-success"><strong>Important</strong>: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use <strong>1.1.1</strong> or <strong>1.1.1-hadoop1</strong> as the Flink version.</div> + +</p> + + <p><a href="/news/2016/08/08/release-1.1.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2> <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> @@ -253,17 +266,6 @@ <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/12/11/storm-compatibility.html">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a></h2> - <p>11 Dec 2015 by Matthias J. Sax (<a href="https://twitter.com/MatthiasJSax">@MatthiasJSax</a>)</p> - - <p>In this blog post, we describe Flink's compatibility package for <a href="https://storm.apache.org">Apache Storm</a> that allows to embed Spouts (sources) and Bolts (operators) in a regular Flink streaming job. Furthermore, the compatibility package provides a Storm compatible API in order to execute whole Storm topologies with (almost) no code adaption.</p> - - <p><a href="/news/2015/12/11/storm-compatibility.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -296,6 +298,16 @@ <ul id="markdown-toc"> + <li><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></li> + + + + + + + + + <li><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/page3/index.html ---------------------------------------------------------------------- diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html index 75ac4b3..cdd4d36 100644 --- a/content/blog/page3/index.html +++ b/content/blog/page3/index.html @@ -142,6 +142,17 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/12/11/storm-compatibility.html">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a></h2> + <p>11 Dec 2015 by Matthias J. Sax (<a href="https://twitter.com/MatthiasJSax">@MatthiasJSax</a>)</p> + + <p>In this blog post, we describe Flink's compatibility package for <a href="https://storm.apache.org">Apache Storm</a> that allows to embed Spouts (sources) and Bolts (operators) in a regular Flink streaming job. Furthermore, the compatibility package provides a Storm compatible API in order to execute whole Storm topologies with (almost) no code adaption.</p> + + <p><a href="/news/2015/12/11/storm-compatibility.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2> <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> @@ -261,18 +272,6 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Juggling with Bits and Bytes</a></h2> - <p>11 May 2015 by Fabian Hüske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> - - <p><p>Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.</p> -<p>In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.</p></p> - - <p><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -305,6 +304,16 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p> <ul id="markdown-toc"> + <li><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></li> + + + + + + + + + <li><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/page4/index.html ---------------------------------------------------------------------- diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html index 7d66840..7b0db94 100644 --- a/content/blog/page4/index.html +++ b/content/blog/page4/index.html @@ -142,6 +142,18 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Juggling with Bits and Bytes</a></h2> + <p>11 May 2015 by Fabian Hüske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p> + + <p><p>Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.</p> +<p>In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.</p></p> + + <p><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2> <p>13 Apr 2015</p> @@ -268,19 +280,6 @@ and offers a new API including definition of flexible windows.</p> <hr> - <article> - <h2 class="blog-title"><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a></h2> - <p>04 Nov 2014</p> - - <p><p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!</p> - -</p> - - <p><a href="/news/2014/11/04/release-0.7.0.html">Continue reading »</a></p> - </article> - - <hr> - <!-- Pagination links --> @@ -313,6 +312,16 @@ and offers a new API including definition of flexible windows.</p> <ul id="markdown-toc"> + <li><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></li> + + + + + + + + + <li><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/blog/page5/index.html ---------------------------------------------------------------------- diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html index e212840..ad576d2 100644 --- a/content/blog/page5/index.html +++ b/content/blog/page5/index.html @@ -142,6 +142,19 @@ <!-- Blog posts --> <article> + <h2 class="blog-title"><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a></h2> + <p>04 Nov 2014</p> + + <p><p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!</p> + +</p> + + <p><a href="/news/2014/11/04/release-0.7.0.html">Continue reading »</a></p> + </article> + + <hr> + + <article> <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2> <p>03 Oct 2014</p> @@ -215,6 +228,16 @@ academic and open source project that Flink originates from.</p> <ul id="markdown-toc"> + <li><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></li> + + + + + + + + + <li><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></li> http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/append-mode.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/append-mode.png b/content/img/blog/dynamic-tables/append-mode.png new file mode 100644 index 0000000..a09d716 Binary files /dev/null and b/content/img/blog/dynamic-tables/append-mode.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/query-append-out.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/query-append-out.png b/content/img/blog/dynamic-tables/query-append-out.png new file mode 100755 index 0000000..eaf056e Binary files /dev/null and b/content/img/blog/dynamic-tables/query-append-out.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/query-groupBy-cnt.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/query-groupBy-cnt.png b/content/img/blog/dynamic-tables/query-groupBy-cnt.png new file mode 100644 index 0000000..0e66ceb Binary files /dev/null and b/content/img/blog/dynamic-tables/query-groupBy-cnt.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/query-groupBy-window-cnt.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/query-groupBy-window-cnt.png b/content/img/blog/dynamic-tables/query-groupBy-window-cnt.png new file mode 100644 index 0000000..e4f3cf4 Binary files /dev/null and b/content/img/blog/dynamic-tables/query-groupBy-window-cnt.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/query-update-out.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/query-update-out.png b/content/img/blog/dynamic-tables/query-update-out.png new file mode 100755 index 0000000..9fd9db0 Binary files /dev/null and b/content/img/blog/dynamic-tables/query-update-out.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/redo-mode.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/redo-mode.png b/content/img/blog/dynamic-tables/redo-mode.png new file mode 100644 index 0000000..1ff9d07 Binary files /dev/null and b/content/img/blog/dynamic-tables/redo-mode.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/replace-mode.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/replace-mode.png b/content/img/blog/dynamic-tables/replace-mode.png new file mode 100644 index 0000000..11368d2 Binary files /dev/null and b/content/img/blog/dynamic-tables/replace-mode.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/stream-query-stream.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/stream-query-stream.png b/content/img/blog/dynamic-tables/stream-query-stream.png new file mode 100644 index 0000000..b32b7ef Binary files /dev/null and b/content/img/blog/dynamic-tables/stream-query-stream.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/streams.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/streams.png b/content/img/blog/dynamic-tables/streams.png new file mode 100755 index 0000000..16b1922 Binary files /dev/null and b/content/img/blog/dynamic-tables/streams.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/img/blog/dynamic-tables/undo-redo-mode.png ---------------------------------------------------------------------- diff --git a/content/img/blog/dynamic-tables/undo-redo-mode.png b/content/img/blog/dynamic-tables/undo-redo-mode.png new file mode 100644 index 0000000..6959527 Binary files /dev/null and b/content/img/blog/dynamic-tables/undo-redo-mode.png differ http://git-wip-us.apache.org/repos/asf/flink-web/blob/f6ac337f/content/index.html ---------------------------------------------------------------------- diff --git a/content/index.html b/content/index.html index f0ec841..2ad34f5 100644 --- a/content/index.html +++ b/content/index.html @@ -168,6 +168,10 @@ <dl> + <dt> <a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></dt> + <dd><p>Flink's relational APIs, the Table API and SQL, are unified APIs for stream and batch processing, meaning that a query produces the same result when being evaluated on streaming or static data.</p> +<p>In this blog post we discuss the future of these APIs and introduce the concept of Dynamic Tables. Dynamic tables will significantly expand the scope of the Table API and SQL on streams and enable many more advanced use cases. We discuss how streams and dynamic tables relate to each other and explain the semantics of continuously evaluating queries on dynamic tables.</p></dd> + <dt> <a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></dt> <dd><p>Broadening the user base and unifying batch & streaming with relational APIs</p></dd> @@ -183,9 +187,6 @@ <dd><p>The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.</p> </dd> - - <dt> <a href="/news/2016/12/19/2016-year-in-review.html">Apache Flink in 2016: Year in Review</a></dt> - <dd><p>As 2016 comes to a close, let's take a moment to look back on the Flink community's great work during the past year.</p></dd> </dl>