This is an automated email from the ASF dual-hosted git repository.
guozhang pushed a commit to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git
The following commit(s) were added to refs/heads/2.0 by this push:
new 1d239a0 HOTFIX: Fix broken links (#5676)
1d239a0 is described below
commit 1d239a0fd87dcfdfffd030d293c00f6faff57ba9
Author: Bill Bejeck <[email protected]>
AuthorDate: Wed Oct 3 21:27:15 2018 -0400
HOTFIX: Fix broken links (#5676)
Reviewers: Joel Hamill <[email protected]>,
Guozhang Wang <[email protected]>
---
docs/streams/core-concepts.html | 48 ++++++++
docs/streams/developer-guide/dsl-api.html | 147 ++++++++++++++++++++----
docs/streams/developer-guide/processor-api.html | 4 +-
docs/streams/upgrade-guide.html | 2 +-
4 files changed, 178 insertions(+), 23 deletions(-)
diff --git a/docs/streams/core-concepts.html b/docs/streams/core-concepts.html
index 3f9eab5..b6d7762 100644
--- a/docs/streams/core-concepts.html
+++ b/docs/streams/core-concepts.html
@@ -131,6 +131,54 @@
Note that the described default behavior can be changed in the
Processor API by assigning timestamps to output records explicitly when calling
<code>#forward()</code>.
</p>
+ <h3><a id="streams_concepts_aggregations"
href="#streams_concepts_aggregations">Aggregations</a></h3>
+ <p>
+ An <strong>aggregation</strong> operation takes one input stream or
table, and yields a new table by combining multiple input records into a single
output record. Examples of aggregations are computing a count or a sum.
+ </p>
+
+ <p>
+ In the <code>Kafka Streams DSL</code>, an input stream of an
<code>aggregation</code> can be a KStream or a KTable, but the output stream
will always be a KTable. This allows Kafka Streams to update an aggregate value
upon the late arrival of further records after the value was produced and
emitted. When such late arrival happens, the aggregating KStream or KTable
emits a new aggregate value. Because the output is a KTable, the new value is
considered to overwrite the old value w [...]
+ </p>
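+ To make this concrete, here is a plain-Java sketch of the concept (a hypothetical illustration, not the Kafka Streams API): an aggregation folds a stream of (user, amount) records into a table of per-user sums, and a late-arriving record simply triggers another update of the same table row.

```java
import java.util.*;

// Sketch: an aggregation consumes a stream of (user, amount) records and
// maintains a table of per-user sums. A late-arriving record for "alice"
// updates her aggregate again, and the new value overwrites the old one.
// Hypothetical illustration of the concept, not the Kafka Streams API.
public class AggregationSketch {
    public static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> record : stream) {
            // Combine the incoming record with the existing aggregate, if any.
            table.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> stream = List.of(
            Map.entry("alice", 1),
            Map.entry("bob", 5),
            Map.entry("alice", 3)   // late-arriving record: "alice" is updated again
        );
        System.out.println(sumByKey(stream).get("alice")); // 4
    }
}
```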
+
+ <h3><a id="streams_concepts_windowing"
href="#streams_concepts_windowing">Windowing</a></h3>
+ <p>
+ Windowing lets you control how to <em>group records that have the same
key</em> for stateful operations such as <code>aggregations</code> or
<code>joins</code> into so-called <em>windows</em>. Windows are tracked per
record key.
+ </p>
+ <p>
+ <code>Windowing operations</code> are available in the <code>Kafka
Streams DSL</code>. When working with windows, you can specify a
<strong>retention period</strong> for the window. This retention period
controls how long Kafka Streams will wait for <strong>out-of-order</strong> or
<strong>late-arriving</strong> data records for a given window. If a record
arrives after the retention period of a window has passed, the record is
discarded and will not be processed in that window.
+ </p>
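+ The retention behavior can be sketched as a simple timestamp check (a hypothetical illustration with made-up window size and retention values, not the Kafka Streams implementation): a record is processed only while its window is still within the retention period.

```java
// Sketch of per-key tumbling windows with a retention period: a record whose
// window has already passed out of retention is discarded rather than
// processed. Window size and retention values are made up for illustration;
// this is not the Kafka Streams implementation.
public class WindowRetentionSketch {
    static final long WINDOW_SIZE_MS = 1_000;
    static final long RETENTION_MS   = 5_000;

    /** Returns the start of the tumbling window a timestamp falls into. */
    public static long windowStart(long timestampMs) {
        return (timestampMs / WINDOW_SIZE_MS) * WINDOW_SIZE_MS;
    }

    /** A record is accepted only while its window is still retained. */
    public static boolean accepted(long recordTimestampMs, long streamTimeMs) {
        return windowStart(recordTimestampMs) + WINDOW_SIZE_MS + RETENTION_MS > streamTimeMs;
    }

    public static void main(String[] args) {
        long streamTime = 10_000;                        // highest timestamp seen so far
        System.out.println(accepted(9_500, streamTime)); // true: window still retained
        System.out.println(accepted(1_500, streamTime)); // false: retention passed, record discarded
    }
}
```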
+ <p>
+ Late-arriving records are always possible in the real world and should
be properly accounted for in your applications. How late records are handled
depends on the effective <code>time semantics</code>. In the case of
processing-time, the semantics are "when the record is being
processed", which means that the notion of late records is not applicable
as, by definition, no record can be late. Hence, late-arriving records can only
be considered as such (i.e. as arrivin [...]
+ </p>
+
+ <h3><a id="streams_concepts_duality"
href="#streams_concepts_duality">Duality of Streams and Tables</a></h3>
+ <p>
+ When implementing stream processing use cases in practice, you
typically need both <strong>streams</strong> and also
<strong>databases</strong>.
+ An example use case that is very common in practice is an e-commerce
application that enriches an incoming <em>stream</em> of customer
+ transactions with the latest customer information from a <em>database
table</em>. In other words, streams are everywhere, and databases are
everywhere, too.
+ </p>
+
+ <p>
+ Any stream processing technology must therefore provide
<strong>first-class support for streams and tables</strong>.
+ Kafka's Streams API provides such functionality through its core
abstractions for
+ <a href="developer-guide/dsl-api.html#streams_concepts_kstream">streams</a> and
+ <a href="developer-guide/dsl-api.html#streams_concepts_ktable">tables</a>, which we will talk about in a minute.
+ Now, an interesting observation is that there is actually a
<strong>close relationship between streams and tables</strong>,
+ the so-called stream-table duality.
+ And Kafka exploits this duality in many ways: for example, to make
your applications
+ elastic,
+ to support fault-tolerant stateful processing,
+ or to run interactive queries
+ against your application's latest processing results. And, beyond its
internal usage, the Kafka Streams API
+ also allows developers to exploit this duality in their own
applications.
+ </p>
+
+ <p>
+ Before we discuss concepts such as <a href="#streams_concepts_aggregations">aggregations</a>
+ in Kafka Streams we must first introduce <strong>tables</strong> in
more detail, and talk about the aforementioned stream-table duality.
+ Essentially, this duality means that a stream can be viewed as a
table, and a table can be viewed as a stream.
+ </p>
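+ The duality can be sketched in plain Java (a hypothetical illustration, not the Kafka Streams API): replaying a changelog stream reconstructs the table, and a table's rows can be emitted back out as a changelog stream.

```java
import java.util.*;

// Stream-table duality sketch: a table is materialized by replaying its
// changelog stream (upsert per record), and a table can be turned back into
// a stream by emitting its rows as changelog records.
// Hypothetical illustration, not the Kafka Streams API.
public class DualitySketch {
    /** Stream -> table: materialize a changelog by upserting each record. */
    public static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : changelog) table.put(r.getKey(), r.getValue());
        return table;
    }

    /** Table -> stream: emit the table's current rows as changelog records. */
    public static List<Map.Entry<String, Integer>> toStream(Map<String, Integer> table) {
        return new ArrayList<>(table.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog =
            List.of(Map.entry("alice", 1), Map.entry("alice", 3));
        Map<String, Integer> table = toTable(changelog);           // {alice=3}
        // Round trip: table -> stream -> table yields the same table.
        System.out.println(toTable(toStream(table)).equals(table)); // true
    }
}
```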
+
<h3><a id="streams_state" href="#streams_state">States</a></h3>
<p>
diff --git a/docs/streams/developer-guide/dsl-api.html
b/docs/streams/developer-guide/dsl-api.html
index beb83a3..aa44dea 100644
--- a/docs/streams/developer-guide/dsl-api.html
+++ b/docs/streams/developer-guide/dsl-api.html
@@ -79,9 +79,9 @@
<h2><a class="toc-backref" href="#id7">Overview</a><a
class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<p>In comparison to the <a class="reference internal"
href="processor-api.html#streams-developer-guide-processor-api"><span
class="std std-ref">Processor API</span></a>, only the DSL supports:</p>
<ul class="simple">
- <li>Built-in abstractions for <a class="reference internal"
href="../concepts.html#streams-concepts-duality"><span class="std
std-ref">streams and tables</span></a> in the form of
- <a class="reference internal"
href="../concepts.html#streams-concepts-kstream"><span class="std
std-ref">KStream</span></a>, <a class="reference internal"
href="../concepts.html#streams-concepts-ktable"><span class="std
std-ref">KTable</span></a>, and
- <a class="reference internal"
href="../concepts.html#streams-concepts-globalktable"><span class="std
std-ref">GlobalKTable</span></a>. Having first-class support for streams and
tables is crucial
+ <li>Built-in abstractions for <a class="reference internal"
href="../core-concepts.html#streams_concepts_duality"><span class="std
std-ref">streams and tables</span></a> in the form of
+ <a class="reference internal"
href="#streams_concepts_kstream"><span class="std std-ref">KStream</span></a>,
<a class="reference internal" href="#streams_concepts_ktable"><span class="std
std-ref">KTable</span></a>, and
+ <a class="reference internal"
href="#streams_concepts_globalktable"><span class="std
std-ref">GlobalKTable</span></a>. Having first-class support for streams and
tables is crucial
because, in practice, most use cases require not just
either streams or databases/tables, but a combination of both.
For example, if your use case is to create a customer
360-degree view that is updated in real-time, what your
application will be doing is transforming many input
<em>streams</em> of customer-related events into an output <em>table</em>
@@ -93,7 +93,7 @@
<a class="reference internal"
href="#streams-developer-guide-dsl-joins"><span class="std
std-ref">joins</span></a> (e.g. <code class="docutils literal"><span
class="pre">leftJoin</span></code>), and
<a class="reference internal"
href="#streams-developer-guide-dsl-windowing"><span class="std
std-ref">windowing</span></a> (e.g. <a class="reference internal"
href="#windowing-session"><span class="std std-ref">session
windows</span></a>).</li>
</ul>
- <p>With the DSL, you can define <a class="reference internal"
href="../concepts.html#streams-concepts-processor-topology"><span class="std
std-ref">processor topologies</span></a> (i.e., the logical
+ <p>With the DSL, you can define <a class="reference internal"
href="../core-concepts.html#streams_topology"><span class="std
std-ref">processor topologies</span></a> (i.e., the logical
processing plan) in your application. The steps to accomplish
this are:</p>
<ol class="arabic simple">
<li>Specify <a class="reference internal"
href="#streams-developer-guide-dsl-sources"><span class="std std-ref">one or
more input streams that are read from Kafka topics</span></a>.</li>
@@ -104,6 +104,113 @@
action). A step-by-step guide for writing a stream processing
application using the DSL is provided below.</p>
<p>For a complete list of available API functionality, see also
the <a
href="../../../javadoc/org/apache/kafka/streams/package-summary.html">Streams</a>
API docs.</p>
</div>
+
+ <div class="section" id="dsl-core-constructs-overview">
+ <h4><a id="streams_concepts_kstream"
href="#streams_concepts_kstream">KStream</a></h4>
+
+ <p>
+ Only the <strong>Kafka Streams DSL</strong> has the notion of
a <code>KStream</code>.
+ </p>
+
+ <p>
+ A <strong>KStream</strong> is an abstraction of a
<strong>record stream</strong>, where each data record represents a
self-contained datum in the unbounded data set. Using the table analogy, data
records in a record stream are always interpreted as an "INSERT" --
think: adding more entries to an append-only ledger -- because no record
replaces an existing row with the same key. Examples are a credit card
transaction, a page view event, or a server log entry.
+ </p>
+
+ <p>
+ To illustrate, let's imagine the following two data records
are being sent to the stream:
+ </p>
+
+ <div class="sourcecode">
+ <p>("alice", 1) --> ("alice", 3)</p>
+ </div>
+
+ <p>
+ If your stream processing application were to sum the values
per user, it would return <code>4</code> for <code>alice</code>. Why? Because
the second data record would not be considered an update of the previous
record. Compare this behavior of KStream to <code>KTable</code> below,
+ which would return <code>3</code> for <code>alice</code>.
+ </p>
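+ The record-stream semantics above can be sketched in plain Java (a hypothetical illustration, not the Kafka Streams API): because every record is an independent fact, summing sees both records for <code>alice</code>.

```java
import java.util.List;
import java.util.Map;

// Record-stream (KStream) semantics sketch: each record is a self-contained
// fact in an append-only sequence, so a per-user sum sees every record.
// Hypothetical illustration, not the Kafka Streams API.
public class RecordStreamSketch {
    public static int sumFor(String user, List<Map.Entry<String, Integer>> stream) {
        return stream.stream()
                     .filter(r -> r.getKey().equals(user))
                     .mapToInt(Map.Entry::getValue)
                     .sum();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> stream =
            List.of(Map.entry("alice", 1), Map.entry("alice", 3));
        System.out.println(sumFor("alice", stream)); // 4
    }
}
```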
+
+ <h4><a id="streams_concepts_ktable"
href="#streams_concepts_ktable">KTable</a></h4>
+
+ <p>
+ Only the <strong>Kafka Streams DSL</strong> has the notion of
a <code>KTable</code>.
+ </p>
+
+ <p>
+ A <strong>KTable</strong> is an abstraction of a
<strong>changelog stream</strong>, where each data record represents an update.
More precisely, the value in a data record is interpreted as an
"UPDATE" of the last value for the same record key, if any (if a
corresponding key doesn't exist yet, the update will be considered an INSERT).
Using the table analogy, a data record in a changelog stream is interpreted as
an UPSERT aka INSERT/UPDATE because any existing r [...]
+ </p>
+ <p>
+ To illustrate, let's imagine the following two data records
are being sent to the stream:
+ </p>
+
+ <div class="sourcecode">
+ <p>
+ ("alice", 1) --> ("alice", 3)
+ </p>
+ </div>
+
+ <p>
+ If your stream processing application were to sum the values
per user, it would return <code>3</code> for <code>alice</code>. Why? Because
the second data record would be considered an update of the previous record.
+ </p>
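+ The changelog semantics can be sketched in plain Java (a hypothetical illustration, not the Kafka Streams API): materializing the changelog keeps only the latest value per key, so <code>alice</code> resolves to <code>3</code>.

```java
import java.util.*;

// Changelog-stream (KTable) semantics sketch: each record is an UPSERT of
// the previous value for the same key, and a null value would be a DELETE
// (tombstone). Only the latest value per key survives materialization.
// Hypothetical illustration, not the Kafka Streams API.
public class ChangelogStreamSketch {
    public static Map<String, Integer> materialize(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> r : changelog) {
            if (r.getValue() == null) table.remove(r.getKey()); // null value = DELETE
            else table.put(r.getKey(), r.getValue());           // otherwise UPSERT
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog =
            List.of(Map.entry("alice", 1), Map.entry("alice", 3));
        System.out.println(materialize(changelog).get("alice")); // 3
    }
}
```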
+
+ <p>
+ <strong>Effects of Kafka's log compaction:</strong> Another
way of thinking about KStream and KTable is as follows: If you were to store a
KTable into a Kafka topic, you'd probably want to enable Kafka's <a
href="http://kafka.apache.org/documentation.html#compaction">log compaction</a>
feature, e.g. to save storage space.
+ </p>
+
+ <p>
+ However, it would not be safe to enable log compaction in the
case of a KStream because, as soon as log compaction would begin purging older
data records of the same key, it would break the semantics of the data. To pick
up the illustration example again, you'd suddenly get a <code>3</code> for
<code>alice</code> instead of a <code>4</code> because log compaction would
have removed the <code>("alice", 1)</code> data record. Hence log
compaction is perfectly safe [...]
+ </p>
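+ Why compaction breaks record-stream semantics can be sketched in plain Java (a hypothetical, simplified model, not Kafka's actual compaction implementation): compaction keeps only the most recent record per key, which is harmless for a changelog but silently changes the sum computed over a record stream.

```java
import java.util.*;

// Log-compaction sketch: compaction retains only the latest record per key.
// That is safe for a changelog (KTable) topic, but corrupts a record stream:
// after compaction the sum for "alice" is 3 instead of 4, because the
// ("alice", 1) record was removed. Simplified model, not Kafka's actual
// compaction implementation.
public class CompactionSketch {
    public static List<Map.Entry<String, Integer>> compact(List<Map.Entry<String, Integer>> log) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : log) latest.put(r.getKey(), r.getValue()); // last write wins
        List<Map.Entry<String, Integer>> compacted = new ArrayList<>();
        latest.forEach((k, v) -> compacted.add(Map.entry(k, v)));
        return compacted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> log =
            List.of(Map.entry("alice", 1), Map.entry("alice", 3));
        int sumAfterCompaction = compact(log).stream().mapToInt(Map.Entry::getValue).sum();
        System.out.println(sumAfterCompaction); // 3 -- the ("alice", 1) record is gone
    }
}
```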
+
+ <p>
+ We have already seen an example of a changelog stream in the section
<a href="../core-concepts.html#streams_concepts_duality">Duality of Streams and Tables</a>. Another example is change
data capture (CDC) records in the changelog of a relational database,
representing which row in a database table was inserted, updated, or deleted.
+ </p>
+
+ <p>
+ KTable also provides the ability to look up <em>current</em>
values of data records by keys. This table-lookup functionality is available
through <strong>join operations</strong> (see also <strong>Joining</strong> in
the Developer Guide) as well as through <strong>Interactive Queries</strong>.
+ </p>
+
+ <h4><a id="streams_concepts_globalktable"
href="#streams_concepts_globalktable">GlobalKTable</a></h4>
+
+ <p>Only the <strong>Kafka Streams DSL</strong> has the notion of a
<strong>GlobalKTable</strong>.</p>
+
+ <p>
+ Like a <strong>KTable</strong>, a
<strong>GlobalKTable</strong> is an abstraction of a <strong>changelog
stream</strong>, where each data record represents an update.
+ </p>
+
+ <p>
+ A GlobalKTable differs from a KTable in the data with which it is
populated, i.e., which data from the underlying Kafka topic is
read into the respective table. Slightly simplified, imagine you have an input
topic with 5 partitions. In your application, you want to read this topic into
a table. Also, you want to run your application across 5 application instances
for <strong>maximum parallelism</strong>.
+ </p>
+
+ <ul>
+ <li>
+ If you read the input topic into a
<strong>KTable</strong>, then the "local" KTable instance of each
application instance will be populated with data <strong>from only 1
partition</strong> of the topic's 5 partitions.
+ </li>
+
+ <li>
+ If you read the input topic into a
<strong>GlobalKTable</strong>, then the local GlobalKTable instance of each
application instance will be populated with data <strong>from all partitions of
the topic</strong>.
+ </li>
+ </ul>
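+ The difference in population can be sketched as a toy partition assignment (a hypothetical, simplified model with a 1:1 partition-to-instance mapping, not the Kafka Streams assignment algorithm):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of how table state is populated across application instances:
// with a KTable each of the 5 instances materializes 1 of the topic's
// 5 partitions; with a GlobalKTable every instance materializes all 5.
// Simplified 1:1 assignment, not the Kafka Streams assignment algorithm.
public class TablePopulationSketch {
    public static List<Integer> partitionsForInstance(int instance, int numPartitions, boolean global) {
        if (global) {
            // GlobalKTable: every instance reads every partition.
            return IntStream.range(0, numPartitions).boxed().collect(Collectors.toList());
        }
        // KTable: each instance reads its own partition (simplified).
        return List.of(instance);
    }

    public static void main(String[] args) {
        System.out.println(partitionsForInstance(2, 5, false)); // [2]
        System.out.println(partitionsForInstance(2, 5, true));  // [0, 1, 2, 3, 4]
    }
}
```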
+
+ <p>
+ GlobalKTable provides the ability to look up <em>current</em>
values of data records by keys. This table-lookup functionality is available
through <a href="#streams-developer-guide-dsl-joins">join operations</a>.
+ </p>
+ <p>Benefits of global tables:</p>
+
+ <ul>
+ <li>
+ More convenient and/or efficient <strong>joins</strong>:
Notably, global tables allow you to perform star joins, they support
"foreign-key" lookups (i.e., you can look up data in the table not
just by record key, but also by data in the record values), and they are more
efficient when chaining multiple joins. Also, when joining against a global
table, the input data does not need to be <strong>co-partitioned</strong>.
+ </li>
+ <li>
+ Can be used to "broadcast" information to all
the running instances of your application.
+ </li>
+ </ul>
+
+ <p>Downsides of global tables:</p>
+ <ul>
+ <li>Increased local storage consumption compared to the
(partitioned) KTable because the entire topic is tracked.</li>
+ <li>Increased network and Kafka broker load compared to the
(partitioned) KTable because the entire topic is read.</li>
+ </ul>
+
+ </div>
<div class="section" id="creating-source-streams-from-kafka">
<span id="streams-developer-guide-dsl-sources"></span><h2><a
class="toc-backref" href="#id8">Creating source streams from Kafka</a><a
class="headerlink" href="#creating-source-streams-from-kafka" title="Permalink
to this headline"></a></h2>
<p>You can easily read data from Kafka topics into your
application. The following operations are supported.</p>
@@ -123,8 +230,8 @@
<li><em>input topics</em> → KStream</li>
</ul>
</td>
- <td><p class="first">Creates a <a class="reference
internal" href="../concepts.html#streams-concepts-kstream"><span class="std
std-ref">KStream</span></a> from the specified Kafka input topics and
interprets the data
- as a <a class="reference internal"
href="../concepts.html#streams-concepts-kstream"><span class="std
std-ref">record stream</span></a>.
+ <td><p class="first">Creates a <a class="reference
internal" href="#streams_concepts_kstream"><span class="std
std-ref">KStream</span></a> from the specified Kafka input topics and
interprets the data
+ as a <a class="reference internal"
href="#streams_concepts_kstream"><span class="std std-ref">record
stream</span></a>.
A <code class="docutils literal"><span
class="pre">KStream</span></code> represents a <em>partitioned</em> record
stream.
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#stream(java.lang.String)">(details)</a></p>
<p>In the case of a KStream, the local KStream
instance of every application instance will
@@ -157,7 +264,7 @@
<li><em>input topic</em> → KTable</li>
</ul>
</td>
- <td><p class="first">Reads the specified Kafka input topic
into a <a class="reference internal"
href="../concepts.html#streams-concepts-ktable"><span class="std
std-ref">KTable</span></a>. The topic is
+ <td><p class="first">Reads the specified Kafka input topic
into a <a class="reference internal" href="#streams_concepts_ktable"><span
class="std std-ref">KTable</span></a>. The topic is
interpreted as a changelog stream, where records with
the same key are interpreted as UPSERT aka INSERT/UPDATE
(when the record value is not <code class="docutils
literal"><span class="pre">null</span></code>) or as DELETE (when the value is
<code class="docutils literal"><span class="pre">null</span></code>) for that
key.
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#table-java.lang.String(java.lang.String)">(details)</a></p>
@@ -182,7 +289,7 @@
<li><em>input topic</em> → GlobalKTable</li>
</ul>
</td>
- <td><p class="first">Reads the specified Kafka input topic
into a <a class="reference internal"
href="../concepts.html#streams-concepts-globalktable"><span class="std
std-ref">GlobalKTable</span></a>. The topic is
+ <td><p class="first">Reads the specified Kafka input topic
into a <a class="reference internal"
href="#streams_concepts_globalktable"><span class="std
std-ref">GlobalKTable</span></a>. The topic is
interpreted as a changelog stream, where records with
the same key are interpreted as UPSERT aka INSERT/UPDATE
(when the record value is not <code class="docutils
literal"><span class="pre">null</span></code>) or as DELETE (when the value is
<code class="docutils literal"><span class="pre">null</span></code>) for that
key.
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String(java.lang.String)">(details)</a></p>
@@ -225,7 +332,7 @@
<p>Some KStream transformations may generate one or more KStream
objects, for example:
- <code class="docutils literal"><span
class="pre">filter</span></code> and <code class="docutils literal"><span
class="pre">map</span></code> on a KStream will generate another KStream
- <code class="docutils literal"><span
class="pre">branch</span></code> on KStream can generate multiple KStreams</p>
- <p>Some others may generate a KTable object, for example an
aggregation of a KStream also yields a KTable. This allows Kafka Streams to
continuously update the computed value upon arrivals of <a class="reference
internal" href="../concepts.html#streams-concepts-aggregations"><span
class="std std-ref">late records</span></a> after it
+ <p>Some others may generate a KTable object, for example an
aggregation of a KStream also yields a KTable. This allows Kafka Streams to
continuously update the computed value upon arrivals of <a class="reference
internal" href="../core-concepts.html#streams_concepts_aggregations"><span
class="std std-ref">late records</span></a> after it
has already been produced to the downstream transformation
operators.</p>
<p>All KTable transformation operations can only generate another
KTable. However, the Kafka Streams DSL does provide a special function
that converts a KTable representation into a KStream. All of
these transformation methods can be chained together to compose
@@ -1461,7 +1568,7 @@
prices, inventory, customer information) in order to
enrich a new data record (e.g. customer transaction) with context
information. That is, scenarios where you need to
perform table lookups at very large scale and with a low processing
latency. Here, a popular pattern is to make the
information in the databases available in Kafka through so-called
- <em>change data capture</em> in combination with <a
class="reference internal" href="../../connect/index.html#kafka-connect"><span
class="std std-ref">Kafka’s Connect API</span></a>, and then implementing
+ <em>change data capture</em> in combination with <a
class="reference internal" href="../../#connect"><span class="std
std-ref">Kafka’s Connect API</span></a>, and then implementing
applications that leverage the Streams API to perform
very fast and efficient local joins
of such tables and streams, rather than requiring the
application to make a query to a remote database over the network
for each record. In this example, the KTable concept
in Kafka Streams would enable you to track the latest state
@@ -1522,7 +1629,7 @@
<strong>It is the responsibility of the user to
ensure data co-partitioning when joining</strong>.</p>
<div class="admonition tip">
<p><b>Tip</b></p>
- <p class="last">If possible, consider using <a
class="reference internal"
href="../concepts.html#streams-concepts-globalktable"><span class="std
std-ref">global tables</span></a> (<code class="docutils literal"><span
class="pre">GlobalKTable</span></code>) for joining because they do not require
data co-partitioning.</p>
+ <p class="last">If possible, consider using <a
class="reference internal" href="#streams_concepts_globalktable"><span
class="std std-ref">global tables</span></a> (<code class="docutils
literal"><span class="pre">GlobalKTable</span></code>) for joining because they
do not require data co-partitioning.</p>
</div>
<p>The requirements for data co-partitioning are:</p>
<ul class="simple">
@@ -1530,7 +1637,7 @@
<li>All applications that <em>write</em> to the
input topics must have the <strong>same partitioning strategy</strong> so that
records with
the same key are delivered to same partition
number. In other words, the keyspace of the input data must be
distributed across partitions in the same
manner.
- This means that, for example, applications
that use Kafka’s <a class="reference internal"
href="../../clients/index.html#kafka-clients"><span class="std std-ref">Java
Producer API</span></a> must use the
+ This means that, for example, applications
that use Kafka’s <a class="reference internal"
href="../../#producerapi"><span class="std std-ref">Java Producer
API</span></a> must use the
same partitioner (cf. the producer setting
<code class="docutils literal"><span
class="pre">"partitioner.class"</span></code> aka <code
class="docutils literal"><span
class="pre">ProducerConfig.PARTITIONER_CLASS_CONFIG</span></code>),
and applications that use the Kafka’s
Streams API must use the same <code class="docutils literal"><span
class="pre">StreamPartitioner</span></code> for operations such as
<code class="docutils literal"><span
class="pre">KStream#to()</span></code>. The good news is that, if you happen
to use the default partitioner-related settings across all
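+ The co-partitioning requirement described above boils down to this: every writer must map a given key to the same partition number. A minimal sketch (a hypothetical stand-in partition function, not Kafka's actual murmur2-based default partitioner):

```java
import java.nio.charset.StandardCharsets;

// Co-partitioning sketch: all producers writing to the join's input topics
// must use the same partition function, so records with the same key land
// in the same partition number on every topic. The hash used here is a
// simplified stand-in, not Kafka's murmur2-based default partitioner.
public class CoPartitioningSketch {
    public static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = java.util.Arrays.hashCode(bytes);
        return (hash & 0x7fffffff) % numPartitions; // mask sign bit, then modulo
    }

    public static void main(String[] args) {
        int left  = partitionFor("alice", 5); // producer writing the stream topic
        int right = partitionFor("alice", 5); // producer writing the table topic
        System.out.println(left == right);    // true: the inputs are co-partitioned
    }
}
```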
@@ -1934,7 +2041,7 @@
<span
id="streams-developer-guide-dsl-joins-ktable-ktable"></span><h5><a
class="toc-backref" href="#id16">KTable-KTable Join</a><a class="headerlink"
href="#ktable-ktable-join" title="Permalink to this headline"></a></h5>
<p>KTable-KTable joins are always
<em>non-windowed</em> joins. They are designed to be consistent with their
counterparts in
relational databases. The changelog streams of
both KTables are materialized into local state stores to represent the
- latest snapshot of their <a class="reference
internal" href="../concepts.html#streams-concepts-ktable"><span class="std
std-ref">table duals</span></a>.
+ latest snapshot of their <a class="reference
internal" href="#streams_concepts_ktable"><span class="std std-ref">table
duals</span></a>.
The join result is a new KTable that represents
the changelog stream of the join operation.</p>
<p>Join output records are effectively created as
follows, leveraging the user-supplied <code class="docutils literal"><span
class="pre">ValueJoiner</span></code>:</p>
<div class="highlight-java"><div
class="highlight"><pre><span></span><span class="n">KeyValue</span><span
class="o"><</span><span class="n">K</span><span class="o">,</span> <span
class="n">LV</span><span class="o">></span> <span
class="n">leftRecord</span> <span class="o">=</span> <span class="o">...;</span>
@@ -2499,13 +2606,13 @@
<div class="section" id="kstream-globalktable-join">
<span
id="streams-developer-guide-dsl-joins-kstream-globalktable"></span><h5><a
class="toc-backref" href="#id18">KStream-GlobalKTable Join</a><a
class="headerlink" href="#kstream-globalktable-join" title="Permalink to this
headline"></a></h5>
<p>KStream-GlobalKTable joins are always
<em>non-windowed</em> joins. They allow you to perform <em>table lookups</em>
against a
- <a class="reference internal"
href="../concepts.html#streams-concepts-globalktable"><span class="std
std-ref">GlobalKTable</span></a> (entire changelog stream) upon receiving a new
record from the
+ <a class="reference internal"
href="#streams_concepts_globalktable"><span class="std
std-ref">GlobalKTable</span></a> (entire changelog stream) upon receiving a new
record from the
KStream (record stream). An example use case
would be “star queries” or “star joins”, where you
would enrich a stream
of user activities (KStream) with the latest user
profile information (GlobalKTable) and further context information
(further GlobalKTables).</p>
<p>At a high-level, KStream-GlobalKTable joins are
very similar to
<a class="reference internal"
href="#streams-developer-guide-dsl-joins-kstream-ktable"><span class="std
std-ref">KStream-KTable joins</span></a>. However, global tables provide you
- with much more flexibility at the <a
class="reference internal"
href="../concepts.html#streams-concepts-globalktable"><span class="std
std-ref">some expense</span></a> when compared to partitioned
+ with much more flexibility at the <a
class="reference internal" href="#streams_concepts_globalktable"><span
class="std std-ref">some expense</span></a> when compared to partitioned
tables:</p>
<ul class="simple">
<li>They do not require <a class="reference
internal" href="#streams-developer-guide-dsl-joins-co-partitioning"><span
class="std std-ref">data co-partitioning</span></a>.</li>
@@ -2671,7 +2778,7 @@
defined window boundary. In aggregating operations, a
windowing state store is used to store the latest aggregation
results per window.
Old records in the state store are purged after the
specified
- <a class="reference internal"
href="../concepts.html#streams-concepts-windowing"><span class="std
std-ref">window retention period</span></a>.
+ <a class="reference internal"
href="../core-concepts.html#streams_concepts_windowing"><span class="std
std-ref">window retention period</span></a>.
Kafka Streams guarantees to keep a window for at least
this specified time; the default value is one day and can be
changed via <code class="docutils literal"><span
class="pre">Windows#until()</span></code> and <code class="docutils
literal"><span class="pre">SessionWindows#until()</span></code>.</p>
<p>The DSL supports the following types of windows:</p>
@@ -2851,7 +2958,7 @@ t=5 (blue), which lead to a merge of sessions and an
extension of a session, res
<li><strong>Combining ease-of-use with full flexibility
where it’s needed:</strong> Even though you generally prefer to use
the expressiveness of the DSL, there are certain steps
in your processing that require more flexibility and
tinkering than the DSL provides. For example, only
the Processor API provides access to a
- <a class="reference internal"
href="../faq.html#streams-faq-processing-record-metadata"><span class="std
std-ref">record’s metadata</span></a> such as its topic, partition, and
offset information.
+ record’s metadata such as its topic, partition,
and offset information.
However, you don’t want to switch completely to
the Processor API just because of that.</li>
<li><strong>Migrating from other tools:</strong> You are
migrating from other stream processing technologies that provide an
imperative API, and migrating some of your legacy code
to the Processor API was faster and/or easier than to
@@ -2877,7 +2984,7 @@ t=5 (blue), which lead to a merge of sessions and an
extension of a session, res
<code class="docutils literal"><span
class="pre">process()</span></code> allows you to leverage the <a
class="reference internal"
href="processor-api.html#streams-developer-guide-processor-api"><span
class="std std-ref">Processor API</span></a> from the DSL.
(<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#process-org.apache.kafka.streams.processor.ProcessorSupplier-java.lang.String...-">details</a>)</p>
<p>This is essentially equivalent to adding the
<code class="docutils literal"><span class="pre">Processor</span></code> via
<code class="docutils literal"><span
class="pre">Topology#addProcessor()</span></code> to your
- <a class="reference internal"
href="../concepts.html#streams-concepts-processor-topology"><span class="std
std-ref">processor topology</span></a>.</p>
+ <a class="reference internal"
href="../core-concepts.html#streams_topology"><span class="std
std-ref">processor topology</span></a>.</p>
<p class="last">An example is available in the
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#process-org.apache.kafka.streams.processor.ProcessorSupplier-java.lang.String...-">javadocs</a>.</p>
</td>
@@ -2897,7 +3004,7 @@ t=5 (blue), which lead to a merge of sessions and an
extension of a session, res
Applying a grouping or a join after <code
class="docutils literal"><span class="pre">transform</span></code> will result
in re-partitioning of the records.
If possible use <code class="docutils
literal"><span class="pre">transformValues</span></code> instead, which will
not cause data re-partitioning.</p>
<p><code class="docutils literal"><span
class="pre">transform</span></code> is essentially equivalent to adding the
<code class="docutils literal"><span class="pre">Transformer</span></code> via
<code class="docutils literal"><span
class="pre">Topology#addProcessor()</span></code> to your
- <a class="reference internal"
href="../concepts.html#streams-concepts-processor-topology"><span class="std
std-ref">processor topology</span></a>.</p>
+ <a class="reference internal"
href="../core-concepts.html#streams_topology"><span class="std
std-ref">processor topology</span></a>.</p>
<p class="last">An example is available in the
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#transform-org.apache.kafka.streams.kstream.TransformerSupplier-java.lang.String...-">javadocs</a>.
</p>
@@ -2916,7 +3023,7 @@ t=5 (blue), which lead to a merge of sessions and an
extension of a session, res
The <code class="docutils literal"><span
class="pre">ValueTransformer</span></code> may return <code class="docutils
literal"><span class="pre">null</span></code> as the new value for a record.</p>
<p><code class="docutils literal"><span
class="pre">transformValues</span></code> is preferable to <code
class="docutils literal"><span class="pre">transform</span></code> because it
will not cause data re-partitioning.</p>
<p><code class="docutils literal"><span
class="pre">transformValues</span></code> is essentially equivalent to adding
the <code class="docutils literal"><span
class="pre">ValueTransformer</span></code> via <code class="docutils
literal"><span class="pre">Topology#addProcessor()</span></code> to your
- <a class="reference internal"
href="../concepts.html#streams-concepts-processor-topology"><span class="std
std-ref">processor topology</span></a>.</p>
+ <a class="reference internal"
href="../core-concepts.html#streams_topology"><span class="std
std-ref">processor topology</span></a>.</p>
<p class="last">An example is available in the
<a class="reference external"
href="../../../javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-">javadocs</a>.</p>
</td>
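The transformValues guidance above can be sketched in a short DSL snippet. This is an illustrative fragment only, not code from the patched docs; the topic names and the value-length computation are made up for the example, and it assumes the Kafka Streams 2.0 API:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class TransformValuesSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input = builder.stream("input-topic");

        // transformValues cannot change the record key, so a grouping or
        // join applied downstream needs no re-partitioning of the records.
        final KStream<String, Integer> lengths = input.transformValues(
            () -> new ValueTransformer<String, Integer>() {
                @Override
                public void init(final ProcessorContext context) { }

                @Override
                public Integer transform(final String value) {
                    // May return null as the new value for a record.
                    return value == null ? null : value.length();
                }

                @Override
                public void close() { }
            });

        lengths.to("output-topic");
    }
}
```

Had transform been used instead, Kafka Streams would have to assume the key may have changed and would re-partition before any subsequent grouping or join.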
diff --git a/docs/streams/developer-guide/processor-api.html
b/docs/streams/developer-guide/processor-api.html
index cb45cd9..22630b9 100644
--- a/docs/streams/developer-guide/processor-api.html
+++ b/docs/streams/developer-guide/processor-api.html
@@ -70,7 +70,7 @@
</div>
<div class="section" id="defining-a-stream-processor">
<span id="streams-developer-guide-stream-processor"></span><h2><a
class="toc-backref" href="#id2">Defining a Stream Processor</a><a
class="headerlink" href="#defining-a-stream-processor" title="Permalink to this
headline"></a></h2>
- <p>A <a class="reference internal"
href="../concepts.html#streams-concepts"><span class="std std-ref">stream
processor</span></a> is a node in the processor topology that represents a
single processing step.
+ <p>A <a class="reference internal"
href="../core-concepts.html#streams_processor_node"><span class="std
std-ref">stream processor</span></a> is a node in the processor topology that
represents a single processing step.
With the Processor API, you can define arbitrary stream
processors that process one received record at a time, and connect
these processors with their associated state stores to compose
the processor topology.</p>
<p>You can define a customized stream processor by implementing
the <code class="docutils literal"><span class="pre">Processor</span></code>
interface, which provides the <code class="docutils literal"><span
class="pre">process()</span></code> API method.
@@ -91,7 +91,7 @@
Note that <code class="docutils literal"><span
class="pre">#forward()</span></code> also allows you to change the default
behavior by passing a custom timestamp for the output record.</p>
<p>Specifically, <code class="docutils literal"><span
class="pre">ProcessorContext#schedule()</span></code> accepts a user-supplied
<code class="docutils literal"><span class="pre">Punctuator</span></code>
callback interface, which triggers its <code class="docutils literal"><span
class="pre">punctuate()</span></code>
API method periodically based on the <code class="docutils
literal"><span class="pre">PunctuationType</span></code>. The <code
class="docutils literal"><span class="pre">PunctuationType</span></code>
determines what notion of time is used
- for the punctuation scheduling: either <a class="reference
internal" href="../concepts.html#streams-concepts-time"><span class="std
std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time
+ for the punctuation scheduling: either <a class="reference
internal" href="../core-concepts.html#streams_time"><span class="std
std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time
is configured to represent event-time via <code
class="docutils literal"><span class="pre">TimestampExtractor</span></code>).
When stream-time is used, <code class="docutils literal"><span
class="pre">punctuate()</span></code> is triggered purely
by data because stream-time is determined (and advanced
forward) by the timestamps derived from the input data. When no
new input data arrives, stream-time does not advance, and
thus <code class="docutils literal"><span class="pre">punctuate()</span></code>
is not called.</p>
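The stream-time punctuation behavior described in that hunk can be illustrated with a small toy model. This is not the Kafka Streams API, just a self-contained sketch of the rule that stream-time only advances via record timestamps, so a scheduled callback never fires while no data arrives:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT the Kafka API) of PunctuationType.STREAM_TIME scheduling:
// stream-time advances only when records arrive, so punctuations fire
// purely as a function of the input data's timestamps.
public class StreamTimePunctuationModel {
    private final long intervalMs;
    private long streamTime = -1;   // highest record timestamp seen so far
    private long nextPunctuation;   // next stream-time at which to fire
    final List<Long> firings = new ArrayList<>();

    StreamTimePunctuationModel(final long intervalMs) {
        this.intervalMs = intervalMs;
        this.nextPunctuation = intervalMs;
    }

    // Called once per input record; the record's timestamp drives
    // stream-time forward and may trigger pending punctuations.
    void process(final long recordTimestamp) {
        streamTime = Math.max(streamTime, recordTimestamp);
        while (streamTime >= nextPunctuation) {
            firings.add(nextPunctuation);  // punctuate() would run here
            nextPunctuation += intervalMs;
        }
    }

    public static void main(final String[] args) {
        final StreamTimePunctuationModel model = new StreamTimePunctuationModel(10);
        model.process(3);   // stream-time = 3: no firing yet
        model.process(12);  // stream-time = 12: fires at 10
        model.process(35);  // stream-time = 35: fires at 20 and 30
        System.out.println(model.firings);  // [10, 20, 30]
    }
}
```

Note that with no further `process()` calls, `firings` never grows, mirroring how `punctuate()` is not called when no new input data arrives.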
diff --git a/docs/streams/upgrade-guide.html b/docs/streams/upgrade-guide.html
index 34f66ce..5ec4103 100644
--- a/docs/streams/upgrade-guide.html
+++ b/docs/streams/upgrade-guide.html
@@ -130,7 +130,7 @@
put operation metrics would now be
<code>kafka.streams:type=stream-rocksdb-window-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),rocksdb-window-state-id=([-.\w]+)</code>.
Users need to update their metrics collecting and reporting systems
for their time and session windowed stores accordingly.
- For more details, please read the <a
href="/{{version}}/documentation/ops.html#kafka_streams_store_monitoring">State
Store Metrics</a> section.
+ For more details, please read the <a
href="/{{version}}/documentation/#kafka_streams_store_monitoring">State Store
Metrics</a> section.
</p>
<p>