Re: [PR] KAFKA-17671: Create better documentation for transactions [kafka]

via GitHub Tue, 22 Oct 2024 10:56:44 -0700


AndrewJSchofield commented on code in PR #17454:
URL: https://github.com/apache/kafka/pull/17454#discussion_r1811157856



##########
docs/design.html:
##########
@@ -254,32 +254,32 @@ <h3 class="anchor-heading"><a id="semantics" 
class="anchor-link"></a><a href="#s
         <i>At least once</i>&mdash;Messages are never lost but may be 
redelivered.
     </li>
     <li>
-        <i>Exactly once</i>&mdash;this is what people actually want, each 
message is delivered once and only once.
+        <i>Exactly once</i>&mdash;Each message is delivered once and only once.
     </li>
     </ul>
 
     It's worth noting that this breaks down into two problems: the durability 
guarantees for publishing a message and the guarantees when consuming a message.
     <p>
-    Many systems claim to provide "exactly once" delivery semantics, but it is 
important to read the fine print, most of these claims are misleading (i.e. 
they don't translate to the case where consumers or producers
+    Many systems claim to provide "exactly-once" delivery semantics, but it is 
important to read the fine print, because sometimes these claims are misleading 
(i.e. they don't translate to the case where consumers or producers
     can fail, cases where there are multiple consumer processes, or cases 
where data written to disk can be lost).
     <p>
-    Kafka's semantics are straight-forward. When publishing a message we have 
a notion of the message being "committed" to the log. Once a published message 
is committed it will not be lost as long as one broker that
-    replicates the partition to which this message was written remains 
"alive". The definition of committed message, alive partition as well as a 
description of which types of failures we attempt to handle will be
+    Kafka's semantics are straightforward. When publishing a message we have a 
notion of the message being "committed" to the log. Once a published message is 
committed, it will not be lost as long as one broker that
+    replicates the partition to which this message was written remains 
"alive". The definition of committed message and alive partition as well as a 
description of which types of failures we attempt to handle will be
     described in more detail in the next section. For now let's assume a 
perfect, lossless broker and try to understand the guarantees to the producer 
and consumer. If a producer attempts to publish a message and
-    experiences a network error it cannot be sure if this error happened 
before or after the message was committed. This is similar to the semantics of 
inserting into a database table with an autogenerated key.
+    experiences a network error, it cannot be sure if this error happened 
before or after the message was committed. This is similar to the semantics of 
inserting into a database table with an autogenerated key.
     <p>
     Prior to 0.11.0.0, if a producer failed to receive a response indicating 
that a message was committed, it had little choice but to resend the message. 
This provides at-least-once delivery semantics since the
     message may be written to the log again during resending if the original 
request had in fact succeeded. Since 0.11.0.0, the Kafka producer also supports 
an idempotent delivery option which guarantees that resending
     will not result in duplicate entries in the log. To achieve this, the 
broker assigns each producer an ID and deduplicates messages using a sequence 
number that is sent by the producer along with every message.
-    Also beginning with 0.11.0.0, the producer supports the ability to send 
messages to multiple topic partitions using transaction-like semantics: i.e. 
either all messages are successfully written or none of them are.
+    Also beginning with 0.11.0.0, the producer supports the ability to send 
messages to multiple topic partitions using transactional semantics, so that 
either all messages are successfully written or none of them are.
     The main use case for this is exactly-once processing between Kafka topics 
(described below).
     <p>
-    Not all use cases require such strong guarantees. For uses which are 
latency sensitive we allow the producer to specify the durability level it 
desires. If the producer specifies that it wants to wait on the message
-    being committed this can take on the order of 10 ms. However the producer 
can also specify that it wants to perform the send completely asynchronously or 
that it wants to wait only until the leader (but not
+    Not all use cases require such strong guarantees. For uses which are 
latency-sensitive, we allow the producer to specify the durability level it 
desires. If the producer specifies that it wants to wait on the message
+    being committed, this can take on the order of 10 ms. However the producer 
can also specify that it wants to perform the send completely asynchronously or 
that it wants to wait only until the leader (but not
     necessarily the followers) have the message.

Review Comment:
   You are probably correct, but you are reviewing the existing text, not the 
changes. It could do with a fairly comprehensive rewrite, I think. I wasn't 
expecting to do that now. For example, it's hardly relevant nowadays what the 
0.11.0.0 release introduced :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

