[GitHub] [kafka] mjsax commented on a diff in pull request #13496: KAFKA-14834: [1/N] Add timestamped get to KTableValueGetter

via GitHub Tue, 11 Apr 2023 11:42:07 -0700


mjsax commented on code in PR #13496:
URL: https://github.com/apache/kafka/pull/13496#discussion_r1163195888



##########
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableKTableInnerJoin.java:
##########
@@ -153,11 +154,37 @@ public void init(final ProcessorContext<?, ?> context) {
 
         @Override
         public ValueAndTimestamp<VOut> get(final K key) {
-            final ValueAndTimestamp<V1> valueAndTimestamp1 = valueGetter1.get(key);
+            return computeJoin(key, valueGetter1::get, valueGetter2::get);
+        }
+
+        @Override
+        public ValueAndTimestamp<VOut> get(final K key, final long asOfTimestamp) {

Review Comment:
   > The first option is nice in that now stream-(table-table) and (stream-table)-table joins with no intermediate materialization produce the same results,
   
   But are both joins really the same if the intermediate table-table result is not materialized? Semantically, the intermediate table-table result is a non-versioned store, so we cannot look up its history, i.e., we have a stream-tsTable join. The second query consists of two `stream-vTable` joins, so isn't it ok that they produce different results?
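To make the difference concrete, here is a minimal sketch using a hypothetical, simplified model (not the Kafka Streams API; `VTable`, `Joins`, and all method names are illustrative assumptions). It contrasts a stream record joined against the latest table-table join result (intermediate treated as a ts-table) with two as-of lookups at the stream record's timestamp (two stream-vTable joins).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical versioned input table (simplified; NOT the Kafka Streams API).
final class VTable {
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();

    void put(String key, String value, long ts) {
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    // Latest value, i.e., what a non-versioned (ts) view of this table returns.
    String get(String key) {
        TreeMap<Long, String> versions = history.get(key);
        return versions == null || versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Value valid at asOfTimestamp: the latest version with timestamp <= asOfTimestamp.
    String get(String key, long asOfTimestamp) {
        TreeMap<Long, String> versions = history.get(key);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.floorEntry(asOfTimestamp);
        return e == null ? null : e.getValue();
    }
}

final class Joins {
    // Inner join of two table values, computed from whichever getters are passed in.
    static String computeJoin(String key, Function<String, String> g1, Function<String, String> g2) {
        String v1 = g1.apply(key);
        String v2 = g2.apply(key);
        return (v1 == null || v2 == null) ? null : v1 + "+" + v2;
    }

    // stream-(table-table), intermediate not materialized and treated as a ts-table:
    // the stream record's timestamp plays no role in the lookup.
    static String streamJoinTsIntermediate(String key, long streamTs, VTable t1, VTable t2) {
        return computeJoin(key, t1::get, t2::get);
    }

    // (stream-table)-table: two stream-vTable joins, each an as-of lookup at streamTs.
    static String streamJoinTwoVersioned(String key, long streamTs, VTable t1, VTable t2) {
        return computeJoin(key, k -> t1.get(k, streamTs), k -> t2.get(k, streamTs));
    }
}
```

With `t1` holding `a1@10` and `a2@20`, and `t2` holding `b1@5`, a stream record for key `k` at timestamp 15 joins to `a2+b1` in the first variant but to `a1+b1` in the second.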
   
   > but it's also confusing because stream-(table-table) produces different results if the user materializes the result of the table-table join as a versioned store (which is wrong).
   
   I don't see it as confusing (though it might be very subtle, to be fair). The intermediate result of a non-materialized t-t-join is semantically a tsTable (of course, it does not get out-of-order updates, because the _join_ that computes it has two versioned tables as input and thus drops out-of-order updates). If the intermediate result is materialized as a tsKV-store, the semantics should not change. If one materializes it as a vKV-store, though, it seems ok that the semantics change, because the intermediate result changes from non-versioned to versioned, and thus the join changes from `stream-tsTable` to `stream-vTable`.
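A minimal sketch of this point, again using a hypothetical, simplified model rather than the Kafka Streams API (`ResultTable`, `TsResultTable`, `VersionedResultTable`, and `MaterializationDemo` are illustrative names). The same in-order sequence of join results is written into a latest-value (ts) result table and into a versioned one. A downstream stream lookup sees a difference only when the result is materialized as versioned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a materialized table-table join result (NOT Kafka's API).
interface ResultTable {
    void put(String key, String value, long ts);

    // Value that a stream record with timestamp streamTs would join against.
    String lookupForStreamRecord(String key, long streamTs);
}

// ts-semantics: only the latest value is kept, so streamTs cannot influence the lookup.
final class TsResultTable implements ResultTable {
    private final Map<String, String> latest = new HashMap<>();

    public void put(String key, String value, long ts) {
        latest.put(key, value);
    }

    public String lookupForStreamRecord(String key, long streamTs) {
        return latest.get(key);
    }
}

// v-semantics: the history is kept, so the lookup is "as of" the stream record's timestamp.
final class VersionedResultTable implements ResultTable {
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();

    public void put(String key, String value, long ts) {
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    public String lookupForStreamRecord(String key, long streamTs) {
        TreeMap<Long, String> versions = history.get(key);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.floorEntry(streamTs);
        return e == null ? null : e.getValue();
    }
}

final class MaterializationDemo {
    // Feeds the same in-order join results into the given table, then looks up at streamTs.
    static String run(ResultTable table, long streamTs) {
        table.put("k", "a1+b1", 10);
        table.put("k", "a2+b1", 20);
        return table.lookupForStreamRecord("k", streamTs);
    }
}
```

For a stream record at timestamp 15, `MaterializationDemo.run(new TsResultTable(), 15)` returns `a2+b1`, while `MaterializationDemo.run(new VersionedResultTable(), 15)` returns `a1+b1`. The join results themselves are identical. Only the result table's semantics differ.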
   
   My point is that a table-table join involves 4 entities: the two input tables, the join operator, and the result table. The two input tables (v-table vs. ts-table) determine which join operator we pick (i.e., whether to drop out-of-order updates), and the join produces a result that we then feed into the result table, which has its own semantics (by default, ts-semantics, not v-semantics). Of course, depending on the join semantics used, we apply different updates to the table, but we don't change the table semantics itself.
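The separation of these entities can be sketched as follows. This is a hypothetical, simplified operator, not Kafka's implementation, and `TableTableInnerJoin` is an illustrative name. One stated assumption: an update from a versioned input is dropped if it is older than the latest timestamp already seen for that key on that input, while updates from a ts input always apply. The input table types alone configure the operator. The result table is a separate sink whose semantics the operator does not change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified table-table inner join operator (NOT Kafka's implementation).
// Assumption: a versioned input drops updates older than the latest timestamp seen for the
// key on that input; a ts input applies every update.
final class TableTableInnerJoin {
    private final boolean dropOutOfOrderLeft;
    private final boolean dropOutOfOrderRight;
    private final Map<String, String> left = new HashMap<>();
    private final Map<String, Long> leftTs = new HashMap<>();
    private final Map<String, String> right = new HashMap<>();
    private final Map<String, Long> rightTs = new HashMap<>();

    // Downstream result table, modeled here as latest value per key (ts-semantics).
    // Its semantics are chosen independently of the operator variant.
    final Map<String, String> result = new HashMap<>();

    // The input table types alone decide the operator's out-of-order handling.
    TableTableInnerJoin(boolean leftVersioned, boolean rightVersioned) {
        this.dropOutOfOrderLeft = leftVersioned;
        this.dropOutOfOrderRight = rightVersioned;
    }

    void updateLeft(String key, String value, long ts) {
        apply(key, value, ts, left, leftTs, dropOutOfOrderLeft);
    }

    void updateRight(String key, String value, long ts) {
        apply(key, value, ts, right, rightTs, dropOutOfOrderRight);
    }

    private void apply(String key, String value, long ts,
                       Map<String, String> side, Map<String, Long> sideTs, boolean dropOutOfOrder) {
        Long latestTs = sideTs.get(key);
        if (dropOutOfOrder && latestTs != null && ts < latestTs) {
            return; // out-of-order update from a versioned input: dropped
        }
        side.put(key, value);
        sideTs.put(key, ts);
        String l = left.get(key);
        String r = right.get(key);
        if (l != null && r != null) {
            result.put(key, l + "+" + r); // inner join: emit only when both sides have a value
        }
    }
}
```

Feeding the updates `a2@20` (left), `b1@5` (right), and then the out-of-order `a1@10` (left) leaves `a2+b1` in the result table when both inputs are versioned, but `a1+b1` when both are ts-tables. The operator differs, while the result table is updated the same way in both cases.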



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
