bbejeck commented on issue #8504:
URL: https://github.com/apache/kafka/pull/8504#issuecomment-618447861
>Why that? Because such a topology would hit the bug, it could never be
deployed, and thus nobody can actually run such a topology? In fact, shouldn't
be "burn" and index even if a name is provided (IIRC, we do this for some
cases)?
Yes in some cases we increment the index when users provide names. But
right now we don't increment the counter at all when creating repartition
topics as we reuse the name as is. My main point is that if we started to
generate new names for repartition topics we'd create topology compatibility
issues as the newly generated name would bump the count for all downstream
nodes. Right now I'm leaning towards going with the solution you presented in
point one in https://github.com/apache/kafka/pull/8504#discussion_r413380852
>I agree thought, that merging repartition topics (as proposed in (1))
should be done if possible (it's a historic artifact that we did not merge them
in the past and IMHO we should not make the same mistake again?).
But by doing so we are "leaking" optimization logic as you pointed out
above. I'm leaning towards building the topology "as is", meaning create two
repartition topics if that's what is required. But I don't have a strong
opinion and I would be fine with keeping the current solution in this PR.
>For (2), it's a tricky question because the different names are used for
different stores and changelog topics (ie, main purpose?) -- it seems to be a
"nasty side effect" if we would end up with two repartition topics for this
case? Of course, given the new repartition() operator, a user can work around
it by using it after map() and before calling join(). Just brainstorming here
what the impact could be and what tradeoff we want to pick.
I'm not sure I follow here the "nasty side effect" comment.
If a user does `streamA.join(streamB, ..., StreamJoined.name("foo")` and
`streamA.join(streamC, ..., StreamJoined.name("bar")` then we should create
two repartiton topics as that's what the user is expecting. If they elect to
use optimization then removing redundant repartition topics is expected
behavior. I think this also goes back to your original comment about the
leaking of optimization details.
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org