mcvsubbu commented on issue #8492:
URL: https://github.com/apache/pinot/issues/8492#issuecomment-1096995899
>
Yes, @jackjlli and I had a discussion yesterday and reached the same
conclusion: use some sort of ID. The problem is that the threads handling
these requests can run in any order. [Of course, we may choose not to
handle multiple simultaneous requests.] In this case, it is useful to know the
_highest_ sequence number of the messages handled.
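A minimal sketch of tracking the highest handled sequence number when handler threads finish in arbitrary order. `HighestHandledTracker` and its methods are hypothetical names for illustration, not existing Pinot or Helix classes:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Pinot or Helix.
class HighestHandledTracker {
    // -1 means "no message handled yet".
    private final AtomicLong highest = new AtomicLong(-1L);

    // Safe to call from many threads in any order; the stored value never decreases.
    void markHandled(long messageId) {
        highest.accumulateAndGet(messageId, Math::max);
    }

    long highestHandled() {
        return highest.get();
    }
}
```

Note that the highest number alone does not tell us that every lower-numbered message was also handled.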
We also need to consider restarts. If a server restarts, we should
treat it as if all messages have been handled. We can then run into race
conditions where a message is never received (or is discarded after receipt),
etc. Not very straightforward.
A more complete solution would be to keep a mirror of the IDEALSTATE znode in
zk (or to store additional data along with the idealstate, if Helix 1.x
supports that). For example,
`"segment_0": { "host1": {"ONLINE", "105"}, "host2":{"ONLINE", "106"}}`
Here, host1 processed message 105 whereas host2 processed message 106.
Unless Helix 1.x supports an idealstate extension natively, this would mean
a new znode, something we have tried to avoid in the past.
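To illustrate how such a mirror could be queried, here is a rough sketch, assuming the mirror is read into an in-memory map of host to (state, last processed message ID). `IdealStateMirror`, `Replica`, and `allHostsProcessed` are made-up names, and nothing here uses a real Helix API:

```java
import java.util.Map;

// Hypothetical in-memory view of the mirrored znode for one segment.
class IdealStateMirror {
    // State of one replica plus the ID of the last message that host processed.
    static final class Replica {
        final String state;
        final long lastMessageId;

        Replica(String state, long lastMessageId) {
            this.state = state;
            this.lastMessageId = lastMessageId;
        }
    }

    // True only if every host of the segment has processed a message with ID >= messageId.
    static boolean allHostsProcessed(Map<String, Replica> hosts, long messageId) {
        return !hosts.isEmpty()
            && hosts.values().stream().allMatch(r -> r.lastMessageId >= messageId);
    }
}
```

With the `segment_0` example above, a check for message 105 would pass, while a check for message 106 would fail because host1 has only reached 105.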
I suggest that we NOT implement this API. Instead, let us consider the use
cases. The most common cases:
- operators may issue a single reload, and can use the segment status API to
get the status
- operators will almost never try to get statuses and verify that they are
the same if no reload is issued. So we either remove this check from the test
code or just add a sleep, but let us not write code to make this case work.
- if operators do issue multiple reloads for (segments of) the same
table, the code will work fine as long as the bug that has been found is fixed
(the bug where we load the table config before grabbing the semaphore).
Just fix the bug, and document the behavior well (i.e., that if users have 1000
servers for a single table and do a reload, the ZooKeeper server may be
overloaded with 1000 requests coming in at the same time).
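The ordering fix can be sketched roughly as follows: take the permit first, and only then read the table config. `ReloadGate`, `loadTableConfig`, and `reloadSegment` are hypothetical stand-ins, not the actual Pinot code:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical wrapper showing the intended ordering only.
class ReloadGate {
    private final Semaphore permits;

    ReloadGate(int maxParallelReloads) {
        this.permits = new Semaphore(maxParallelReloads);
    }

    // Acquire the permit, then load the config, then reload; always release the permit.
    <C> void reload(Supplier<C> loadTableConfig, Consumer<C> reloadSegment) {
        permits.acquireUninterruptibly();
        try {
            C config = loadTableConfig.get(); // read only after the permit is held
            reloadSegment.accept(config);
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Loading the config inside the guarded region means a reload that waited on the semaphore works with whatever config is current once it actually starts, rather than a copy fetched before the wait.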
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]