walterddr opened a new issue, #10657:
URL: https://github.com/apache/pinot/issues/10657
Background
===
Currently, several abstractions are shared across different components in
the planner and the runtime. This causes several problems:
- Adding partition-based routing and planning is very complex.
- Information required only at plan time and dispatch time leaks into the
runtime, where it is not useful; the usages are mixed together and hard to
change.
- The mailbox carries far more information than necessary, which makes a
mailbox hard to identify because `MailboxIdentifier` equality requires all of
those fields to be identical.
- ... many other issues
Proposed changes
===
Several new abstractions are being introduced to replace the current ones:
- Step 1a: replace `VirtualServer`
`VirtualServer` is currently a `ServerInstance + VirtualID`. It will be
replaced with `Worker`, which represents a unit of parallelism. A `Worker`:
(1) is globally indexed per stage;
(2) is mapped to a single `ServerInstance` stored in `StageMetadata`;
(3) contains partition or segment info, which will be put into a new
abstraction called `WorkerMetadata`.
With this, `VirtualServer` is completely removed, and `ServerInstance`, which
is not useful at runtime, is decoupled from the `VirtualID` or `workerID`,
which is used at runtime.
- Step 1b: replace identifiers:
- `MailboxIdentifier` will use the globally indexed `workerID` to
uniquely identify a stream as:
`reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID`
- `OpChainID` will use `workerID` as well: `reqID|stageID|workerID`
- Step 2: support Hash-Partitioned data distribution
see:
https://docs.google.com/document/d/1CdvxmOOctk6kS5PdgCy7f5KVh5urw4YY0YZGbwuPJt4/edit#
- Step 3: support worker assignment based on data partition and
worker/parallelism
see:
https://docs.google.com/document/d/1SKDKV6LXr4uFFUsR3djz5BWWMqcSJIYEqJBoL1zeDD8/edit
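
To make Steps 1a and 1b concrete, here is a minimal Java sketch of how the proposed pieces might fit together. It is an illustration under stated assumptions, not the actual Pinot implementation: the class `WorkerIdSketch`, the fields, the methods `addWorker`, `opChainId`, and `mailboxId`, and the use of a plain `host:port` string in place of `ServerInstance` are all hypothetical. Only the names `WorkerMetadata` and `StageMetadata` and the two pipe-delimited identifier layouts come from the proposal above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types do NOT mirror the real Pinot classes.
public class WorkerIdSketch {

  /** Per-worker info (assumed here to be a list of segment names). */
  public static final class WorkerMetadata {
    public final int workerId;           // index of the worker within its stage
    public final List<String> segments;  // assumed representation of segment info

    public WorkerMetadata(int workerId, List<String> segments) {
      this.workerId = workerId;
      this.segments = new ArrayList<>(segments);
    }
  }

  /** Per-stage info: the workers plus the worker-to-server mapping. */
  public static final class StageMetadata {
    public final int stageId;
    public final List<WorkerMetadata> workers = new ArrayList<>();
    // Assumption: a "host:port" string stands in for ServerInstance.
    public final Map<Integer, String> workerToServer = new HashMap<>();

    public StageMetadata(int stageId) {
      this.stageId = stageId;
    }

    /** Registers a worker on a server and returns its stage-wide worker ID. */
    public int addWorker(String server, List<String> segments) {
      int workerId = workers.size();
      workers.add(new WorkerMetadata(workerId, segments));
      workerToServer.put(workerId, server);
      return workerId;
    }
  }

  /** Builds reqID|stageID|workerID. */
  public static String opChainId(long reqId, int stageId, int workerId) {
    return reqId + "|" + stageId + "|" + workerId;
  }

  /** Builds reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID. */
  public static String mailboxId(long reqId, int sendingStageId, int sendingWorkerId,
      int receivingStageId, int receivingWorkerId) {
    return reqId + "|" + sendingStageId + "|" + sendingWorkerId
        + "|" + receivingStageId + "|" + receivingWorkerId;
  }

  public static void main(String[] args) {
    StageMetadata stage = new StageMetadata(1);
    // Two workers on the same server still get distinct worker IDs.
    int w0 = stage.addWorker("server-a:8421", List.of("seg_0"));
    int w1 = stage.addWorker("server-a:8421", List.of("seg_1"));
    System.out.println(opChainId(42L, stage.stageId, w0));
    System.out.println(mailboxId(42L, stage.stageId, w1, 2, 0));
  }
}
```

In this sketch the identifiers are built only from request, stage, and worker IDs, so no host or port information reaches the runtime identifiers; the server lookup stays in the stage-level mapping.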
CC @Jackie-Jiang @xiangfu0 @ankitsultana @somandal @siddharthteotia
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]