tibrewalpratik17 opened a new issue, #13284: URL: https://github.com/apache/pinot/issues/13284
### Current scenario

All routing strategies are instance-based. In a recent incident in our cluster, we saw many query failures because 2 instances were marked as unavailable (unqueryable), and those 2 instances together held both replicas of around 3000 segments. At query time, all 3000 segments were treated as unavailable.

The 2 instances were marked as unqueryable because a different segment was OFFLINE on each of them.

Example: serverA and serverB together held both replicas of 3000 segments. segmentX was OFFLINE on serverA and segmentY was OFFLINE on serverB. We use the `strictReplicaGroup` strategy, so all 3000 segments became unavailable because neither serverA nor serverB was treated as an eligible serving candidate anymore.

### Possible prevention

segmentX and segmentY belonged to different partitions. If serverA had been excluded from serving queries only for segmentX's partition, and serverB only for segmentY's partition, we could have prevented this incident.

Note: if a server hosts only one partition, it makes sense to stop it from serving queries completely.

### Proposal

We propose a routing strategy such as `instancePartitionReplicaGroup` (the name can be discussed during implementation). It would maintain a mapping of available instance-partitions, not just instances. If all segments of a partition are ONLINE on a given instance, that instance-partition is enabled to serve queries.

At present, we use the ideal state to find the segments --> instances mapping. In this scenario, we could even fall back to the instanceToPartitions info in ZK to fetch this info.

Note: this proposal only affects REALTIME tables, not OFFLINE tables.

cc @ankitsultana
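To make the difference concrete, here is a minimal sketch of the two exclusion rules applied to the incident described in this issue. It is not Pinot's routing code: the class name, the method names, the `instance#partition` key format, and the three-segment data set are all hypothetical, and the instance-level rule is a simplification of the behavior reported here, not a copy of the `strictReplicaGroup` implementation. Only the ONLINE state is treated as servable, following the wording of the proposal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InstancePartitionRoutingSketch {

    // Hypothetical exclusion key: the whole instance, or one instance-partition pair.
    static String key(String instance, int partition, boolean perPartition) {
        return perPartition ? instance + "#" + partition : instance;
    }

    // For each segment, list the instances still allowed to serve it.
    static Map<String, List<String>> candidates(
            Map<String, Map<String, String>> segmentStates,
            Map<String, Integer> segmentPartitions,
            boolean perPartition) {
        // Pass 1: collect every key that has at least one non-ONLINE segment.
        Set<String> excluded = new HashSet<>();
        for (Map.Entry<String, Map<String, String>> seg : segmentStates.entrySet()) {
            int partition = segmentPartitions.get(seg.getKey());
            for (Map.Entry<String, String> state : seg.getValue().entrySet()) {
                if (!"ONLINE".equals(state.getValue())) {
                    excluded.add(key(state.getKey(), partition, perPartition));
                }
            }
        }
        // Pass 2: keep only the instances whose key was not excluded.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> seg : segmentStates.entrySet()) {
            int partition = segmentPartitions.get(seg.getKey());
            List<String> servers = new ArrayList<>();
            for (String instance : new TreeSet<String>(seg.getValue().keySet())) {
                if (!excluded.contains(key(instance, partition, perPartition))) {
                    servers.add(instance);
                }
            }
            result.put(seg.getKey(), servers);
        }
        return result;
    }

    // Simplified instance-level rule: one bad segment excludes the whole instance.
    public static Map<String, List<String>> instanceLevelCandidates(
            Map<String, Map<String, String>> states, Map<String, Integer> partitions) {
        return candidates(states, partitions, false);
    }

    // Proposed rule: one bad segment excludes only that instance-partition.
    public static Map<String, List<String>> partitionLevelCandidates(
            Map<String, Map<String, String>> states, Map<String, Integer> partitions) {
        return candidates(states, partitions, true);
    }

    // Hypothetical data: segmentZ stands in for the ~3000 healthy segments.
    public static Map<String, Map<String, String>> incidentStates() {
        Map<String, Map<String, String>> states = new TreeMap<>();
        states.put("segmentX", replicaStates("OFFLINE", "ONLINE"));
        states.put("segmentY", replicaStates("ONLINE", "OFFLINE"));
        states.put("segmentZ", replicaStates("ONLINE", "ONLINE"));
        return states;
    }

    public static Map<String, Integer> incidentPartitions() {
        Map<String, Integer> partitions = new TreeMap<>();
        partitions.put("segmentX", 0);
        partitions.put("segmentY", 1);
        partitions.put("segmentZ", 2);
        return partitions;
    }

    static Map<String, String> replicaStates(String onServerA, String onServerB) {
        Map<String, String> m = new TreeMap<>();
        m.put("serverA", onServerA);
        m.put("serverB", onServerB);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(instanceLevelCandidates(incidentStates(), incidentPartitions()));
        System.out.println(partitionLevelCandidates(incidentStates(), incidentPartitions()));
    }
}
```

With the instance-level rule, every segment in this toy data set ends up with no candidate server, which mirrors the incident. With the per-partition rule, segmentX is served by serverB, segmentY by serverA, and segmentZ by both, so only the affected instance-partitions lose a replica.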