tibrewalpratik17 opened a new issue, #13284:
URL: https://github.com/apache/pinot/issues/13284

   ### Current scenario
   
   All current routing strategies are instance-based.
   In a recent incident in our cluster, we saw a lot of query failures 
because 2 instances were marked as unavailable (unqueryable), and together 
they held both replicas of around 3000 segments. At query time, all of these 
3000 segments were therefore marked as unavailable. The 2 instances were marked 
as unqueryable because each of them had one segment OFFLINE, and the two 
OFFLINE segments were different.
   
   Example:
   
   There were two servers, serverA and serverB, and 3000 segments that had 
both of their replicas on these 2 servers.
   segmentX was OFFLINE on serverA and segmentY was OFFLINE on serverB. We 
use the `strictReplicaGroup` strategy, so all 3000 segments became unavailable 
because neither serverA nor serverB was treated as an eligible serving 
candidate anymore.
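
   The failure mode in the example above can be sketched in a few lines of 
Java. This is a simplified model of the behaviour described in this issue, not 
Pinot's actual routing code: the class `StrictInstanceRoutingSketch`, its 
methods, and the `segmentZ` placeholder (standing in for the other healthy 
segments) are all hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified, hypothetical model of instance-level exclusion as described in
// this issue. It is NOT Pinot's real instance selector.
class StrictInstanceRoutingSketch {

  // view: segment -> (instance -> state), where state is "ONLINE" or "OFFLINE".
  // Returns segment -> instances still eligible to serve that segment.
  static Map<String, Set<String>> route(Map<String, Map<String, String>> view) {
    // An instance is excluded as a whole if any of its segments is not ONLINE.
    Set<String> excluded = new HashSet<>();
    for (Map<String, String> replicas : view.values()) {
      for (Map.Entry<String, String> replica : replicas.entrySet()) {
        if (!"ONLINE".equals(replica.getValue())) {
          excluded.add(replica.getKey());
        }
      }
    }
    Map<String, Set<String>> routing = new TreeMap<>();
    for (Map.Entry<String, Map<String, String>> segment : view.entrySet()) {
      Set<String> candidates = new TreeSet<>(segment.getValue().keySet());
      candidates.removeAll(excluded);
      routing.put(segment.getKey(), candidates);
    }
    return routing;
  }

  // The incident from the example, with segmentZ standing in for the
  // remaining segments that were ONLINE on both servers.
  static Map<String, Map<String, String>> incidentView() {
    Map<String, Map<String, String>> view = new TreeMap<>();
    view.put("segmentX", Map.of("serverA", "OFFLINE", "serverB", "ONLINE"));
    view.put("segmentY", Map.of("serverA", "ONLINE", "serverB", "OFFLINE"));
    view.put("segmentZ", Map.of("serverA", "ONLINE", "serverB", "ONLINE"));
    return view;
  }

  public static void main(String[] args) {
    route(incidentView())
        .forEach((segment, servers) -> System.out.println(segment + " -> " + servers));
  }
}
```

   In this model every segment ends up with an empty candidate set, including 
`segmentZ`, which is ONLINE on both servers.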
   
   ### Possible prevention
   
   segmentX and segmentY belonged to different partitions. If we had stopped 
serverA from serving queries only for the partition of segmentX, and likewise 
stopped serverB only for the partition of segmentY, we could have prevented 
this incident.
   
   Note: if a server hosts only one partition, then it makes sense to stop it 
from serving queries completely.
   
   ### Proposal
   
   What we are proposing in this issue is a routing strategy like 
`instancePartitionReplicaGroup` (name can be discussed during implementation). 
Here, we will maintain a mapping of available instance-partitions and not just 
instances. If all segments of a partition are ONLINE on a given instance, then 
we enable that instance-partition to serve queries.
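
   A minimal sketch of the instance-partition bookkeeping, under the 
assumption that each segment's partition is known up front. The class 
`InstancePartitionRoutingSketch`, its methods, the `"instance/partition"` key 
format, and the sample segment names are hypothetical and only illustrate the 
idea; the real strategy would live in Pinot's routing layer.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of partition-aware availability; not Pinot's routing code.
class InstancePartitionRoutingSketch {

  // Key for one (instance, partition) pair, e.g. "serverA/0".
  static String key(String instance, int partition) {
    return instance + "/" + partition;
  }

  // An instance-partition is unavailable if any segment of that partition
  // is not ONLINE on that instance.
  static Set<String> unavailableInstancePartitions(
      Map<String, Map<String, String>> view, Map<String, Integer> segmentToPartition) {
    Set<String> unavailable = new TreeSet<>();
    for (Map.Entry<String, Map<String, String>> segment : view.entrySet()) {
      int partition = segmentToPartition.get(segment.getKey());
      for (Map.Entry<String, String> replica : segment.getValue().entrySet()) {
        if (!"ONLINE".equals(replica.getValue())) {
          unavailable.add(key(replica.getKey(), partition));
        }
      }
    }
    return unavailable;
  }

  // Returns segment -> instances whose instance-partition is fully ONLINE.
  static Map<String, Set<String>> route(
      Map<String, Map<String, String>> view, Map<String, Integer> segmentToPartition) {
    Set<String> unavailable = unavailableInstancePartitions(view, segmentToPartition);
    Map<String, Set<String>> routing = new TreeMap<>();
    for (Map.Entry<String, Map<String, String>> segment : view.entrySet()) {
      int partition = segmentToPartition.get(segment.getKey());
      Set<String> candidates = new TreeSet<>();
      for (String instance : segment.getValue().keySet()) {
        if (!unavailable.contains(key(instance, partition))) {
          candidates.add(instance);
        }
      }
      routing.put(segment.getKey(), candidates);
    }
    return routing;
  }

  // Sample data: segmentX (partition 0) is OFFLINE on serverA and
  // segmentY (partition 1) is OFFLINE on serverB; the rest are healthy.
  static Map<String, Map<String, String>> incidentView() {
    Map<String, Map<String, String>> view = new TreeMap<>();
    view.put("segmentX", Map.of("serverA", "OFFLINE", "serverB", "ONLINE"));
    view.put("segmentY", Map.of("serverA", "ONLINE", "serverB", "OFFLINE"));
    view.put("segmentP0", Map.of("serverA", "ONLINE", "serverB", "ONLINE"));
    view.put("segmentP1", Map.of("serverA", "ONLINE", "serverB", "ONLINE"));
    view.put("segmentP2", Map.of("serverA", "ONLINE", "serverB", "ONLINE"));
    return view;
  }

  static Map<String, Integer> incidentPartitions() {
    return Map.of("segmentX", 0, "segmentY", 1,
        "segmentP0", 0, "segmentP1", 1, "segmentP2", 2);
  }

  public static void main(String[] args) {
    route(incidentView(), incidentPartitions())
        .forEach((segment, servers) -> System.out.println(segment + " -> " + servers));
  }
}
```

   With this sample data, partition 0 is served only by serverB, partition 1 
only by serverA, and partition 2 by both servers, so no segment is left without 
a serving candidate.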
   
   At present, we use the ideal state to find the segments --> instances 
mapping. In this scenario, we can even fall back to the instanceToPartitions 
info in ZK to fetch this info.
   
   Note: this proposal only affects REALTIME tables and not OFFLINE tables.
   
   cc @ankitsultana 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

