Yes, the high-level idea works. We will need to iron out the details a bit. A few examples:

1. For refresh use cases with large data sizes, though, even the servers may take a non-trivial amount of time to download segments. We need to wait until all segments of the new version are ONLINE in the external view before making the version switch. This implies that local storage on the servers needs to be doubled.

2. There will also be details to take care of, such as how long to wait before incrementing the version when one of the segments does not come ONLINE in the new version due to an error.

3. Handling various failure scenarios, such as REST API calls failing or timing out.

Will start a doc for the same.

Thanks,
Mayank

________________________________
From: kishore g <g.kish...@gmail.com>
Sent: Monday, April 6, 2020 8:15 PM
To: dev@pinot.apache.org <dev@pinot.apache.org>
Subject: Re: Issue: Data-inconsistency during segment push

One suggestion - high level

Put the time boundary in ZK (maybe in /propertystore/table/routingInfo). This can be updated via one of the following:
- The upload job can set this via the controller API after everything is pushed
- The controller can do this periodically
- Anything else

Brokers watch this node and use the entry in routingInfo:
- to come up with the time boundary for the APPEND use case
- to switch to a newer version, based on the version number, for the REFRESH use case.

Append and Refresh use cases use different routing table provider implementations.

On Thu, Mar 5, 2020 at 6:20 AM Mayank Shrivastava <maya...@apache.org> wrote:

> Today, we have an issue in Pinot, where data is in an inconsistent state
> during segment push, and the query results may be incorrect. This issue
> becomes more critical for enterprise applications that need to maintain
> customer trust, more so in the case of REFRESH use cases with large data
> size, where the period of inconsistency can be quite large. There are
> various flavors of this problem:
>
> 1. In APPEND use cases, the time-boundary is updated as soon as the first
> segment from the periodic push arrives.
> This causes queries to hit the offline table for a period that does not
> yet have complete data in the offline table.
>
> 2. For REFRESH use cases, there is no requirement for segments to be
> partitioned, so data can be in an entirely inconsistent state during the
> push time.
>
> 3. We are seeing enterprise applications that create different
> denormalizations from source data(s), creating multiple tables in Pinot. In
> these cases, the same application queries multiple tables for their
> product. And there are increasing asks to ensure some sort of inter-table
> consistency (provided the client side takes care of synchronized data
> pushes to these tables).
>
> For 1 and 2, there are several potential bottlenecks that may increase the
> push time, including the Pinot controller, deep-store and network b/w. For
> our cases, it seems that the biggest bottleneck is the network b/w between
> the controller and the compute farm that creates the segments.
>
> Next steps: Exchange ideas, and create proposals for the problems above.
>
> Cheers,
> Mayank
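
To make the version-switch gate from the reply at the top of this thread concrete, here is a minimal Python sketch. It is an illustration only, not Pinot code: `wait_for_version_online`, `maybe_switch_version`, the `routing_info` dict and the `read_external_view` callback are all hypothetical names, and the external view is modeled as a plain segment-to-state mapping rather than the real Helix structure. The timeout stands in for the open question of how long to wait when a segment never comes ONLINE.

```python
import time
from typing import Callable, Dict, Set


def wait_for_version_online(
    expected_segments: Set[str],
    read_external_view: Callable[[], Dict[str, str]],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every expected segment is ONLINE; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        # Hypothetical callback: returns {segment_name: state} for the new version.
        view = read_external_view()
        if all(view.get(seg) == "ONLINE" for seg in expected_segments):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def maybe_switch_version(
    routing_info: Dict[str, int],
    new_version: int,
    expected_segments: Set[str],
    read_external_view: Callable[[], Dict[str, str]],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Bump the version in routing_info only if the new version is fully ONLINE."""
    if wait_for_version_online(
        expected_segments, read_external_view, timeout_s,
        poll_interval_s, clock, sleep,
    ):
        # Assumption: routing_info stands in for the ZK routingInfo entry.
        routing_info["version"] = new_version
        return True
    # On timeout the old version keeps serving; the caller decides what to do next.
    return False
```

In this sketch a failed or slow segment leaves the old version in place, so queries stay consistent at the cost of keeping both versions on local storage until the switch happens.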