[GitHub] [pinot] sajjad-moradi commented on issue #10147: Pauseless Consumption

via GitHub Thu, 26 Jan 2023 16:54:12 -0800


sajjad-moradi commented on issue #10147:
URL: https://github.com/apache/pinot/issues/10147#issuecomment-1405873383


   > 1. `Server Query Executor sends the incoming query to the main segment as 
well as the temporary segment if there is one` - Could you elaborate on this? 
How will we send the query to temporary segment, if there's no record of it in 
ideal state/external view, given the EV is what drives all the routing logic? 
I'm wondering if that would end up looking like a lot of if-else in the query 
execution at consuming segment level, as we won't have any info about this 
segment in the planner/executor etc.
   
   I was thinking of using a `PauselessConsumptionManager` that knows which 
consuming segments have kicked off their corresponding temporary segments. 
`Server Query Executor` simply passes the list of `segmentsToQuery` to the 
pauseless consumption manager, which returns the list of temporary segments 
to be added to the `segmentsToQuery` list. So there aren't a lot of `if-else`s.
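   
   A minimal sketch of that lookup, assuming segments are identified by name. 
The class name `PauselessConsumptionManagerSketch`, its methods, and the 
segment names in the usage note are hypothetical and only illustrate the idea; 
this is not existing Pinot code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual Pinot implementation.
class PauselessConsumptionManagerSketch {
  // Maps a consuming segment name to the temporary segment it has kicked off.
  private final Map<String, String> tempSegments = new ConcurrentHashMap<>();

  // Called when a consuming segment starts its temporary segment.
  void registerTemporarySegment(String consumingSegment, String temporarySegment) {
    tempSegments.put(consumingSegment, temporarySegment);
  }

  // Called when the temporary segment is no longer needed.
  void removeTemporarySegment(String consumingSegment) {
    tempSegments.remove(consumingSegment);
  }

  // Returns the temporary segments that belong to any of the given segments.
  List<String> getTemporarySegments(List<String> segmentsToQuery) {
    List<String> result = new ArrayList<>();
    for (String segment : segmentsToQuery) {
      String temporary = tempSegments.get(segment);
      if (temporary != null) {
        result.add(temporary);
      }
    }
    return result;
  }
}
```

   With this shape, the query executor would only need one extra call, such as 
`segmentsToQuery.addAll(manager.getTemporarySegments(segmentsToQuery))`, and 
the rest of the execution path stays unaware of temporary segments.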
   
   That being said, I agree that Jackie's suggestion is 1) more explicit and 2) 
it also has the benefit that if the next consuming segment is going to be 
assigned to a different server, the data in temporary segment of the current 
server doesn't need to be thrown away.
   
   If we want to take that route, we need to think about how to address the 
following:
   1. final number of rows for the new segment
   2. segment commit failure
   
   For the first one, we can start the next segment (let's call it S2) with the 
same size as the existing one (S1). When S1 commits successfully, we can 
calculate the proper size for S2, and if the new size is within a threshold 
(something like 5%) of the size of S1, we can assume the S2 size is good enough 
and leave it as is. If the new size is not within the 5% threshold, then we send 
a helix transition message from the Controller to the servers hosting S2 to 
create a new mutable segment with the new size and copy over the data.
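   
   The threshold check could look roughly like the sketch below. It assumes 
segment size is measured in rows and that the threshold is a fraction such as 
0.05; the class and method names are hypothetical.

```java
// Hypothetical sketch of the resize decision, not existing Pinot code.
class SegmentSizeCheckSketch {
  // Returns true when the size computed after S1 commits differs from the
  // size S2 was started with (S1's size) by more than the given fraction.
  static boolean needsResize(long currentRows, long computedRows, double threshold) {
    if (currentRows <= 0) {
      throw new IllegalArgumentException("currentRows must be positive");
    }
    double relativeDiff = Math.abs(computedRows - currentRows) / (double) currentRows;
    return relativeDiff > threshold;
  }
}
```

   For example, if S2 started with 100,000 rows and the computed size is 
104,000, the difference is 4% and S2 is left as is; a computed size of 110,000 
is a 10% difference and would trigger the resize.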
   
   For the second problem, I think we need to change the commit protocol a bit, 
because currently we assume that if there's a commit failure, the segments that 
have been on hold will simply start over. With pauseless consumption, since the 
new segment (S2) has already started consumption from a specific start offset, 
we need to make sure the commit protocol for S1 ends at that same offset.
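
   One way to picture the constraint is the sketch below. It assumes offsets 
can be modeled as `long` values and that, after a commit failure, the 
controller compares each S1 replica's offset against the start offset S2 is 
already consuming from. The class, the method, and the action names are 
hypothetical and do not describe the existing protocol.

```java
// Hypothetical sketch of pinning S1's end offset to S2's start offset.
class CommitOffsetSketch {
  enum Action { COMMIT, CATCH_UP, DISCARD }

  // Decides what an S1 replica should do given the offset S2 started from.
  static Action decide(long replicaOffset, long s2StartOffset) {
    if (replicaOffset == s2StartOffset) {
      return Action.COMMIT;   // S1 ends exactly where S2 begins
    }
    if (replicaOffset < s2StartOffset) {
      return Action.CATCH_UP; // consume up to the pinned offset first
    }
    return Action.DISCARD;    // consumed past S2's start, rows would overlap
  }
}
```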


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

