sajjad-moradi opened a new issue, #10147:
URL: https://github.com/apache/pinot/issues/10147

   # Background
   Currently, when the commit protocol starts for a partition, the realtime 
server stops consuming, and no messages are read from the stream until the 
offline->consuming Helix message for the next consuming segment arrives. 
Depending on the segment size, this gap in consumption can last anywhere from a 
few seconds to several minutes (and in some cases tens of minutes, when the 
servers hit the concurrent segment build threshold). This leads to data 
freshness issues, especially for use cases with high ingestion rates.
   
   # Proposal
   A simple solution to prevent these consumption pauses is to spin off a 
temporary consuming segment as soon as the segment commit starts. This 
temporary segment supplements the main segment that is committing. No changes 
to the Ideal State or External View are needed. 
   
   ## Query execution
   To make the data ingested into the temporary segment available to queries, 
the Server Query Executor sends each incoming query to the main segment and, if 
one exists, to the temporary segment as well. Both segments execute the query 
and return their results. 
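
   The fan-out described above can be sketched as follows. This is only an 
illustration in Python, not Pinot code: `Segment`, `execute`, and 
`execute_on_partition` are hypothetical names, and rows are modeled as plain 
dictionaries filtered by a predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Segment:
    """Hypothetical stand-in for a consuming segment holding ingested rows."""
    name: str
    rows: List[Row] = field(default_factory=list)

    def execute(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # A "query" here is just a filter over the segment's rows.
        return [row for row in self.rows if predicate(row)]


def execute_on_partition(
    main: Segment,
    temporary: Optional[Segment],
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Run the query on the main segment and, if present, the temporary one."""
    results = main.execute(predicate)
    if temporary is not None:
        # Rows ingested while the main segment is committing live here.
        results.extend(temporary.execute(predicate))
    return results
```

   When no temporary segment exists, the function behaves exactly like a 
query against the main segment alone, so callers need no special handling.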
   
   ## Next Offline to Consuming Transition
   After the main segment is committed and the next consuming segment is 
created while handling the offline->consuming Helix transition message, the 
data in the temporary segment can be used to bootstrap the new consuming 
segment. Once the mutable segment has been created for the next consuming 
segment, and before the consume loop starts, the data in the temporary segment 
can be read and inserted into the new mutable segment. After the data is copied 
over, the temporary segment can be deleted, and the consume loop in the new 
consuming segment starts from the offset of the last copied message.
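
   A minimal sketch of this bootstrap step, in Python for illustration only. 
`MutableSegment`, `index`, and `bootstrap_from_temporary` are hypothetical 
names, offsets are assumed to be integers, and the sketch assumes the consume 
loop resumes at the offset right after the last copied message so that no 
message is indexed twice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MutableSegment:
    """Hypothetical in-memory segment storing (offset, payload) pairs."""
    name: str
    messages: List[Tuple[int, str]] = field(default_factory=list)

    def index(self, offset: int, payload: str) -> None:
        self.messages.append((offset, payload))


def bootstrap_from_temporary(
    new_segment: MutableSegment,
    temporary: Optional[MutableSegment],
    start_offset: int,
) -> int:
    """Copy temporary data into the new segment and return the resume offset."""
    if temporary is None:
        # Nothing was consumed during the commit; start where the main segment ended.
        return start_offset
    next_offset = start_offset
    for offset, payload in temporary.messages:
        new_segment.index(offset, payload)
        next_offset = offset + 1  # assumption: resume just past the last copied message
    temporary.messages.clear()  # stands in for deleting the temporary segment
    return next_offset
```

   The returned offset is what the new segment's consume loop would start 
from; with an empty or missing temporary segment it falls back to the end 
offset of the committed segment.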
   
   ## Configurations
   - We can add a new boolean flag to the Stream Config properties to 
enable/disable pauseless consumption, something like 
`pauseless.consumption.enabled`. 
   - For the size of the temporary segment, we can use a default of 20% of the 
main segment size. This ratio can be configured as a stream config property, 
something like `pauseless.consumption.temporary.segment.size.ratio`.
   - We also need a timeout for temporary consumption in case something goes 
wrong and the Helix transition message for the next consuming segment never 
reaches the server. We can use a default of 10 minutes and add a new stream 
config property to override it, something like 
`pauseless.consumption.timeout`. 
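
   As an illustration of how these three properties and their defaults could 
be resolved, here is a small Python sketch. The property names are the ones 
suggested above; everything else is an assumption: `PauselessConfig` and 
`parse_pauseless_config` are hypothetical, the feature is assumed to be off by 
default, the timeout is assumed to be given in seconds, and the ratio is 
assumed to lie in (0, 1].

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PauselessConfig:
    enabled: bool
    temporary_segment_size_ratio: float
    timeout_seconds: int


def parse_pauseless_config(stream_config: Mapping[str, str]) -> PauselessConfig:
    """Read the proposed properties from a stream config map, applying defaults."""
    # Assumption: pauseless consumption is disabled unless explicitly enabled.
    enabled = (
        stream_config.get("pauseless.consumption.enabled", "false").strip().lower()
        == "true"
    )
    # Default from the proposal: 20% of the main segment size.
    ratio = float(
        stream_config.get("pauseless.consumption.temporary.segment.size.ratio", "0.2")
    )
    # Default from the proposal: 10 minutes (expressed here in seconds).
    timeout = int(stream_config.get("pauseless.consumption.timeout", "600"))
    if not 0 < ratio <= 1:
        raise ValueError("temporary segment size ratio must be in (0, 1]")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    return PauselessConfig(enabled, ratio, timeout)
```

   With an empty map this yields the defaults proposed above; a table that 
opts in would set `pauseless.consumption.enabled` to `true` and override the 
other two only if needed.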


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

