pengding-stripe opened a new issue, #17657: URL: https://github.com/apache/pinot/issues/17657
**Problem**

After upgrading to Pinot 1.5 ([commit](https://github.com/apache/pinot/commit/d0a6aba30bdad4dc0179ecd189d42ff2913b71a9)), we observed a large number of query timeouts and "server not responded" errors every few hours.

```
2026-02-06 04:52:49.535 | ERROR [PinotClientRequest] [jersey-server-managed-async-executor-62279:62811] Query processing exceptions: {427=1 servers [pinotdbserver--0f2c3775715e106f0_O] not responded}
```

**Observations**

1. This looks like a broker-side error: we ran the broker on 1.5 with servers on 1.4 and on 1.5, and the same issue occurred in both cases.
2. The timing of the errors matches the timeline of offline ingestion.
3. The table is ~1-2 TB with ~1000 segments.
4. Errors occurred mostly on tables with high QPS and large segments.
5. We saw a large number of segment refresh logs followed by the "server not responded" error, and the same segments were refreshed multiple times:

   ```
   2026-02-06 04:52:49.231 | INFO [BaseBrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_41:359] Refreshing segment: table_3702 for table: table
   2026-02-06 04:52:49.232 | INFO [BaseBrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_51:532] Refreshed segment: table_3702 for table: table
   ```

6. There are large CPU spikes (>90%) when the error happens.
7. The loading time for segments of similar size is the same in 1.5 and 1.4. (We suspect Pinot might be rebuilding the v4 index and overwhelming the CPU.)
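
To put numbers on observation 5, a rough sketch like the one below could count how often each segment is refreshed in a broker log. It is not part of Pinot: the regular expression is an assumption based only on the two `BaseBrokerRoutingManager` lines quoted in this issue, and the function names `count_segment_refreshes` and `repeated_refreshes` are hypothetical.

```python
import re
from collections import Counter

# Assumed log shape, taken from the "Refreshing segment" line quoted in this issue.
REFRESH_RE = re.compile(r"Refreshing segment: (\S+) for table: (\S+)")


def count_segment_refreshes(lines):
    """Count 'Refreshing segment' events per (table, segment) pair."""
    counts = Counter()
    for line in lines:
        match = REFRESH_RE.search(line)
        if match:
            segment, table = match.groups()
            counts[(table, segment)] += 1
    return counts


def repeated_refreshes(counts, threshold=2):
    """Return pairs refreshed at least `threshold` times, most frequent first."""
    return [(key, n) for key, n in counts.most_common() if n >= threshold]
```

Passing an open broker log file (any iterable of lines) to `count_segment_refreshes` and then calling `repeated_refreshes` on the result would list the segments that were refreshed more than once, which could then be compared against the timestamps of the "not responded" errors.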
