xiangfu0 commented on PR #18256: URL: https://github.com/apache/pinot/pull/18256#issuecomment-4350331607
Reviewed latest head `eedaee89c6fe250cdf8a80f43668cb44bfe9c428`. The tenant checker wiring is fixed, but two cron config/lifecycle issues remain. ### Finding 1 - **Severity**: MAJOR - **Rule**: Configured cron tasks must either schedule successfully or fail startup - **Where**: `pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java` - `start()` cron path - **Issue**: Quartz init failure is only logged, then the scheduler continues with `_quartzScheduler` unset. Cron job scheduling/parsing failures are also caught and only logged after `periodicTask.start()` has already run. In both cases the controller continues startup with the configured cron task not scheduled and no fixed-delay fallback. - **Risk**: A controller can start with retention, validation, rebalance, relocation, or cleanup tasks disabled simply because a cron expression or Quartz setup failed. This is a controller lifecycle/config-contract issue; blast radius is controller periodic maintenance tasks, not `pinot-spi` or wire protocol. - **Suggested fix**: Validate every non-empty cron expression and create/start Quartz before starting tasks. If Quartz init or any cron scheduling fails, throw from `start()` or reject the config earlier, and clean up the executor/started tasks. Add a regression test for invalid cron expression proving startup fails and the task is not left started. ### Finding 2 - **Severity**: MAJOR - **Rule**: New config keys must be wired or removed - **Where**: `pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java` - `OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION`; `OfflineSegmentValidationManager` constructor and `shouldRunSegmentValidation()` - **Issue**: `controller.offline.segment.interval.checker.cronExpression` still has no runtime effect. The manager uses `getOfflineSegmentValidationCronExpression()` for the whole periodic task, and the interval-checker path still only reads `getOfflineSegmentIntervalCheckerFrequencyInSeconds()` to decide when segment-level validation should run. - **Risk**: Operators can set the new interval-checker cron key and Pinot will silently ignore it. This exposes a config contract that does not control scheduling, and it is especially confusing because the similarly named `controller.offline.segment.validation.cronExpression` does work for the outer task cadence. - **Suggested fix**: Remove `OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION` and its getter/setter from this PR unless the inner segment-level validation cadence is actually made cron-aware. If implementing it now, add a focused test that sets the interval-checker cron key and proves segment-level validation follows that schedule. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
