xiangfu0 commented on PR #18256:
URL: https://github.com/apache/pinot/pull/18256#issuecomment-4350331607

   Reviewed latest head `eedaee89c6fe250cdf8a80f43668cb44bfe9c428`. The tenant 
checker wiring is fixed, but two cron config/lifecycle issues remain.
   
   ### Finding 1
   - **Severity**: MAJOR
   - **Rule**: Configured cron tasks must either schedule successfully or fail 
startup
   - **Where**: 
`pinot-core/src/main/java/org/apache/pinot/core/periodictask/PeriodicTaskScheduler.java`
 - `start()` cron path
   - **Issue**: Quartz init failure is only logged, then the scheduler 
continues with `_quartzScheduler` unset. Cron job scheduling/parsing failures 
are also caught and only logged after `periodicTask.start()` has already run. 
In both cases the controller continues startup with the configured cron task 
not scheduled and no fixed-delay fallback.
   - **Risk**: A controller can start with retention, validation, rebalance, 
relocation, or cleanup tasks disabled simply because a cron expression or 
Quartz setup failed. This is a controller lifecycle/config-contract issue; 
blast radius is controller periodic maintenance tasks, not `pinot-spi` or wire 
protocol.
   - **Suggested fix**: Validate every non-empty cron expression and 
create/start Quartz before starting tasks. If Quartz init or any cron 
scheduling fails, throw from `start()` or reject the config earlier, and clean 
up the executor/started tasks. Add a regression test for invalid cron 
expression proving startup fails and the task is not left started.
   
   ### Finding 2
   - **Severity**: MAJOR
   - **Rule**: New config keys must be wired or removed
   - **Where**: 
`pinot-controller/src/main/java/org/apache/pinot/controller/ControllerConf.java`
 - `OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION`; 
`OfflineSegmentValidationManager` constructor and `shouldRunSegmentValidation()`
   - **Issue**: `controller.offline.segment.interval.checker.cronExpression` 
still has no runtime effect. The manager uses 
`getOfflineSegmentValidationCronExpression()` for the whole periodic task, and 
the interval-checker path still only reads 
`getOfflineSegmentIntervalCheckerFrequencyInSeconds()` to decide when 
segment-level validation should run.
   - **Risk**: Operators can set the new interval-checker cron key and Pinot 
will silently ignore it. This exposes a config contract that does not control 
scheduling, and it is especially confusing because the similarly named 
`controller.offline.segment.validation.cronExpression` does work for the outer 
task cadence.
   - **Suggested fix**: Remove 
`OFFLINE_SEGMENT_INTERVAL_CHECKER_CRON_EXPRESSION` and its getter/setter from 
this PR unless the inner segment-level validation cadence is actually made 
cron-aware. If implementing it now, add a focused test that sets the 
interval-checker cron key and proves segment-level validation follows that 
schedule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to