kfaraz opened a new pull request, #18402:
URL: https://github.com/apache/druid/pull/18402

   ### Summary
   
   - Add batch indexing templates that live in the Druid catalog.
   - These templates can be used to perform a variety of batch indexing operations such as compaction, export, batch ingestion, or any ETL in general.
   - Templates allow easy reuse of logic with all the niceties of the Druid catalog.
   - Add a batch indexing supervisor as a base supervisor for managing batch tasks (analogous to `SeekableStreamSupervisor`).
   - `CompactionSupervisor` now implements `BatchIndexingSupervisor` and is powered by `CompactionJobTemplates`.
   - Using templates instead of the `DataSourceCompactionConfig` makes the logic much more extensible.
   - In the future, `ScheduledBatchSupervisor` can also implement `BatchIndexingSupervisor`.
   
   ### Major changes
   
   - Add `BatchIndexingJob` which may contain exactly one of:
      - a `Task` (as a `ClientTaskQuery`)
      - OR an MSQ SQL (as a `ClientSqlQuery`)
   - Add `BatchIndexingJobTemplate`
     - `List<BatchIndexingJob> createJobs(InputSource source, OutputDestination destination, JobParams params)`
   - Add `BatchIndexingSupervisor<J, P>`
     - `List<J> createJobs(P params)`
   - Implement `BatchIndexingSupervisor` with `CompactionSupervisor`
   - Extend `BatchIndexingJob` with `CompactionJob` to contain additional info such as `CompactionCandidate` and `compactionInterval`
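
   To make the shape of these abstractions concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `BatchIndexingSketch`, `TaskPayload`, `SqlPayload`, `Job`, `JobTemplate` and `Supervisor` are hypothetical stand-ins for `ClientTaskQuery`, `ClientSqlQuery`, `BatchIndexingJob`, `BatchIndexingJobTemplate` and `BatchIndexingSupervisor<J, P>`, and plain strings stand in for `InputSource`, `OutputDestination` and `JobParams`. It only illustrates the "exactly one of" invariant and how a supervisor might delegate job creation to a template.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; none of these types are the real Druid classes.
class BatchIndexingSketch {
  // Stand-in for ClientTaskQuery.
  static final class TaskPayload {
    final String taskId;
    TaskPayload(String taskId) { this.taskId = taskId; }
  }

  // Stand-in for ClientSqlQuery.
  static final class SqlPayload {
    final String sql;
    SqlPayload(String sql) { this.sql = sql; }
  }

  // Stand-in for BatchIndexingJob: holds exactly one of a task or an MSQ SQL query.
  static final class Job {
    private final TaskPayload task;
    private final SqlPayload sql;

    private Job(TaskPayload task, SqlPayload sql) {
      // Reject "both set" and "neither set".
      if ((task == null) == (sql == null)) {
        throw new IllegalArgumentException("Exactly one of task or sql must be set");
      }
      this.task = task;
      this.sql = sql;
    }

    static Job forTask(TaskPayload task) {
      return new Job(Objects.requireNonNull(task), null);
    }

    static Job forSql(SqlPayload sql) {
      return new Job(null, Objects.requireNonNull(sql));
    }

    boolean isMsq() { return sql != null; }
  }

  // Mirrors createJobs(InputSource, OutputDestination, JobParams), with String stand-ins.
  interface JobTemplate {
    List<Job> createJobs(String source, String destination, String params);
  }

  // Mirrors BatchIndexingSupervisor<J, P>.
  interface Supervisor<J, P> {
    List<J> createJobs(P params);
  }

  public static void main(String[] args) {
    // A template that emits a single task-based job for the given source.
    JobTemplate template = (source, destination, params) ->
        List.of(Job.forTask(new TaskPayload("compact_" + source)));

    // A supervisor that delegates job creation to the template.
    Supervisor<Job, String> supervisor =
        params -> template.createJobs("wiki", "wiki", params);

    List<Job> jobs = supervisor.createJobs("P1D");
    System.out.println(jobs.size() + " job(s), msq=" + jobs.get(0).isMsq());
  }
}
```

   Using static factories rather than a public two-argument constructor is one way to keep callers from building a job with both or neither payload; the actual classes may enforce this differently.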
   
   #### Supported template types
   
   |Type|Class|Function|
   |-----|-----|---------|
   |`compactInline`|`InlineCompactionJobTemplate`|Simple template that defines a `segmentGranularity`, `partitionsSpec` etc. to use in a compaction job. Can be used directly inside a cascading template or stored in the Druid catalog.|
   |`compactCatalog`|`CatalogCompactionJobTemplate`|Template that refers to a delegate template stored in the Druid catalog. The referenced template can currently only be an `InlineCompactionJobTemplate`.|
   |`compactCascade`|`CascadingCompactionTemplate`|Template used by the compaction supervisor for period-based cascading compaction.|
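
   The relationship between the `compactCatalog` and `compactInline` types can be sketched roughly as below. This is an illustration under assumptions, not the implementation in this PR: `TemplateResolutionSketch`, `InlineTemplate`, `CatalogTemplate`, the `templateId` field and the `resolve` helper are all hypothetical, and a plain `Map` stands in for the Druid catalog. It only shows the delegation rule that a catalog reference must currently resolve to an inline template.

```java
import java.util.Map;

// Hypothetical sketch of catalog-to-inline template resolution; not the real Druid classes.
class TemplateResolutionSketch {
  interface Template {}

  // Stand-in for an inline template that carries its settings directly.
  static final class InlineTemplate implements Template {
    final String segmentGranularity;
    InlineTemplate(String segmentGranularity) { this.segmentGranularity = segmentGranularity; }
  }

  // Stand-in for a template that only names another template stored in the catalog.
  static final class CatalogTemplate implements Template {
    final String templateId;
    CatalogTemplate(String templateId) { this.templateId = templateId; }
  }

  // Resolves any template to an inline one; the Map is a stand-in for the catalog.
  static InlineTemplate resolve(Template template, Map<String, Template> catalog) {
    if (template instanceof InlineTemplate) {
      return (InlineTemplate) template;
    }
    if (template instanceof CatalogTemplate) {
      String id = ((CatalogTemplate) template).templateId;
      Template delegate = catalog.get(id);
      // Only inline delegates are accepted, matching the restriction in the table above.
      if (delegate instanceof InlineTemplate) {
        return (InlineTemplate) delegate;
      }
      throw new IllegalStateException("Template[" + id + "] is missing or is not inline");
    }
    throw new IllegalArgumentException("Unsupported template type");
  }

  public static void main(String[] args) {
    Map<String, Template> catalog = Map.of("daily", new InlineTemplate("DAY"));
    InlineTemplate resolved = resolve(new CatalogTemplate("daily"), catalog);
    System.out.println("Resolved granularity: " + resolved.segmentGranularity);
  }
}
```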
   
   #### Catalog changes
   
   - Add a new schema `index_template`
   - Register a new table definition `IndexingTemplateDefn`, which can currently contain only one property, `payload`.
   - Technically, an indexing template is not a `TableDefn` but rather an `ObjectDefn`.
   - However, since a template is flexible enough, it can be thought of as a table creator.
   - Given the flexible model of the catalog `TableSpec`, treating templates as table definitions seemed like the cleanest approach.
   
   Please leave feedback on whether there is a better/cleaner alternative.
   
   ### Pending changes
   
   - Add a basic MSQ template
   - Handle some minor corner cases
   - Fill timeline gaps due to misaligned granularities in cascading compaction
   
   ### Release note
   
   TODO
   
   <hr>
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

