kfaraz opened a new pull request, #18402:
URL: https://github.com/apache/druid/pull/18402
### Summary
- Add batch indexing templates that live in the Druid catalog.
  - These templates can be used to perform a variety of batch indexing operations such as compaction, export, batch ingestion, or any ETL in general.
  - Templates allow easy reuse of logic with all the niceties of the Druid catalog.
- Add a batch indexing supervisor as a base supervisor for managing batch tasks (analogous to `SeekableStreamSupervisor`).
- `CompactionSupervisor` now implements `BatchIndexingSupervisor` and is powered by `CompactionJobTemplates`.
  - Using templates instead of the `DataSourceCompactionConfig` makes the logic much more extensible.
- In the future, we can have `ScheduledBatchSupervisor` also implement `BatchIndexingSupervisor`.
### Major changes
- Add `BatchIndexingJob` which may contain exactly one of:
  - a `Task` (as a `ClientTaskQuery`)
  - OR an MSQ SQL (as a `ClientSqlQuery`)
- Add `BatchIndexingJobTemplate`
  - `List<BatchIndexingJob> createJobs(InputSource source, OutputDestination destination, JobParams params)`
- Add `BatchIndexingSupervisor<J, P>`
  - `List<J> createJobs(P params)`
- Implement `BatchIndexingSupervisor` with `CompactionSupervisor`
- Extend `BatchIndexingJob` with `CompactionJob` to contain additional info such as `CompactionCandidate` and `compactionInterval`
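
To illustrate how these pieces might fit together, here is a minimal, self-contained sketch. It is not the code in this PR: every type below (`SketchJob`, `SketchJobTemplate`, `SketchSupervisor`, `PerIntervalTemplate`, `SketchCompactionSupervisor`) is a hypothetical stand-in, plain strings replace `ClientTaskQuery`, `ClientSqlQuery`, `InputSource`, `OutputDestination` and `JobParams`, and the assumption that the supervisor passes its datasource as both source and destination is mine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical job: holds exactly one of a task payload or an MSQ SQL string. */
final class SketchJob {
    final String task; // stands in for a ClientTaskQuery
    final String sql;  // stands in for a ClientSqlQuery

    SketchJob(String task, String sql) {
        // Reject both-null and both-set, so exactly one payload is present.
        if ((task == null) == (sql == null)) {
            throw new IllegalArgumentException("Exactly one of task or sql must be set");
        }
        this.task = task;
        this.sql = sql;
    }

    static SketchJob forTask(String task) { return new SketchJob(task, null); }
    static SketchJob forSql(String sql) { return new SketchJob(null, sql); }
    boolean isMsq() { return sql != null; }
}

/** Hypothetical template: source, destination and params in, jobs out. */
interface SketchJobTemplate {
    List<SketchJob> createJobs(String source, String destination, List<String> intervals);
}

/** Hypothetical supervisor base, generic in job type J and param type P. */
interface SketchSupervisor<J, P> {
    List<J> createJobs(P params);
}

/** Toy template that emits one task job per interval. */
final class PerIntervalTemplate implements SketchJobTemplate {
    @Override
    public List<SketchJob> createJobs(String source, String destination, List<String> intervals) {
        List<SketchJob> jobs = new ArrayList<>();
        for (String interval : intervals) {
            jobs.add(SketchJob.forTask("compact " + source + " -> " + destination + " @ " + interval));
        }
        return jobs;
    }
}

/** Toy compaction supervisor that delegates job creation to a template. */
final class SketchCompactionSupervisor implements SketchSupervisor<SketchJob, List<String>> {
    private final String dataSource;
    private final SketchJobTemplate template;

    SketchCompactionSupervisor(String dataSource, SketchJobTemplate template) {
        this.dataSource = dataSource;
        this.template = template;
    }

    @Override
    public List<SketchJob> createJobs(List<String> intervals) {
        // Assumption: compaction reads from and writes back to the same datasource.
        return template.createJobs(dataSource, dataSource, intervals);
    }
}
```

In this sketch, swapping `PerIntervalTemplate` for a template that returns `SketchJob.forSql(...)` would change what gets launched without touching the supervisor, which is the extensibility argument made in the summary.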
#### Supported template types
|Type|Class|Function|
|-----|-----|---------|
|`compactInline`|`InlineCompactionJobTemplate`|Simple template to define a `segmentGranularity`, `partitionsSpec` etc. to use in a compaction job. Can be used directly inside a cascading template or stored in the Druid catalog.|
|`compactCatalog`|`CatalogCompactionJobTemplate`|Template which references a delegate template stored in the Druid catalog. The referenced template can currently only be an `InlineCompactionJobTemplate`.|
|`compactCascade`|`CascadingCompactionTemplate`|Template used by the compaction supervisor for period-based cascading compaction.|
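
The delegation described for `compactCatalog` can be pictured roughly as follows. This is only a sketch under stated assumptions: `CompactionTemplateSketch`, `InlineTemplateSketch` and `CatalogTemplateSketch` are hypothetical names, the catalog is modelled as an in-memory `Map`, and the string returned by `describeJob` is a placeholder for a real job. The only behaviour taken from the table is that a catalog template resolves a stored delegate, and that the delegate must currently be an inline template.

```java
import java.util.Map;

/** Hypothetical common interface for the template sketches below. */
interface CompactionTemplateSketch {
    String describeJob(String interval);
}

/** Stands in for an inline template carrying its own settings. */
final class InlineTemplateSketch implements CompactionTemplateSketch {
    private final String segmentGranularity;

    InlineTemplateSketch(String segmentGranularity) {
        this.segmentGranularity = segmentGranularity;
    }

    @Override
    public String describeJob(String interval) {
        return "compact " + interval + " to " + segmentGranularity;
    }
}

/** Stands in for a template that only references a delegate stored in the catalog. */
final class CatalogTemplateSketch implements CompactionTemplateSketch {
    private final String templateId;
    private final Map<String, CompactionTemplateSketch> catalog; // in-memory stand-in for the catalog

    CatalogTemplateSketch(String templateId, Map<String, CompactionTemplateSketch> catalog) {
        this.templateId = templateId;
        this.catalog = catalog;
    }

    @Override
    public String describeJob(String interval) {
        // Resolve the delegate lazily, so catalog updates are picked up on each call.
        CompactionTemplateSketch delegate = catalog.get(templateId);
        if (delegate == null) {
            throw new IllegalStateException("No template [" + templateId + "] in catalog");
        }
        // Mirrors the stated restriction: only inline templates may be referenced.
        if (!(delegate instanceof InlineTemplateSketch)) {
            throw new IllegalStateException("Template [" + templateId + "] is not an inline template");
        }
        return delegate.describeJob(interval);
    }
}
```

Restricting delegates to inline templates also rules out reference chains and cycles between catalog entries, which keeps resolution a single lookup in this sketch.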
#### Catalog changes
- Add a new schema `index_template`
- Register a new table definition `IndexingTemplateDefn` which can currently contain only one property, `payload`
  - Technically, an indexing template is better described as an `ObjectDefn` than as a `TableDefn`.
  - But since a template is flexible enough, it can be thought of as a table creator.
  - And owing to the flexible model of the catalog `TableSpec`, treating the templates as a table definition seemed like the cleanest approach.

Please leave feedback on whether there is a better/cleaner alternative.
### Pending changes
- Add a basic MSQ template
- Handle some minor corner cases
- Fill timeline gaps due to misaligned granularities in cascading compaction
### Release note
TODO
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]