findingrish opened a new pull request, #14985: URL: https://github.com/apache/druid/pull/14985
## Description

Original proposal: https://github.com/abhishekagarwal87/druid/blob/metadata_design_proposal/design-proposal.md#3-proposed-solution-storing-segment-schema-in-metadata-store

In the current design, Brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a dataSource. However, this approach leads to a high number of segment metadata queries during Broker startup, resulting in slow startup times and the various issues outlined in the design proposal.

To address these challenges, we propose centralizing table schema management in the Coordinator. This change is the first step in that direction. In the new arrangement, the Coordinator takes on the responsibility of querying both data nodes and tasks to fetch segment schemas and then building the table schema. Brokers simply query the Coordinator to fetch the table schema. Importantly, Brokers retain the capability to build table schemas themselves if the need arises, ensuring both flexibility and resilience.

## Design

### Coordinator changes

These changes enhance the Coordinator's capabilities, enabling it to manage table schemas and serve schema-related information through its APIs.

**SegmentMetadataCache refactor**: We introduce a new class, `CoordinatorSegmentMetadataCache`, extending `AbstractSegmentMetadataCache`, which manages the cache and forms the core of the schema management process.

**Changes to Coordinator timeline**: A new interface, `CoordinatorTimeline`, is introduced, and the existing Coordinator timeline, `CoordinatorServerView`, now implements it. Additionally, we add a new implementation, `QueryableCoordinatorServerView`, which extends `BrokerServerView`. This facilitates running segment metadata queries on the Coordinator using `CachingClusteredClient`.
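As a rough illustration, the refactored type relationships described above might look like the following. This is a minimal sketch with placeholder bodies: the names mirror the PR, but none of this is Druid's actual code.

```java
// Hypothetical, simplified stand-ins for the classes named in this PR.
// Bodies are placeholders; only the inheritance structure is illustrated.

// Shared segment-metadata cache logic lives in an abstract base class.
abstract class AbstractSegmentMetadataCache {
  abstract String describeRole();
}

// New Coordinator-side implementation that drives table schema building.
class CoordinatorSegmentMetadataCache extends AbstractSegmentMetadataCache {
  @Override
  String describeRole() {
    return "builds table schema on the Coordinator";
  }
}

// New timeline abstraction; the existing CoordinatorServerView implements it.
interface CoordinatorTimeline {}

class BrokerServerView {}

class CoordinatorServerView implements CoordinatorTimeline {}

// Extends BrokerServerView so segment metadata queries can be run on the
// Coordinator (in the real PR, via CachingClusteredClient).
class QueryableCoordinatorServerView extends BrokerServerView
    implements CoordinatorTimeline {}

public class SchemaRefactorSketch {
  public static void main(String[] args) {
    AbstractSegmentMetadataCache cache = new CoordinatorSegmentMetadataCache();
    System.out.println(cache.describeRole());
    System.out.println(new QueryableCoordinatorServerView() instanceof CoordinatorTimeline);
  }
}
```

The point of the shape is that the Coordinator-side view participates in both hierarchies: it remains a `CoordinatorTimeline` for timeline consumers while inheriting the Broker's query-capable machinery.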
**API changes**:
- We introduce a new container class, `DataSourceInformation`, for handling `RowSignature` information. An API is exposed to retrieve dataSource information.
- The existing API responsible for returning used segments is modified to accept a new query parameter, `includeRealtimeSegments`. When set, this parameter includes realtime segments in the result. Additionally, we enhance the returned object, `SegmentStatusInCluster`, to provide additional `realtime` and `numRows` information.

**Binding for query modules**: We add the essential bindings for the required query modules in `CliCoordinator`.

### Broker changes

The Broker now defaults to querying the Coordinator for the table schema, eliminating the need to query data nodes and tasks for segment schemas. There are also changes in the logic for building the sys segments table.

**Broker-side SegmentMetadataCache**: We introduce a new implementation of `AbstractSegmentMetadataCache` called `BrokerSegmentMetadataCache`. This implementation queries the Coordinator for the table schema and falls back to running a segment metadata query when needed. It also assumes responsibility for building `PhysicalDataSourceMetadata`, for which a new class is introduced.

**System schema changes**: In `MetadataSegmentView`, realtime segments are now requested when polling the Coordinator. The segment table building logic is updated accordingly.

### Upgrade considerations

The general upgrade order should be followed. The new code is behind a feature flag, so it is compatible with existing setups. Brokers with the new changes can communicate with old Coordinators without issues.

### Release note

This feature is experimental and addresses multiple challenges outlined in the description. To enable it, set `druid.coordinator.segmentMetadataCache.enabled` in the Coordinator configuration.

<hr>

<!-- Check the items by putting "x" in the brackets for the done things.
Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->

This PR has:

- [ ] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.

---
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
