cryptoe commented on code in PR #13596:
URL: https://github.com/apache/druid/pull/13596#discussion_r1051889300

##########
docs/release_notes_25.md:
##########
@@ -0,0 +1,664 @@
+Apache Druid 25.0.0 contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 51 contributors. [See the complete set of changes for additional details](https://github.com/apache/druid/milestone/48).
+
+## <a name="25.0.0-highlights" href="#25.0.0-highlights">#</a> Highlights
+
+### <a name="25.0.0-highlights-multi-stage-query-" href="#25.0.0-highlights-multi-stage-query-">#</a> Multi-stage query
+
+The multi-stage query (MSQ) task engine used for SQL-based ingestion is now production ready. Use it for any supported workloads. For more information, see the following pages:
+
+- [Ingestion](https://druid.apache.org/docs/latest/ingestion/index.html)
+- [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/index.html)
+
+### <a name="25.0.0-highlights-deploying-druid" href="#25.0.0-highlights-deploying-druid">#</a> Deploying Druid
+
+The new `start-druid` script greatly simplifies deploying any combination of Druid services on a single-server. It comes pre-packaged with the required configs and can be used to launch a fully functional Druid cluster simply by invoking `./start-druid`. For experienced Druids, it also gives complete control over the runtime properties and JVM arguments to have a cluster that exactly fits your needs.
+
+The `start-druid` script deprecates the existing profiles such as `start-micro-quickstart` and `start-nano-quickstart`. These profiles may be removed in future releases. For more information, see [Single server deployment](https://druid.apache.org/docs/latest/operations/single-server.html).
+
+### <a name="25.0.0-highlights-string-dictionary-compression-%28experimental%29" href="#25.0.0-highlights-string-dictionary-compression-%28experimental%29">#</a> String dictionary compression (experimental)
+
+Added support for front coded string dictionaries for smaller string columns, leading to reduced segment sizes with only minor performance penalties for most Druid queries.
+
+This functionality can be utilized by a new property to `IndexSpec.stringDictionaryEncoding`, which can be set to `{"type":"frontCoded", "bucketSize": 4}`, `{"type":"frontCoded", "bucketSize": 16}`, or any power of 2 that is 128 or lower. This property instructs indexing tasks to write segments with the compressed dictionaries with the specific bucket size specified. (`{"type":"utf8"}` is the default).
+
+> Any segment written using string dictionary compression is not readable by older versions of Druid.
+
+For more information, see [Front coding](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html#front-coding).
+
+https://github.com/apache/druid/pull/12277
+
+### <a name="25.0.0-highlights-kubernetes-native-tasks" href="#25.0.0-highlights-kubernetes-native-tasks">#</a> Kubernetes native tasks
+
+Druid can now use Kubernetes to launch and manage tasks, eliminating the need for MiddleManagers.
+
+To use this feature, enable the [`druid-kubernetes-overlord-extensions`]((../extensions.md#loading-extensions) in the extensions load list for your Overlord process.
+
+https://github.com/apache/druid/pull/13156
+
+## <a name="25.0.0-behavior-changes" href="#25.0.0-behavior-changes">#</a> Behavior changes
+
+### <a name="25.0.0-behavior-changes-hll-and-quantiles-sketches" href="#25.0.0-behavior-changes-hll-and-quantiles-sketches">#</a> HLL and quantiles sketches
+
+The aggregation functions for HLL and quantiles sketches returned sketches or numbers when they are finalized depending on where they were in the native query plan.
+
+Druid no longer finalizes aggregators in the following two cases:
+
+ - aggregators appear in the outer level of a query
+ - aggregators are used as input to an expression or finalizing-field-access post-aggregator
+
+This change aligns the behavior of HLL and quantiles sketches with theta sketches.
+
+To provide backwards compatibility, you can use the `sqlFinalizeOuterSketches` query context parameter that restores the old behavior.
+
+https://github.com/apache/druid/pull/13247
+
+### <a name="25.0.0-behavior-changes-segment-discovery" href="#25.0.0-behavior-changes-segment-discovery">#</a> Segment discovery
+
+The default segment discovery method now uses HTTP instead of ZooKeeper.
+
+This update changes the defaults for the following properties:
+
+| Property | New default | Previous default |
+| - | - | - |
+| `druid.serverview.type` for segment management | http | batch |
+| `druid.coordinator.loadqueuepeon.type` for segment management | http | curator |
+| `druid.indexer.runner.type` for the Overlord | httpRemote | local |
+
+To use ZooKeeper instead of HTTP, change the values for the properties back to the previous defaults.
+ZooKeeper-based implementations for these properties are deprecated and will be removed in a subsequent release.
+
+https://github.com/apache/druid/pull/13092
+
+### <a name="25.0.0-behavior-changes-kill-tasks-do-not-include-markasunuseddone" href="#25.0.0-behavior-changes-kill-tasks-do-not-include-markasunuseddone">#</a> Kill tasks do not include markAsUnusedDone
+
+When you kill a task, Druid no longer automatically marks segments as unused. You must explicitly mark them as unused with `POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused`.
+For more information, see the [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html#coordinator)
+
+https://github.com/apache/druid/pull/13104
+
+### <a name="25.0.0-behavior-changes-memory-estimates" href="#25.0.0-behavior-changes-memory-estimates">#</a> Memory estimates
+
+The task context flag `useMaxMemoryEstimates` is now set to false by default to improve memory usage estimation while building an on-heap incremental index.
+
+https://github.com/apache/druid/pull/13178
+
+## <a name="25.0.0-multi-stage-query-task-engine" href="#25.0.0-multi-stage-query-task-engine">#</a> Multi-stage query task engine
+
+### <a name="25.0.0-multi-stage-query-task-engine-clustered-by-limit" href="#25.0.0-multi-stage-query-task-engine-clustered-by-limit">#</a> CLUSTERED BY limit
+
+When using the MSQ task engine to ingest data, there is now a 1,500 column limit to the number of columns that can be passed in the CLUSTERED BY clause.
+
+https://github.com/apache/druid/pull/13352
+
+### <a name="25.0.0-multi-stage-query-task-engine-metrics-used-to-downsample-bucket" href="#25.0.0-multi-stage-query-task-engine-metrics-used-to-downsample-bucket">#</a> Metrics used to downsample bucket
+
+Changed the way the MSQ task engine determines whether or not to downsample data, to improve accuracy. The task engine now uses the number of bytes instead of number of keys.
+
+https://github.com/apache/druid/pull/12998
+
+### <a name="25.0.0-multi-stage-query-task-engine-msq-heap-footprint" href="#25.0.0-multi-stage-query-task-engine-msq-heap-footprint">#</a> MSQ heap footprint
+
+When determining partition boundaries, the heap footprint of the sketches that MSQ uses is capped at 10% of available memory or 300 MB, whichever is lower. Previously, the cap was strictly 300 MB.
+
+https://github.com/apache/druid/pull/13274
+
+### <a name="25.0.0-multi-stage-query-task-engine-msq-docker-improvement" href="#25.0.0-multi-stage-query-task-engine-msq-docker-improvement">#</a> MSQ Docker improvement
+
+Enabled MSQ task query engine for Docker by default.
+
+https://github.com/apache/druid/pull/13069
+
+
+### <a name="25.0.0-multi-stage-query-task-engine-improved-msq-warnings" href="#25.0.0-multi-stage-query-task-engine-improved-msq-warnings">#</a> Improved MSQ warnings
+
+For disallowed MSQ warnings of certain types, the warning is now surfaced as the error.
+
+https://github.com/apache/druid/pull/13198
+
+### <a name="25.0.0-multi-stage-query-task-engine-added-support-for-indexspec" href="#25.0.0-multi-stage-query-task-engine-added-support-for-indexspec">#</a> Added support for indexSpec
+
+The MSQ task engine now supports the `indexSpec` context parameter. This context parameter can also be configured through the web console.
+
+https://github.com/apache/druid/pull/13275
+
+### <a name="25.0.0-multi-stage-query-task-engine-added-task-start-status-to-the-worker-report" href="#25.0.0-multi-stage-query-task-engine-added-task-start-status-to-the-worker-report">#</a> Added task start status to the worker report
+
+Added `pendingTasks` and `runningTasks` fields to the worker report for the MSQ task engine.
+See [Query task status information](#query-task-status-information) for related web console changes.
+
+https://github.com/apache/druid/pull/13263
+
+### <a name="25.0.0-multi-stage-query-task-engine-improved-handling-of-secrets" href="#25.0.0-multi-stage-query-task-engine-improved-handling-of-secrets">#</a> Improved handling of secrets
+
+When MSQ submits tasks containing SQL with sensitive keys, the keys can get logged in the file.
+Druid now masks the sensitive keys in the log files using regular expressions.
+
+https://github.com/apache/druid/pull/13231
+
+### <a name="25.0.0-multi-stage-query-task-engine-query-history" href="#25.0.0-multi-stage-query-task-engine-query-history">#</a> Query history
+
+Multi-stage queries no longer show up in the Query history dialog. They are still available in the **Recent query tasks** panel.
+
+## <a name="25.0.0-query-engine" href="#25.0.0-query-engine">#</a> Query engine

Review Comment:
   I think this heading seems wrong. We can remove this.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
