kfaraz commented on code in PR #17641: URL: https://github.com/apache/druid/pull/17641#discussion_r1935009145
########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. + +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. + +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. 
+ +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: Review Comment: We should also call out that the corresponding coordinator APIs are now deprecated and will be removed in a future release, and that the coordinator now calls the overlord to serve these requests. The original PR has a list of the deprecated APIs. ########## docs/release-info/release-notes.md: ########## @@ -114,6 +358,15 @@ If you're already using this feature, you don't need to take any action. ### Developer notes +- Improved dependency support between extensions. When an extension has a dependency on another extension, it now tries to use the dependency's class loader to find classes required classes [#16973](https://github.com/apache/druid/pull/16973) Review Comment: We should also add a point for deprecated coordinator APIs. ########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. 
+ +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. + +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. 
+ +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: + +- Mark all segments of a datasource as unused: +`POST /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark all (non-overshadowed) segments of a datasource as used: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark multiple segments as used +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` +- Mark multiple (non-overshadowed) segments as unused +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` + +- Mark a single segment as used: +`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +- Mark a single segment as unused: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +[#17545](https://github.com/apache/druid/pull/17545) + + +### 17386 + +https://github.com/apache/druid/pull/17386 + ## Functional area and related changes This section contains detailed release notes separated by areas. ### Web console +#### Explore view (experimental) + +Several improvements have been made to the Explore view in the web console: + +The time chart visualization now supports zooming, dragging, and is smarter about granularity detection: + + + +Filters been improved with helper tables and additional context: + + +Tiles can now be shown side-by-side: + + +[#17627](https://github.com/apache/druid/pull/17627) + +#### Segment timeline view + +The segment timeline is now more interactive and no longer forces day granularity. 
+ +**New view** + + + +**Old view:** + + +[#17521](https://github.com/apache/druid/pull/17521) + #### Other web console improvements +- The timezoner picker now always shows your timezone [#17521](https://github.com/apache/druid/pull/17521) +- UNNEST is now supported for autocomplete suggestions [#17521](https://github.com/apache/druid/pull/17521) +- Tables now support less than and greater than filters [#17521](https://github.com/apache/druid/pull/17521) +- You can now resize the side panels in the Query view [#17387](https://github.com/apache/druid/pull/17387) +- Added the `expectedLoadTimeMillis` segment loading metric to the web console [#17359](https://github.com/apache/druid/pull/17359) + ### Ingestion +#### Numbers for CSV and TSV input formats + +Use the new optional config `tryParseNumbers` for CSV and TSV input formats to control how numbers are treated. If enabled, any numbers present in the input will be parsed in the following manner: + +- long data type for integer types and +- double for floating-point numbers + +By default, this configuration is set to false, so numeric strings will be treated as strings. 
+ +[#17082](https://github.com/apache/druid/pull/17082) + +#### Other ingestion improvements + +- Reduce the direct memory requirement on non-query processing tasks by not reserving query buffers for them [#16887](https://github.com/apache/druid/pull/16887) +- JSON-based and SQL-based ingestion now support request headers when using an HTTP input source [#16974](https://github.com/apache/druid/pull/16974) + #### SQL-based ingestion ##### Other SQL-based ingestion improvements +- SQL-based ingestion now supports dynamic parameters for queries besides SELECT queries, such as REPLACE [#17126](https://github.com/apache/druid/pull/17126) +- Improved thread names to include the stage ID and worker number to help with troubleshooting [#17324](https://github.com/apache/druid/pull/17324) + #### Streaming ingestion +##### Control how many segments get merged for publishing + +You can now use the `maxColumsnToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. Review Comment: ```suggestion You can now use the `maxColumnsToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. ``` ########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. 
+### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. + +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. + +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. 
+ +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: + +- Mark all segments of a datasource as unused: +`POST /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark all (non-overshadowed) segments of a datasource as used: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark multiple segments as used +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` +- Mark multiple (non-overshadowed) segments as unused +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` + +- Mark a single segment as used: +`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +- Mark a single segment as unused: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +[#17545](https://github.com/apache/druid/pull/17545) + + +### 17386 + +https://github.com/apache/druid/pull/17386 + ## Functional area and related changes This section contains detailed release notes separated by areas. ### Web console +#### Explore view (experimental) + +Several improvements have been made to the Explore view in the web console: + +The time chart visualization now supports zooming, dragging, and is smarter about granularity detection: + + + +Filters been improved with helper tables and additional context: + + +Tiles can now be shown side-by-side: + + +[#17627](https://github.com/apache/druid/pull/17627) + +#### Segment timeline view + +The segment timeline is now more interactive and no longer forces day granularity. 
+ +**New view** + + + +**Old view:** + + +[#17521](https://github.com/apache/druid/pull/17521) + #### Other web console improvements +- The timezoner picker now always shows your timezone [#17521](https://github.com/apache/druid/pull/17521) +- UNNEST is now supported for autocomplete suggestions [#17521](https://github.com/apache/druid/pull/17521) +- Tables now support less than and greater than filters [#17521](https://github.com/apache/druid/pull/17521) +- You can now resize the side panels in the Query view [#17387](https://github.com/apache/druid/pull/17387) +- Added the `expectedLoadTimeMillis` segment loading metric to the web console [#17359](https://github.com/apache/druid/pull/17359) + ### Ingestion +#### Numbers for CSV and TSV input formats + +Use the new optional config `tryParseNumbers` for CSV and TSV input formats to control how numbers are treated. If enabled, any numbers present in the input will be parsed in the following manner: + +- long data type for integer types and +- double for floating-point numbers + +By default, this configuration is set to false, so numeric strings will be treated as strings. 
+ +[#17082](https://github.com/apache/druid/pull/17082) + +#### Other ingestion improvements + +- Reduce the direct memory requirement on non-query processing tasks by not reserving query buffers for them [#16887](https://github.com/apache/druid/pull/16887) +- JSON-based and SQL-based ingestion now support request headers when using an HTTP input source [#16974](https://github.com/apache/druid/pull/16974) + #### SQL-based ingestion ##### Other SQL-based ingestion improvements +- SQL-based ingestion now supports dynamic parameters for queries besides SELECT queries, such as REPLACE [#17126](https://github.com/apache/druid/pull/17126) +- Improved thread names to include the stage ID and worker number to help with troubleshooting [#17324](https://github.com/apache/druid/pull/17324) + #### Streaming ingestion +##### Control how many segments get merged for publishing + +You can now use the `maxColumsnToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. + +[#17030](https://github.com/apache/druid/pull/17030) + ##### Other streaming ingestion improvements +- Druid now properly supports early/late rejection periods when `stopTasksCount` is configured and streaming tasks run longer than the configured task duration [#17442](https://github.com/apache/druid/pull/17442) +- Improved segment publishing when resubmitting supervisors or when task publishing takes a long time [#17509](https://github.com/apache/druid/pull/17509) + ### Querying +#### Window queries + +The following fields are deprecated for window queries that use the MSQ task engine: `maxRowsMaterializedInWindow` and `partitionColumnNames`. They will be removed in a future release. 
+ +[#17433](https://github.com/apache/druid/pull/17433) + + + Review Comment: Missing content? ########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. + +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. + +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. 
+ +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: + +- Mark all segments of a datasource as unused: +`POST /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark all (non-overshadowed) segments of a datasource as used: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark multiple segments as used +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` +- Mark multiple (non-overshadowed) segments as unused +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` + +- Mark a single segment as used: +`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +- Mark a single segment as unused: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +[#17545](https://github.com/apache/druid/pull/17545) + + +### 17386 + +https://github.com/apache/druid/pull/17386 + ## Functional area and related changes This section contains detailed release notes separated by areas. ### Web console +#### Explore view (experimental) + +Several improvements have been made to the Explore view in the web console: + +The time chart visualization now supports zooming, dragging, and is smarter about granularity detection: + + + +Filters been improved with helper tables and additional context: + + +Tiles can now be shown side-by-side: + + +[#17627](https://github.com/apache/druid/pull/17627) + +#### Segment timeline view + +The segment timeline is now more interactive and no longer forces day granularity. 
+ +**New view** + + + +**Old view:** + + +[#17521](https://github.com/apache/druid/pull/17521) + #### Other web console improvements +- The timezoner picker now always shows your timezone [#17521](https://github.com/apache/druid/pull/17521) +- UNNEST is now supported for autocomplete suggestions [#17521](https://github.com/apache/druid/pull/17521) +- Tables now support less than and greater than filters [#17521](https://github.com/apache/druid/pull/17521) +- You can now resize the side panels in the Query view [#17387](https://github.com/apache/druid/pull/17387) +- Added the `expectedLoadTimeMillis` segment loading metric to the web console [#17359](https://github.com/apache/druid/pull/17359) + ### Ingestion +#### Numbers for CSV and TSV input formats + +Use the new optional config `tryParseNumbers` for CSV and TSV input formats to control how numbers are treated. If enabled, any numbers present in the input will be parsed in the following manner: + +- long data type for integer types and +- double for floating-point numbers + +By default, this configuration is set to false, so numeric strings will be treated as strings. 
+ +[#17082](https://github.com/apache/druid/pull/17082) + +#### Other ingestion improvements + +- Reduce the direct memory requirement on non-query processing tasks by not reserving query buffers for them [#16887](https://github.com/apache/druid/pull/16887) +- JSON-based and SQL-based ingestion now support request headers when using an HTTP input source [#16974](https://github.com/apache/druid/pull/16974) + #### SQL-based ingestion ##### Other SQL-based ingestion improvements +- SQL-based ingestion now supports dynamic parameters for queries besides SELECT queries, such as REPLACE [#17126](https://github.com/apache/druid/pull/17126) +- Improved thread names to include the stage ID and worker number to help with troubleshooting [#17324](https://github.com/apache/druid/pull/17324) + #### Streaming ingestion +##### Control how many segments get merged for publishing + +You can now use the `maxColumsnToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. 
+ +[#17030](https://github.com/apache/druid/pull/17030) + ##### Other streaming ingestion improvements +- Druid now properly supports early/late rejection periods when `stopTasksCount` is configured and streaming tasks run longer than the configured task duration [#17442](https://github.com/apache/druid/pull/17442) Review Comment: ```suggestion - Druid now fully supports early/late rejection periods when `stopTasksCount` is configured and streaming tasks run longer than the configured task duration [#17442](https://github.com/apache/druid/pull/17442) ``` ########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. + +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. 
+ +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. + +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: + +- Mark all segments of a datasource as unused: +`POST /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark all (non-overshadowed) segments of a datasource as used: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark multiple segments as used +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` +- Mark multiple (non-overshadowed) segments as unused +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` + +- Mark a single segment as used: +`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +- Mark a single segment as unused: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +[#17545](https://github.com/apache/druid/pull/17545) + + +### 17386 + +https://github.com/apache/druid/pull/17386 + ## Functional area and related changes This section contains detailed release notes separated by areas. 
### Web console +#### Explore view (experimental) + +Several improvements have been made to the Explore view in the web console: + +The time chart visualization now supports zooming, dragging, and is smarter about granularity detection: + + + +Filters been improved with helper tables and additional context: + + +Tiles can now be shown side-by-side: + + +[#17627](https://github.com/apache/druid/pull/17627) + +#### Segment timeline view + +The segment timeline is now more interactive and no longer forces day granularity. + +**New view** + + + +**Old view:** + + +[#17521](https://github.com/apache/druid/pull/17521) + #### Other web console improvements +- The timezoner picker now always shows your timezone [#17521](https://github.com/apache/druid/pull/17521) +- UNNEST is now supported for autocomplete suggestions [#17521](https://github.com/apache/druid/pull/17521) +- Tables now support less than and greater than filters [#17521](https://github.com/apache/druid/pull/17521) +- You can now resize the side panels in the Query view [#17387](https://github.com/apache/druid/pull/17387) +- Added the `expectedLoadTimeMillis` segment loading metric to the web console [#17359](https://github.com/apache/druid/pull/17359) + ### Ingestion +#### Numbers for CSV and TSV input formats + +Use the new optional config `tryParseNumbers` for CSV and TSV input formats to control how numbers are treated. If enabled, any numbers present in the input will be parsed in the following manner: + +- long data type for integer types and +- double for floating-point numbers + +By default, this configuration is set to false, so numeric strings will be treated as strings. 
+ +[#17082](https://github.com/apache/druid/pull/17082) + +#### Other ingestion improvements + +- Reduce the direct memory requirement on non-query processing tasks by not reserving query buffers for them [#16887](https://github.com/apache/druid/pull/16887) +- JSON-based and SQL-based ingestion now support request headers when using an HTTP input source [#16974](https://github.com/apache/druid/pull/16974) + #### SQL-based ingestion ##### Other SQL-based ingestion improvements +- SQL-based ingestion now supports dynamic parameters for queries besides SELECT queries, such as REPLACE [#17126](https://github.com/apache/druid/pull/17126) +- Improved thread names to include the stage ID and worker number to help with troubleshooting [#17324](https://github.com/apache/druid/pull/17324) + #### Streaming ingestion +##### Control how many segments get merged for publishing + +You can now use the `maxColumsnToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. + +[#17030](https://github.com/apache/druid/pull/17030) + ##### Other streaming ingestion improvements +- Druid now properly supports early/late rejection periods when `stopTasksCount` is configured and streaming tasks run longer than the configured task duration [#17442](https://github.com/apache/druid/pull/17442) +- Improved segment publishing when resubmitting supervisors or when task publishing takes a long time [#17509](https://github.com/apache/druid/pull/17509) + ### Querying +#### Window queries + +The following fields are deprecated for window queries that use the MSQ task engine: `maxRowsMaterializedInWindow` and `partitionColumnNames`. They will be removed in a future release. 
+ +[#17433](https://github.com/apache/druid/pull/17433) + + + + +[#17541](https://github.com/apache/druid/pull/17541) + #### Other querying improvements +- Added automatic query prioritization based on the period of the segments scanned in a query. You can set the duration threshold in ISO format using `druid.query.scheduler.prioritization.segmentRangeThreshold` [#17009](https://github.com/apache/druid/pull/17009) +- Improved error handling for incomplete queries. A trailer header to indicate an error is returned now [#16672](https://github.com/apache/druid/pull/16672) +- Improved scan queries to account for column types in more situations [#17463](https://github.com/apache/druid/pull/17463) +- Improved lookups so that they can now iterate over fetched data [#17212](https://github.com/apache/druid/pull/17212) +- Improved projections so that they can contain only aggregators and no grouping columns [#17484](https://github.com/apache/druid/pull/17484) +- Removed microseconds as a supported unit for EXTRACT [#17247](https://github.com/apache/druid/pull/17247) + + ### Cluster management +#### Reduced metadata IO Review Comment: ```suggestion #### Reduce metadata IO in batch segment allocation ``` ########## docs/release-info/release-notes.md: ########## @@ -57,40 +57,284 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### ANSI-SQL compatibility and query results + +Support for the configs that let you maintain older behavior that wasn't ANSI-SQL compliant have been removed: + +- `druid.generic.useDefaultValueForNull=true` +- `druid.expressions.useStrictBooleans=false` +- `druid.generic.useThreeValueLogicForNativeFilters=false` + +They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now. + +If the configs are set to the legacy behavior, Druid services will fail to start. 
+ +If you want to continue to get the same results without these settings, you must update your queries or your results will be incorrect after you upgrade. + +For more information about how to update your queries, see the [migration guide](https://druid.apache.org/docs/latest/release-info/migr-ansi-sql-null). + +[#17568](https://github.com/apache/druid/pull/17568) [#17609](https://github.com/apache/druid/pull/17609) + +### Java support + +Java support in Druid has been updated: + +- Java 8 support has been removed +- Java 11 support is deprecated + +We recommend that you upgrade to Java 17. + +[#17466](https://github.com/apache/druid/pull/17466) + +### Hadoop-based ingestion + +Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion. + +#### Join hints in MSQ task engine queries + +Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This allows queries to provide hints for the JOIN type that should be used at a per join level. Join hints recursively affect sub queries. 
+ +```sql +select /*+ sort_merge */ w1.cityName, w2.countryName +from +( + select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName +) w1 +JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName +where w1.cityName='New York'; +``` + +(#17406) + +### New Overlord APIs + +APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service: + +- Mark all segments of a datasource as unused: +`POST /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark all (non-overshadowed) segments of a datasource as used: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}` + +- Mark multiple segments as used +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed` +- Mark multiple (non-overshadowed) segments as unused +`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused` + +- Mark a single segment as used: +`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +- Mark a single segment as unused: +`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}` + +[#17545](https://github.com/apache/druid/pull/17545) + + +### 17386 + +https://github.com/apache/druid/pull/17386 + ## Functional area and related changes This section contains detailed release notes separated by areas. ### Web console +#### Explore view (experimental) + +Several improvements have been made to the Explore view in the web console: + +The time chart visualization now supports zooming, dragging, and is smarter about granularity detection: + + + +Filters been improved with helper tables and additional context: + + +Tiles can now be shown side-by-side: + + +[#17627](https://github.com/apache/druid/pull/17627) + +#### Segment timeline view + +The segment timeline is now more interactive and no longer forces day granularity. 
+ +**New view** + + + +**Old view:** + + +[#17521](https://github.com/apache/druid/pull/17521) + #### Other web console improvements +- The timezoner picker now always shows your timezone [#17521](https://github.com/apache/druid/pull/17521) +- UNNEST is now supported for autocomplete suggestions [#17521](https://github.com/apache/druid/pull/17521) +- Tables now support less than and greater than filters [#17521](https://github.com/apache/druid/pull/17521) +- You can now resize the side panels in the Query view [#17387](https://github.com/apache/druid/pull/17387) +- Added the `expectedLoadTimeMillis` segment loading metric to the web console [#17359](https://github.com/apache/druid/pull/17359) + ### Ingestion +#### Numbers for CSV and TSV input formats + +Use the new optional config `tryParseNumbers` for CSV and TSV input formats to control how numbers are treated. If enabled, any numbers present in the input will be parsed in the following manner: + +- long data type for integer types and +- double for floating-point numbers + +By default, this configuration is set to false, so numeric strings will be treated as strings. 
+ +[#17082](https://github.com/apache/druid/pull/17082) + +#### Other ingestion improvements + +- Reduce the direct memory requirement on non-query processing tasks by not reserving query buffers for them [#16887](https://github.com/apache/druid/pull/16887) +- JSON-based and SQL-based ingestion now support request headers when using an HTTP input source [#16974](https://github.com/apache/druid/pull/16974) + #### SQL-based ingestion ##### Other SQL-based ingestion improvements +- SQL-based ingestion now supports dynamic parameters for queries besides SELECT queries, such as REPLACE [#17126](https://github.com/apache/druid/pull/17126) +- Improved thread names to include the stage ID and worker number to help with troubleshooting [#17324](https://github.com/apache/druid/pull/17324) + #### Streaming ingestion +##### Control how many segments get merged for publishing + +You can now use the `maxColumsnToMerge` property in your supervisor spec to specify the number of segments to merge in a single phase when merging segments for publishing. This limit affects the total number of columns present in a set of segments to merge. If the limit is exceeded, segment merging occurs in multiple phases. Druid merges at least 2 segments each phase, regardless of this setting. + +[#17030](https://github.com/apache/druid/pull/17030) + ##### Other streaming ingestion improvements +- Druid now properly supports early/late rejection periods when `stopTasksCount` is configured and streaming tasks run longer than the configured task duration [#17442](https://github.com/apache/druid/pull/17442) +- Improved segment publishing when resubmitting supervisors or when task publishing takes a long time [#17509](https://github.com/apache/druid/pull/17509) + ### Querying +#### Window queries + +The following fields are deprecated for window queries that use the MSQ task engine: `maxRowsMaterializedInWindow` and `partitionColumnNames`. They will be removed in a future release. 
+ +[#17433](https://github.com/apache/druid/pull/17433) + + + + +[#17541](https://github.com/apache/druid/pull/17541) + #### Other querying improvements +- Added automatic query prioritization based on the period of the segments scanned in a query. You can set the duration threshold in ISO format using `druid.query.scheduler.prioritization.segmentRangeThreshold` [#17009](https://github.com/apache/druid/pull/17009) +- Improved error handling for incomplete queries. A trailer header to indicate an error is returned now [#16672](https://github.com/apache/druid/pull/16672) +- Improved scan queries to account for column types in more situations [#17463](https://github.com/apache/druid/pull/17463) +- Improved lookups so that they can now iterate over fetched data [#17212](https://github.com/apache/druid/pull/17212) +- Improved projections so that they can contain only aggregators and no grouping columns [#17484](https://github.com/apache/druid/pull/17484) +- Removed microseconds as a supported unit for EXTRACT [#17247](https://github.com/apache/druid/pull/17247) + + ### Cluster management +#### Reduced metadata IO + +The Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO` can help reduce IO during segment allocation. Setting this flag to true (default value) allows the Overlord to fetch only necessary segment payloads during segment allocation. Review Comment: ```suggestion The Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO` can help reduce IO during batch segment allocation. Setting this flag to true (default value) allows the Overlord to fetch only necessary segment payloads during segment allocation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
