vtlim commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1803459601


##########
docs/data-management/automatic-compaction.md:
##########
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+
+
+To use compaction supervisors, set the following properties in your Overlord 
runtime properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor tasks

Review Comment:
   ```suggestion
     *  `druid.supervisor.compaction.enabled` to `true` so that compaction 
tasks can be run as supervisor tasks
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+
+
+To use compaction supervisors, set the following properties in your Overlord 
runtime properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor tasks
+  *  `druid.supervisor.compaction.engine` to  `msq` to specify the MSQ task 
engine as the compaction engine or to `native` to use the native engine. This 
is the default engine if the `engine` field is omitted from your compaction 
config
+
+Compaction supervisors use the same syntax as auto-compaction using  
Coordinator duties with one key difference: you submit the auto-compaction as a 
a supervisor spec. In the spec, set the `type` to `autocompact` and include the 
auto-compaction config in the `spec` .

Review Comment:
   ```suggestion
   Compaction supervisors use the same syntax as auto-compaction using  
Coordinator duties with one key difference: you submit the auto-compaction as a 
a supervisor spec. In the spec, set the `type` to `autocompact` and include the 
auto-compaction config in the `spec`.
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+
+
+To use compaction supervisors, set the following properties in your Overlord 
runtime properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor tasks
+  *  `druid.supervisor.compaction.engine` to  `msq` to specify the MSQ task 
engine as the compaction engine or to `native` to use the native engine. This 
is the default engine if the `engine` field is omitted from your compaction 
config
+
+Compaction supervisors use the same syntax as auto-compaction using  
Coordinator duties with one key difference: you submit the auto-compaction as a 
a supervisor spec. In the spec, set the `type` to `autocompact` and include the 
auto-compaction config in the `spec` .
+
+To submit an automatic compaction task, you can submit a supervisor spec 
through the [web console](#manage-compaction-supervisors-with-the-web-console) 
or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).
+
+
+### Manage compaction supervisors with the web console
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform 
the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+     - The type of supervisor spec by setting `"type": "autocompact"`
+     - The compaction configuration by adding it to the `spec` field
+    ```json
+    {
+   "type": "autocompact",
+   "spec": {
+      "dataSource": YOUR_DATASOURCE,
+      "tuningConfig": {...},
+       "granularitySpec": {...},
+       "engine": <native|msq>,
+       ...
+   }
+    ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor 
through the UI or API.
+
+### Manage compaction supervisors with supervisor APIs
+
+Submitting an automatic compaction as a supervisor task uses the same endpoint 
as supervisor tasks for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia` 
datasource:
+
+```sh
+curl --location --request POST 
'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+   "type": "autocompact",                     // required
+   "suspended": false,                        // optional 
+   "spec": {                                  // required
+       "dataSource": "wikipedia",             // required
+       "tuningConfig": {...},                 // optional
+       "granularitySpec": {...},              // optional
+       "engine": <native|msq>,                // optional
+       ...
+   }
+}'
+```
+
+Note that if you omit `spec.engine`, Druid uses the default compaction engine. 
You can control the default compaction engine with the 
`druid.supervisor.compaction.engine` Overlord runtime property. If 
`spec.engine` and `druid.supervisor.compaction.engine` are omitted, Druid 
defaults to the native engine.
+
+To stop the automatic compaction task, suspend or terminate the supervisor 
through the UI or API.
+
+### Use MSQ for auto-compaction
+
+The MSQ task engine is available as a compaction engine if you configure 
auto-compaction to use compaction supervisors. To use the MSQ task engine for 
automatic compaction, make sure the following requirements are met:
+
+* [Load the MSQ task engine 
extension](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor task.
+  *  Optionally, set `druid.supervisor.compaction.engine` to `msq` to specify 
the MSQ task engine as the default compaction engine. If you don't do this, 
you'll need to set `spec.engine` to `msq` for each compaction supervisor spec 
where you want to use the MSQ task engine.
+* Have at least two compaction task slots available or set 
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine 
requires at least two tasks to run, one controller task and one worker task.
+
+You can use [MSQ task engine context 
parameters](../multi-stage-query/reference.md#context-parameters) in 
`spec.taskContext` when configuring your datasource for automatic compaction, 
such as setting the maximum number of tasks using the 
`spec.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context 
parameters overlap with automatic compaction parameters. When these settings 
overlap, set one or the other.
+
+
+#### MSQ task engine limitations
+
+<!--This list also exists in multi-stage-query/known-issues-->
+
+When using the MSQ task engine for auto-compaction, keep the following 
limitations in mind:
+
+- The `metricSpec` field is only supported for certain aggregators. For more 
information, see [Supported aggregators](#supported-aggregators).
+- Only dynamic and range-based partitioning are supported.
+- Set `rollup`  to `true` if and only if `metricSpec` is not empty or null.
+- You can only partition on string dimensions. However, multi-valued string 
dimensions are not supported.
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use 
`maxRowsPerSegment` instead.
+- Segments can only be sorted on `__time` as the first column.
+
+##### Supported aggregators
+
+Auto-compaction using the MSQ task engine supports only aggregators that 
satisfy the following properties: 
+* __Mergeability__: can combine partial aggregates
+* __Idempotency__: produces the same results on repeated runs of the 
aggregator on previously aggregated values in a column
+
+This is exemplified by the following `longSum` aggregator:
+
+```
+{"name": "added", "type": "longSum", "fieldName": "added"}
+```
+
+where `longSum` being capable of combining partial results satisfies 
mergeability, while input and output column being the same (`added`) ensures 
idempotency.
+
+The following are some examples of aggregators that aren't supported since at 
least one of the required conditions aren't satisfied:
+
+*  `longSum` aggregator where the `added` column rolls up into `sum_added` 
column discarding the input `added` column, violating idempotency, as 
subsequent runs would no longer find the `added` column:
+    ```
+    {"name": "sum_added", "type": "longSum", "fieldName": "added"}
+    ```
+* Partial sketches which cannot themselves be used to combine partial 
aggregates and need merging aggregators -- such as `HLLSketchMerge` required 
for `HLLSketchBuild` aggregator below -- violating mergeability:
+    ```
+    {"name": added, "type":"HLLSketchBuild", fieldName: "added"}

Review Comment:
   ```suggestion
       {"name": "added", "type": "HLLSketchBuild", "fieldName": "added"}
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+
+
+To use compaction supervisors, set the following properties in your Overlord 
runtime properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor tasks
+  *  `druid.supervisor.compaction.engine` to  `msq` to specify the MSQ task 
engine as the compaction engine or to `native` to use the native engine. This 
is the default engine if the `engine` field is omitted from your compaction 
config
+
+Compaction supervisors use the same syntax as auto-compaction using  
Coordinator duties with one key difference: you submit the auto-compaction as a 
a supervisor spec. In the spec, set the `type` to `autocompact` and include the 
auto-compaction config in the `spec` .
+
+To submit an automatic compaction task, you can submit a supervisor spec 
through the [web console](#manage-compaction-supervisors-with-the-web-console) 
or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).
+
+
+### Manage compaction supervisors with the web console
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform 
the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+     - The type of supervisor spec by setting `"type": "autocompact"`
+     - The compaction configuration by adding it to the `spec` field
+    ```json
+    {
+   "type": "autocompact",
+   "spec": {
+      "dataSource": YOUR_DATASOURCE,
+      "tuningConfig": {...},
+       "granularitySpec": {...},
+       "engine": <native|msq>,
+       ...
+   }
+    ```

Review Comment:
   ```suggestion
       ```json
       {
        "type": "autocompact",
        "spec": {
          "dataSource": YOUR_DATASOURCE,
          "tuningConfig": {...},
          "granularitySpec": {...},
          "engine": <native|msq>,
          ...
       }
       ```
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+
+
+To use compaction supervisors, set the following properties in your Overlord 
runtime properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor tasks
+  *  `druid.supervisor.compaction.engine` to  `msq` to specify the MSQ task 
engine as the compaction engine or to `native` to use the native engine. This 
is the default engine if the `engine` field is omitted from your compaction 
config
+
+Compaction supervisors use the same syntax as auto-compaction using  
Coordinator duties with one key difference: you submit the auto-compaction as a 
a supervisor spec. In the spec, set the `type` to `autocompact` and include the 
auto-compaction config in the `spec` .
+
+To submit an automatic compaction task, you can submit a supervisor spec 
through the [web console](#manage-compaction-supervisors-with-the-web-console) 
or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).
+
+
+### Manage compaction supervisors with the web console
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform 
the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+     - The type of supervisor spec by setting `"type": "autocompact"`
+     - The compaction configuration by adding it to the `spec` field
+    ```json
+    {
+   "type": "autocompact",
+   "spec": {
+      "dataSource": YOUR_DATASOURCE,
+      "tuningConfig": {...},
+       "granularitySpec": {...},
+       "engine": <native|msq>,
+       ...
+   }
+    ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor 
through the UI or API.
+
+### Manage compaction supervisors with supervisor APIs
+
+Submitting an automatic compaction as a supervisor task uses the same endpoint 
as supervisor tasks for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia` 
datasource:
+
+```sh
+curl --location --request POST 
'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+   "type": "autocompact",                     // required
+   "suspended": false,                        // optional 
+   "spec": {                                  // required
+       "dataSource": "wikipedia",             // required
+       "tuningConfig": {...},                 // optional
+       "granularitySpec": {...},              // optional
+       "engine": <native|msq>,                // optional
+       ...
+   }
+}'
+```
+
+Note that if you omit `spec.engine`, Druid uses the default compaction engine. 
You can control the default compaction engine with the 
`druid.supervisor.compaction.engine` Overlord runtime property. If 
`spec.engine` and `druid.supervisor.compaction.engine` are omitted, Druid 
defaults to the native engine.
+
+To stop the automatic compaction task, suspend or terminate the supervisor 
through the UI or API.
+
+### Use MSQ for auto-compaction
+
+The MSQ task engine is available as a compaction engine if you configure 
auto-compaction to use compaction supervisors. To use the MSQ task engine for 
automatic compaction, make sure the following requirements are met:
+
+* [Load the MSQ task engine 
extension](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor task.
+  *  Optionally, set `druid.supervisor.compaction.engine` to `msq` to specify 
the MSQ task engine as the default compaction engine. If you don't do this, 
you'll need to set `spec.engine` to `msq` for each compaction supervisor spec 
where you want to use the MSQ task engine.
+* Have at least two compaction task slots available or set 
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine 
requires at least two tasks to run, one controller task and one worker task.
+
+You can use [MSQ task engine context 
parameters](../multi-stage-query/reference.md#context-parameters) in 
`spec.taskContext` when configuring your datasource for automatic compaction, 
such as setting the maximum number of tasks using the 
`spec.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context 
parameters overlap with automatic compaction parameters. When these settings 
overlap, set one or the other.
+
+
+#### MSQ task engine limitations
+
+<!--This list also exists in multi-stage-query/known-issues-->
+
+When using the MSQ task engine for auto-compaction, keep the following 
limitations in mind:
+
+- The `metricSpec` field is only supported for certain aggregators. For more 
information, see [Supported aggregators](#supported-aggregators).
+- Only dynamic and range-based partitioning are supported.
+- Set `rollup`  to `true` if and only if `metricSpec` is not empty or null.
+- You can only partition on string dimensions. However, multi-valued string 
dimensions are not supported.
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use 
`maxRowsPerSegment` instead.
+- Segments can only be sorted on `__time` as the first column.
+
+##### Supported aggregators

Review Comment:
   ```suggestion
   #### Supported aggregators
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to