[GitHub] [druid] techdocsmith commented on a diff in pull request #12569: Docs for automatic compaction

GitBox Fri, 03 Jun 2022 16:58:26 -0700


techdocsmith commented on code in PR #12569:
URL: https://github.com/apache/druid/pull/12569#discussion_r889414763



##########
docs/configuration/index.md:
##########
@@ -987,8 +987,11 @@ Automatic compaction config example:
 }
 ```
 
-Compaction tasks fail when higher priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Therefore frequent conflicts between compaction tasks and 
realtime tasks can cause the coordinator's automatic compaction to get stuck.
-You may see this issue with streaming ingestion from Kafka and Kinesis, which 
ingest late-arriving data. To mitigate this problem, set `skipOffsetFromLatest` 
to a value large enough so that arriving data tends to fall outside the offset 
value from the current time. This way you can avoid conflicts between 
compaction tasks and realtime ingestion tasks.
+Compaction tasks fail when higher priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Therefore frequent conflicts between compaction tasks and 
realtime tasks can cause the Coordinator's automatic compaction to get stuck.
+You may see this issue with streaming ingestion from Kafka and Kinesis, which 
ingest late-arriving data.
+
+To mitigate this problem, set `skipOffsetFromLatest` to a value large enough 
so that arriving data tends to fall outside the offset value from the current 
time. This way you can avoid conflicts between compaction tasks and realtime 
ingestion tasks.

Review Comment:
   Do we need an example here? There is one in the Avoid conflicts section, but 
one here might help illustrate the setting.
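   If an example would help, it could look something like the following 
sketch. It borrows the `wikistream` use case from the Change segment 
granularity example in automatic-compaction.md. The `P1W` value is an 
assumption chosen to match "the latest week of data" in that example, not a 
recommended setting.

   ```json
   {
       "dataSource": "wikistream",
       "granularitySpec": {
           "segmentGranularity": "DAY"
       },
       "skipOffsetFromLatest": "P1W"
   }
   ```

   With a config along these lines, segments that fall within one week of the 
end time of the most recent segment would be skipped, which should reduce lock 
conflicts with late-arriving streaming data.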



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).
+
+The frequency of compaction tasks relies on the Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), configured by 
`druid.coordinator.period.indexingPeriod`.

Review Comment:
   nit: "configured by" reads awkwardly. Maybe start a new sentence: "You can 
configure/specify/control the frequency..."



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).
+
+The frequency of compaction tasks relies on the Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), configured by 
`druid.coordinator.period.indexingPeriod`.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+Note this time period affects other Coordinator duties including merge and 
conversion tasks.

Review Comment:
   ```suggestion
   This time period affects other Coordinator duties including merge and 
conversion tasks.
   ```



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).
+
+The frequency of compaction tasks relies on the Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), configured by 
`druid.coordinator.period.indexingPeriod`.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+Note this time period affects other Coordinator duties including merge and 
conversion tasks.
+To configure the frequency of compaction tasks, [create a new duty group for 
the Coordinator](#set-frequency-of-compaction-runs).
+
+At every indexing period, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+
+As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
+
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
+
+## Enable automatic compaction
+
+You can enable automatic compaction for a datasource using the Druid console 
or programmatically via an API.
+This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the Druid console](../operations/druid-console.md) or the 
[Tasks API](../operations/api-reference.md#post-5).
+
+### Druid console
+
+Use the Druid console to enable automatic compaction for a datasource as 
follows.
+
+1. Click **Datasources** in the top-level navigation.
+2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
+3. In the **Compaction config** dialog, configure the auto-compaction 
settings. The dialog offers a form view as well as a JSON view. Editing the 
form updates the JSON specification, and editing the JSON updates the form 
field, if present. Form fields not present in the JSON indicate default values. 
You may add additional properties to the JSON for auto-compaction settings not 
displayed in the form. See [Configure automatic 
compaction](#configure-automatic-compaction) for supported settings for 
auto-compaction.
+4. Click **Submit**.
+5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
+
+The following screenshot shows the compaction config dialog for a datasource 
with auto-compaction enabled.
+![Compaction config in web console](../assets/compaction-dialog.png)
+
+To disable auto-compaction for a datasource, click **Delete** from the 
**Compaction config** dialog. Druid does not retain your auto-compaction 
configuration.
+
+### Compaction configuration API
+
+Use the [Coordinator 
API](../operations/api-reference.md#automatic-compaction-status) to configure 
automatic compaction.
+To enable auto-compaction for a datasource, create a JSON object with the 
desired auto-compaction settings.
+See [Configure automatic compaction](#configure-automatic-compaction) for the 
syntax of an auto-compaction spec.
+Send the JSON object as a payload in a [`POST` 
request](../operations/api-reference.md#post-4) to 
`/druid/coordinator/v1/config/compaction`.
+The following example configures auto-compaction for the `wikipedia` 
datasource:
+
+```sh
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+    "dataSource": "wikipedia",
+    "granularitySpec": {
+        "segmentGranularity": "DAY"
+    }
+}'
+```
+
+To disable auto-compaction for a datasource, send a [`DELETE` 
request](../operations/api-reference.md#delete-1) to 
`/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` 
with the name of the datasource for which to disable auto-compaction. For 
example:
+
+```sh
+curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
+```
+
+
+## Configure automatic compaction
+
+You can configure automatic compaction dynamically without restarting Druid.
+The automatic compaction system uses the following syntax:
+
+```json
+{
+    "dataSource": <task_datasource>,
+    "ioConfig": <IO config>,
+    "dimensionsSpec": <custom dimensionsSpec>,
+    "transformSpec": <custom transformSpec>,
+    "metricsSpec": <custom metricsSpec>,
+    "tuningConfig": <parallel indexing task tuningConfig>,
+    "granularitySpec": <compaction task granularitySpec>,
+    "skipOffsetFromLatest": <time period to avoid compaction>,
+    "taskPriority": <compaction task priority>,
+    "taskContext": <task context>
+}
+```
+
+Most fields in the auto-compaction configuration align with a typical [Druid 
ingestion spec](../ingestion/ingestion-spec.md).
+The following properties only apply to auto-compaction:
+* `skipOffsetFromLatest`
+* `taskPriority`
+* `taskContext`
+
+Since the automatic compaction system provides a management layer on top of 
manual compaction tasks,
+the auto-compaction configuration does not include task-specific properties 
found in a typical Druid ingestion spec.
+The following properties are automatically set by the Coordinator:
+* `type`: Set to `compact`.
+* `id`: Generated using the task type, datasource name, interval, and 
timestamp. The task ID is prefixed with `coordinator-issued`.
+* `context`: Set according to the user-provided `taskContext`.
+
+For more details on each of the specs in an auto-compaction configuration, see 
[Automatic compaction dynamic 
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
+
+
+### Avoid conflicts with ingestion
+
+The Coordinator compacts segments from newest to oldest. In the 
auto-compaction configuration, you can set a time period, relative to the end 
time of the most recent segment, for segments that should not be compacted. 
Assign this value to `skipOffsetFromLatest`. Note that this offset is not 
relative to the current time but to the latest segment time. For example, if 
you want to skip over segments from thirty days prior to the end time of the 
most recent segment, assign `"skipOffsetFromLatest": "P30D"`.
+
+Compaction tasks may be interrupted when they interfere with ingestion. For 
example, this occurs when an ingestion task needs to write data to a segment 
for a time interval locked for compaction. If there are continuous failures 
that prevent compaction from making progress, consider one of the following 
strategies:
+* Set `skipOffsetFromLatest` to reduce the chance of conflicts between 
ingestion and compaction.
+* Increase the priority value of compaction tasks relative to ingestion tasks. 
Only recommended for advanced users. This approach can cause ingestion jobs to 
fail or lag. To change the priority of compaction tasks, set `taskPriority` to 
the desired priority value in the auto-compaction configuration. For details on 
the priority values of different task types, see [Lock 
priority](../ingestion/tasks.md#lock-priority).
+
+
+### Set frequency of compaction runs
+
+If you want the Coordinator to check for compaction more frequently than its 
indexing period, create a separate duty group and set its time period in the 
`coordinator/runtime.properties` file. For example, to change the 
auto-compaction period to 1 minute:
+```
+druid.coordinator.dutyGroups=["compaction"]
+druid.coordinator.compaction.duties=["compactSegments"]
+druid.coordinator.compaction.period=PT60S
+```
+
+## View automatic compaction statistics
+
+After the Coordinator has initiated auto-compaction, you can view compaction 
statistics for the datasource, including the number of bytes, segments, and 
intervals already compacted and those awaiting compaction. The Coordinator also 
reports the total bytes, segments, and intervals not eligible for compaction in 
accordance with its [segment search 
policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction).
+
+In the Druid console, the Datasources view displays auto-compaction 
statistics. The Tasks view shows the task information for compaction tasks that 
were triggered by the automatic compaction system.
+
+To get statistics by API, send a [`GET` 
request](../operations/api-reference.md#get-10) to 
`/druid/coordinator/v1/compaction/status`. To filter the results to a 
particular datasource, pass the datasource name as a query parameter to the 
request—for example, 
`/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
+
+## Examples
+
+The following examples demonstrate potential use cases in which 
auto-compaction may improve your Druid performance. See more details in 
[Compaction strategies](../ingestion/compaction.md#compaction-strategies). The 
examples in this section do not change the underlying data.
+
+### Change segment granularity
+
+You have a stream set up to ingest data with `HOUR` segment granularity into 
the `wikistream` datasource. You notice that your Druid segments are smaller 
than the [recommended segment size](../operations/segment-optimization.md) of 5 
million rows per segment. You wish to automatically compact segments to `DAY` 
granularity while leaving the latest week of data _not_ compacted as your 
stream consistently has data over that time period coming in.

Review Comment:
   ```suggestion
   You have a stream set up to ingest data with `HOUR` segment granularity into 
the `wikistream` datasource. You notice that your Druid segments are smaller 
than the [recommended segment size](../operations/segment-optimization.md) of 5 
million rows per segment. You wish to automatically compact segments to `DAY` 
granularity while leaving the latest week of data _not_ compacted because your 
stream consistently receives data within that time period.
   ```
   nit: avoid "as" when you mean "because".
   
   I think this explains the use case for `skipOffsetFromLatest` pretty well. 
I wonder if there's a way to either refer to it or move it.



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).
+
+The frequency of compaction tasks relies on the Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), configured by 
`druid.coordinator.period.indexingPeriod`.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+Note this time period affects other Coordinator duties including merge and 
conversion tasks.
+To configure the frequency of compaction tasks, [create a new duty group for 
the Coordinator](#set-frequency-of-compaction-runs).
+
+At every indexing period, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+
+As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
+
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
+
+## Enable automatic compaction
+
+You can enable automatic compaction for a datasource using the Druid console 
or programmatically via an API.
+This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the Druid console](../operations/druid-console.md) or the 
[Tasks API](../operations/api-reference.md#post-5).
+
+### Druid console
+
+Use the Druid console to enable automatic compaction for a datasource as 
follows.
+
+1. Click **Datasources** in the top-level navigation.
+2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
+3. In the **Compaction config** dialog, configure the auto-compaction 
settings. The dialog offers a form view as well as a JSON view. Editing the 
form updates the JSON specification, and editing the JSON updates the form 
field, if present. Form fields not present in the JSON indicate default values. 
You may add additional properties to the JSON for auto-compaction settings not 
displayed in the form. See [Configure automatic 
compaction](#configure-automatic-compaction) for supported settings for 
auto-compaction.
+4. Click **Submit**.
+5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
+
+The following screenshot shows the compaction config dialog for a datasource 
with auto-compaction enabled.
+![Compaction config in web console](../assets/compaction-dialog.png)
+
+To disable auto-compaction for a datasource, click **Delete** from the 
**Compaction config** dialog. Druid does not retain your auto-compaction 
configuration.
+
+### Compaction configuration API
+
+Use the [Coordinator 
API](../operations/api-reference.md#automatic-compaction-status) to configure 
automatic compaction.
+To enable auto-compaction for a datasource, create a JSON object with the 
desired auto-compaction settings.
+See [Configure automatic compaction](#configure-automatic-compaction) for the 
syntax of an auto-compaction spec.
+Send the JSON object as a payload in a [`POST` 
request](../operations/api-reference.md#post-4) to 
`/druid/coordinator/v1/config/compaction`.
+The following example configures auto-compaction for the `wikipedia` 
datasource:
+
+```sh
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+    "dataSource": "wikipedia",
+    "granularitySpec": {
+        "segmentGranularity": "DAY"
+    }
+}'
+```
+
+To disable auto-compaction for a datasource, send a [`DELETE` 
request](../operations/api-reference.md#delete-1) to 
`/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` 
with the name of the datasource for which to disable auto-compaction. For 
example:
+
+```sh
+curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
+```
+
+
+## Configure automatic compaction
+
+You can configure automatic compaction dynamically without restarting Druid.
+The automatic compaction system uses the following syntax:
+
+```json
+{
+    "dataSource": <task_datasource>,
+    "ioConfig": <IO config>,
+    "dimensionsSpec": <custom dimensionsSpec>,
+    "transformSpec": <custom transformSpec>,
+    "metricsSpec": <custom metricsSpec>,
+    "tuningConfig": <parallel indexing task tuningConfig>,
+    "granularitySpec": <compaction task granularitySpec>,
+    "skipOffsetFromLatest": <time period to avoid compaction>,
+    "taskPriority": <compaction task priority>,
+    "taskContext": <task context>
+}
+```
+
+Most fields in the auto-compaction configuration align with a typical [Druid 
ingestion spec](../ingestion/ingestion-spec.md).

Review Comment:
   Maybe "correlate to" rather than "align with"? Not sure.



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).

Review Comment:
   ```suggestion
   In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
   ```



##########
docs/configuration/index.md:
##########
@@ -987,8 +987,11 @@ Automatic compaction config example:
 }
 ```
 
-Compaction tasks fail when higher priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Therefore frequent conflicts between compaction tasks and 
realtime tasks can cause the coordinator's automatic compaction to get stuck.
-You may see this issue with streaming ingestion from Kafka and Kinesis, which 
ingest late-arriving data. To mitigate this problem, set `skipOffsetFromLatest` 
to a value large enough so that arriving data tends to fall outside the offset 
value from the current time. This way you can avoid conflicts between 
compaction tasks and realtime ingestion tasks.
+Compaction tasks fail when higher priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Therefore frequent conflicts between compaction tasks and 
realtime tasks can cause the Coordinator's automatic compaction to get stuck.

Review Comment:
   ```suggestion
   Compaction tasks fail when higher priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Frequent conflicts between compaction tasks and realtime 
tasks can cause the Coordinator's automatic compaction to hang.
   ```
   If you keep "Therefore", I think it needs a comma after it, but I think the 
sentence reads OK without the word.



##########
docs/ingestion/automatic-compaction.md:
##########
@@ -0,0 +1,198 @@
+---
+id: automatic-compaction
+title: "Automatic compaction"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after the data is ingested in 
Druid to improve query performance. Automatic compaction, or auto-compaction, 
refers to the system for automatic execution of compaction tasks managed by the 
[Druid Coordinator](../design/coordinator.md).
+
+The frequency of compaction tasks relies on the Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), configured by 
`druid.coordinator.period.indexingPeriod`.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+Note this time period affects other Coordinator duties including merge and 
conversion tasks.
+To configure the frequency of compaction tasks, [create a new duty group for 
the Coordinator](#set-frequency-of-compaction-runs).

Review Comment:
   I'm not sure we've defined "duty group".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
