capistrant opened a new issue, #12526:
URL: https://github.com/apache/druid/issues/12526
### Motivation
Druid currently provides 3 guard rails to the Coordinator when it comes to
locating and killing unused segments.
1. Enabling/Disabling automated segment killing
* A binary choice that automated killing by the coordinator can or cannot
happen
* Allows the cluster operator to decide if there should be any automated
cleanup of unused segments at all
2. The idea of "killable datasources"
* A cluster operator can identify a subset of the overall datasource set as killable. To be killable means that unused segments can be permanently killed
* Allows the cluster operator to insulate datasources from automated
cleanup if needed
3. `druid.coordinator.kill.durationToRetain` and
`druid.coordinator.kill.ignoreDurationToRetain`
* Configuration that is used to create a protected date, `now - druid.coordinator.kill.durationToRetain`; any segment whose end date is after this date can't be deleted automatically
* This interval can be ignored completely by setting
`druid.coordinator.kill.ignoreDurationToRetain` to `true`
* The main benefit of this configuration is that it allows the operator to
keep unused data around in deep store in case a load rule change matches these
unused segments, thus making them usable again. This would prevent end users
from having to re-ingest the unused data had it been automatically cleaned up.
* Note that this includes unused overshadowed segments, which wouldn't be usable again even if the load rules changed (at least not easily, as far as I am aware)
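The durationToRetain guard (rail 3) can be sketched as follows; the class and method names here are illustrative only, not Druid's actual implementation:

```java
import java.time.Duration;
import java.time.Instant;

public class DurationToRetainCheck {
    // A segment is only eligible for automated killing if its interval end
    // pre-dates (now - durationToRetain), unless the window is disabled via
    // druid.coordinator.kill.ignoreDurationToRetain.
    public static boolean isBeyondRetention(
            Instant segmentEnd,
            Instant now,
            Duration durationToRetain,
            boolean ignoreDurationToRetain)
    {
        if (ignoreDurationToRetain) {
            return true; // retention window disabled; segment is unprotected
        }
        Instant protectedCutoff = now.minus(durationToRetain);
        return segmentEnd.isBefore(protectedCutoff);
    }
}
```

For example, with a `durationToRetain` of 90 days and `now` at 2022-05-10, a segment ending 2021-01-01 is beyond retention and killable, while one ending 2022-05-01 is still protected.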
I am of the opinion that these aforementioned guard rails are not adequate in two areas:
The most glaring thing missing is a buffer window between a segment being marked unused and its being permanently deleted from Druid. As of now, if an unused segment is not protected by 1, 2, or 3 above, it is liable to be killed immediately after being marked unused, in the event that the `KillUnusedSegments` duty runs on the coordinator at that moment. The window between marking unused and killing is indeterminate, bounded only by the instant of marking and the configured period for how often the killing logic runs. From an operator's perspective, killing must therefore be treated as immediate, because it can be. The same idea is raised in https://github.com/apache/druid/issues/9889. This is similar to a trash folder in HDFS: it exists to prevent user error from causing unwanted data loss.
Another motivating factor is the retention of unused overshadowed segments within the durationToRetain interval. It is not clear to me that there is any straightforward mechanism for bringing an overshadowed segment back to life once it is marked unused. If that is true, these segments should be excluded from durationToRetain to prevent the buildup of data that cannot become used again, even if the load rule chain for the parent datasource is changed to include the segment's interval.
### Proposed changes
Add a `last_used` column to the druid_segments table. Whenever the boolean `used` column is updated, `last_used` is set to the current timestamp. The coordinator then uses this `last_used` column in conjunction with `druid.coordinator.kill.bufferPeriod` to filter out segments when looking for segments to kill. The decision flow would now be: is kill enabled --> is the datasource killable --> is the segment's `last_used` date beyond the bufferPeriod --> does the segment end date pre-date the timestamp created by `now - druid.coordinator.kill.durationToRetain`
Overshadowed segments would be exempt from the last decision point mentioned above:
> does the segment end date pre-date the timestamp created by `now - druid.coordinator.kill.durationToRetain`
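The proposed decision flow, including the overshadowed-segment exemption, might look roughly like this. All names here (the class, the `overshadowed` flag, the parameter layout) are hypothetical sketches, not actual Druid code:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

public class KillEligibility {
    // Proposed kill decision flow: kill enabled --> datasource killable -->
    // last_used beyond bufferPeriod --> segment end pre-dates the retention
    // cutoff. Overshadowed segments skip the final durationToRetain check.
    public static boolean isKillable(
            boolean killEnabled,
            Set<String> killableDatasources,
            String datasource,
            Instant lastUsed,          // proposed last_used column value
            Duration bufferPeriod,     // druid.coordinator.kill.bufferPeriod
            Instant segmentEnd,
            Duration durationToRetain, // druid.coordinator.kill.durationToRetain
            boolean overshadowed,
            Instant now)
    {
        if (!killEnabled) {
            return false; // guard rail 1: automated killing disabled
        }
        if (!killableDatasources.contains(datasource)) {
            return false; // guard rail 2: datasource is not killable
        }
        if (lastUsed.isAfter(now.minus(bufferPeriod))) {
            return false; // proposed buffer: marked unused too recently
        }
        if (overshadowed) {
            return true; // proposed exemption: skip the durationToRetain check
        }
        // guard rail 3: segment end must pre-date now - durationToRetain
        return segmentEnd.isBefore(now.minus(durationToRetain));
    }
}
```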
### Rationale
#### Alternative Implementation
https://github.com/apache/druid/pull/10877#discussion_r864088454 There is some discussion here regarding the desired solution. An alternative to the schema change is embedding the `last_used` date in the payload stored per segment in the metadata store. The upside is that it removes the schema change required to facilitate the upgrade of an existing cluster. The downside is the need to extract the embedded date from the payload when evaluating unused segments for potential killing: the work can no longer be pushed down to the metadata query, but instead must live in Druid code.
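To illustrate the trade-off, the two query shapes might look like this (hypothetical SQL, not Druid's actual metadata queries): a dedicated column lets the buffer filter push down into the metadata query, while a payload-embedded date forces Druid to fetch every unused row and filter in code.

```java
public class FilterPushdown {
    // With a last_used column, the bufferPeriod filter is applied by the
    // metadata store itself, so only kill candidates are returned.
    public static String pushedDownQuery(String bufferCutoffIso) {
        return "SELECT id FROM druid_segments "
             + "WHERE used = false AND last_used < '" + bufferCutoffIso + "'";
    }

    // With the date embedded in the payload, every unused row must be
    // fetched and its payload deserialized in Druid code before filtering.
    public static String payloadQuery() {
        return "SELECT id, payload FROM druid_segments WHERE used = false";
    }
}
```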
### Operational impact
This is a breaking change for upgrades. The coordinator will now expect a different druid_segments schema, meaning a cluster operator will need to update the schema and populate the new column for all existing rows before upgrading the coordinator. Functionality-wise, there should be no impediment to rolling downgrade; however, the now-unused column will waste storage in the metastore.
To mitigate the disruption to existing clusters, we should provide scripts to alter the metastore for all supported metadata storage platforms. (Alternatively, we could allow Druid code to alter the schema on startup in the new version, but this would require DDL permissions for the metadata storage user.)
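A migration script might issue statements along these lines; the exact DDL varies by metadata storage platform, and the column type (`VARCHAR(255)`) and whole-table backfill shown here are assumptions for illustration:

```java
public class SegmentsTableMigration {
    // Hypothetical DDL to add the proposed column; real scripts would use
    // platform-specific types (e.g. for MySQL vs PostgreSQL).
    public static String alterStatement(String table) {
        return "ALTER TABLE " + table + " ADD COLUMN last_used VARCHAR(255)";
    }

    // Seed existing rows so the bufferPeriod check has a non-null value
    // to compare against after the upgrade.
    public static String backfillStatement(String table, String nowIso) {
        return "UPDATE " + table + " SET last_used = '" + nowIso + "'";
    }
}
```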
### Test plan
The new and changed logic for searching for segments to kill can be covered by automated tests built on top of our existing integration test suite.
The migration path will also need a testing plan that can be nailed down
after implementation.