jihoonson commented on a change in pull request #8173: Add a cluster-wide 
configuration to force timeChunk lock and add a doc for segment locking
URL: https://github.com/apache/incubator-druid/pull/8173#discussion_r310265722
 
 

 ##########
 File path: docs/content/ingestion/locking-and-priority.md
 ##########
 @@ -24,30 +24,73 @@ title: "Task Locking & Priority"
 
 # Task Locking & Priority
 
+This document explains the task locking system in Druid. Druid's locking system
+and versioning system are tightly coupled to guarantee the correctness of ingested data.
+
+## Overshadow Relation between Segments
+
+You can run a task to overwrite existing data. The segments created by an overwriting task _overshadow_ existing segments.
+Note that the overshadow relation holds only within the same time chunk and the same data source.
+Overshadowed segments are excluded from query processing so that stale data is filtered out.
+
+A segment `s1` overshadows another segment `s2` if either of the following holds:
+
+- `s1` has a higher major version than `s2`.
+- `s1` has the same major version and a higher minor version than `s2`.
+
+Here are some examples.
+
+- A segment with the major version `2019-01-01T00:00:00.000Z` and the minor version `0` overshadows
+ another with the major version `2018-01-01T00:00:00.000Z` and the minor version `1`.
+- A segment with the major version `2019-01-01T00:00:00.000Z` and the minor version `1` overshadows
+ another with the major version `2019-01-01T00:00:00.000Z` and the minor version `0`.
+
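
The two rules and the examples above can be sketched as a small comparison function. This is only an illustration under stated assumptions, not Druid's implementation: `SegmentVersion` and `overshadows` are hypothetical names, the time chunk is treated as an opaque string, and major versions are assumed to be UTC timestamps in one uniform ISO-8601 format so that plain string comparison matches chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentVersion:
    """Hypothetical stand-in for a segment's identity; not a Druid class."""
    data_source: str
    time_chunk: str   # opaque label such as "2019-01-01/2019-01-02"
    major: str        # assumed: ISO-8601 UTC timestamp, uniform format
    minor: int


def overshadows(s1: SegmentVersion, s2: SegmentVersion) -> bool:
    """Return True if s1 overshadows s2 under the two rules in the text."""
    # The relation holds only within one data source and one time chunk.
    if s1.data_source != s2.data_source or s1.time_chunk != s2.time_chunk:
        return False
    # Rule 1: a higher major version wins regardless of the minor version.
    if s1.major != s2.major:
        return s1.major > s2.major
    # Rule 2: same major version, so a higher minor version wins.
    return s1.minor > s2.minor
```

For instance, with `new = SegmentVersion("wikipedia", "2019-01-01/2019-01-02", "2019-01-01T00:00:00.000Z", 0)` and `old` differing only in having major version `2018-01-01T00:00:00.000Z` and minor version `1`, `overshadows(new, old)` is true and `overshadows(old, new)` is false.
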
 ## Locking
 
-Once an Overlord process accepts a task, the task acquires locks for the data 
source and intervals specified in the task.
+If you are running two or more [Druid tasks](./tasks.html) that generate segments for the same data source and the same time chunk,
+the generated segments could overshadow each other, which could lead to incorrect query results.
 
-There are two lock types, i.e., _shared lock_ and _exclusive lock_.
+To avoid this problem, tasks should acquire locks before creating any segments in Druid.
+There are two types of locks, _time chunk lock_ and _segment lock_, and different tasks can use different lock types.
 
-- A task needs to acquire a shared lock before it reads segments of an 
interval. Multiple shared locks can be acquired for the same dataSource and 
interval. Shared locks are always preemptable, but they don't preempt each 
other.
-- A task needs to acquire an exclusive lock before it writes segments for an 
interval. An exclusive lock is also preemptable except while the task is 
publishing segments.
+When the time chunk lock is used, a task locks the entire time chunk of a data source where the generated segments will be written.
+For example, suppose we have a task ingesting data into the time chunk `2019-01-01T00:00:00.000Z/2019-01-02T00:00:00.000Z` of the `wikipedia` data source.
+With time chunk locking, this task should lock the entire time chunk `2019-01-01T00:00:00.000Z/2019-01-02T00:00:00.000Z` of the `wikipedia` data source
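
As a rough illustration of time chunk locking as described in the hunk above, the sketch below models a lock table keyed by data source and interval. It is a simplification under stated assumptions, not Druid's code: `TimeChunkLockTable`, `try_lock`, and `parse_interval` are hypothetical names, and priorities, preemption, and segment locks are deliberately ignored. A request is refused when another task already holds a lock on an overlapping interval of the same data source.

```python
from datetime import datetime

# Assumed timestamp format, matching the examples in the text.
_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_interval(interval: str):
    """Split 'start/end' into two datetimes (end is exclusive)."""
    start, end = interval.split("/")
    return datetime.strptime(start, _FMT), datetime.strptime(end, _FMT)


class TimeChunkLockTable:
    """Hypothetical in-memory lock table; not a Druid class."""

    def __init__(self):
        # Each entry: (data_source, start, end, task_id)
        self._locks = []

    def try_lock(self, task_id: str, data_source: str, interval: str) -> bool:
        """Grant the lock unless another task holds an overlapping one."""
        start, end = parse_interval(interval)
        for ds, s, e, owner in self._locks:
            overlaps = start < e and s < end
            if ds == data_source and owner != task_id and overlaps:
                return False
        self._locks.append((data_source, start, end, task_id))
        return True
```

In this model, once one task holds `2019-01-01T00:00:00.000Z/2019-01-02T00:00:00.000Z` on `wikipedia`, a second task asking for the same time chunk of `wikipedia` is refused, while a request for the following day or for a different data source succeeds.
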
 
 Review comment:
   👍 
