vtlim commented on code in PR #14609: URL: https://github.com/apache/druid/pull/14609#discussion_r1272808644
########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. Review Comment: The line above uses a forward slash preceding the endpoint but this line and others don't include it ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). 
+ +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). Review Comment: ```suggestion Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). 
+ +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). Review Comment: Include a general term of this like the context parameter above? ########## docs/design/architecture.md: ########## @@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk. -Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between -Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on -local disk and serve queries from that cache as well as from an in-memory cache. 
-This means that Druid never needs to access deep storage -during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space -both in deep storage and across your Historical servers for the data you plan to load. +Druid uses deep storage for the following purposes: + +- As a backup of your data, including those that get loaded onto Historical processes. +- As a way to transfer data in the background between +Druid processes. Druid stores data in files called _segments_. Review Comment: ```suggestion - As a way to transfer data in the background between Druid processes. Druid stores data in files called _segments_. ``` ########## docs/operations/rule-configuration.md: ########## @@ -167,7 +167,7 @@ Set the following properties: - the segment interval starts any time after the rule interval starts. You can use this property to load segments with future start and end dates, where "future" is relative to the time when the Coordinator evaluates data against the rule. Defaults to `true`. -- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. +- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. If you set the replicants for a period to 0 on all tiers, you can still [query the data from deep storage](../querying/query-from-deep-storage.md) Review Comment: ```suggestion - `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. If you set the replicants for a period to 0 on all tiers, you can still [query the data from deep storage](../querying/query-from-deep-storage.md). ``` ########## docs/operations/rule-configuration.md: ########## @@ -167,7 +167,7 @@ Set the following properties: - the segment interval starts any time after the rule interval starts. 
You can use this property to load segments with future start and end dates, where "future" is relative to the time when the Coordinator evaluates data against the rule. Defaults to `true`. -- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. +- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. If you set the replicants for a period to 0 on all tiers, you can still [query the data from deep storage](../querying/query-from-deep-storage.md) Review Comment: What does this mean? >If you set the replicants for a period to 0 on all tiers, ########## docs/querying/query-from-deep-storage.md: ########## @@ -0,0 +1,187 @@ +--- +id: query-deep-storage +title: "Query from deep storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +> Query from deep storage is an experimental feature. + +## Segments in deep storage + +Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. To take advantage of the space savings that querying from deep storage provides though, you need to make sure not all your segments get loaded onto Historical processes. 
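As a hypothetical illustration of the zero-replica case raised in the rule-configuration comment above (the tier name and period are placeholder values, and the exact zero-replica syntax may vary by Druid version), a rule chain like the following keeps only the last month of segments on Historicals while older segments remain only in deep storage:

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "_default_tier": 2 }
  },
  {
    "type": "loadForever",
    "tieredReplicants": {}
  }
]
```

Data covered only by the second rule is not loaded onto any Historical tier, but it can still be reached through the `sql/statements` endpoint.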
+ +To do this, configure [load rules](../operations/rule-configuration.md#load-rules) to load only the segments you do want on Historical processes. Review Comment: > only the segments you do want What criteria determine this? The segments corresponding to data you want to query with low latency? ########## docs/operations/durable-storage.md: @@ -0,0 +1,66 @@ +--- +id: durable-storage +title: "Durable storage for the multi-stage query engine" +sidebar_label: "Durable storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +You can use durable storage to improve querying from deep storage and SQL-based ingestion. + +> Note that only S3 is supported as a durable storage location. + +Durable storage for queries from deep storage provides a location to which you can write the results of deep storage queries. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability. 
+ +## Enable durable storage + +To enable durable storage, you need to set the following common service properties: + +``` +druid.msq.intermediate.storage.enable=true +druid.msq.intermediate.storage.type=s3 +druid.msq.intermediate.storage.bucket=YOUR_BUCKET +druid.msq.intermediate.storage.prefix=YOUR_PREFIX +druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir +``` + +For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations). + + +## Use durable storage for SQL-based ingestion queries + +When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`. + +For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. + +## Use durable storage for queries from deep storage + +When you run a query, include the context parameter `selectDestination` and set it to `DURABLE_STORAGE`. This context parameter configures queries from deep storage to write their results to durable storage. + +## Durable storage clean up + +To prevent durable storage from filling up with temporary files when tasks fail to clean them up, a periodic +cleaner can be scheduled to remove the directories for which there is no controller task running. The cleaner uses +the storage connector to operate on the durable storage. The durable storage location should only be used to store the output +of the cluster's MSQ tasks. If the location contains other files or directories, they will get cleaned up as well. + +Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced +by the super sorter. 
Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes) Review Comment: ```suggestion by the super sorter. Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes). ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. Review Comment: comments here also apply to query-from-deep-storage.md ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. 
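For illustration, the request body for this endpoint can be assembled as in the sketch below. The helper name and the `ASYNC` default are assumptions layered on the context parameters discussed in this review (`executionMode`, `selectDestination`); only the `query` field itself is required.

```python
import json

def statements_request(query, execution_mode="ASYNC", select_destination=None):
    """Build the JSON body for POST /druid/v2/sql/statements.

    execution_mode and select_destination map to the context parameters
    discussed in this review; ASYNC is currently the only supported
    execution mode.
    """
    context = {"executionMode": execution_mode}
    if select_destination is not None:
        # "DURABLE_STORAGE" requires durable storage for MSQ to be enabled.
        context["selectDestination"] = select_destination
    return json.dumps({"query": query, "context": context})

# Example payload matching the docs' sample query:
body = statements_request("SELECT COUNT(*) FROM data_source WHERE foo = 'bar'")
```

A client would POST this body to `https://ROUTER:8888/druid/v2/sql/statements` with a `Content-Type: application/json` header.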
+ +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. Review Comment: ```suggestion - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. 
+ +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported results format is JSON. Review Comment: ```suggestion - The only supported value for `resultFormat` is JSON. ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. 
+ +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported results format is JSON. +- Only the user who submits a query can see the results. + + +### Get query status + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID} +``` + +Returns the same response as the post API if the query is accepted or running. 
The response for a completed query includes the same information as an in-progress query with several additions: + +- A `result` object that summarizes information about your results, such as the total number of rows and a sample record +- A `pages` object that includes the following information for each page of results: + - `numRows`: the number of rows in that page of results + - `sizeInBytes`: the size of the page + - `id`: the page number that you can use to reference a specific page when you get query results + + +### Get query results + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER +``` + +Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`). + +When getting query results, keep the following in mind: + +- JSON is the only supported result format Review Comment: ```suggestion - JSON is the only supported result format. ``` ########## docs/design/architecture.md: ########## @@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk. -Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between -Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on -local disk and serve queries from that cache as well as from an in-memory cache. -This means that Druid never needs to access deep storage -during a query, helping it offer the best query latencies possible. 
It also means that you must have enough disk space -both in deep storage and across your Historical servers for the data you plan to load. +Druid uses deep storage for the following purposes: + +- As a backup of your data, including those that get loaded onto Historical processes. +- As a way to transfer data in the background between +Druid processes. Druid stores data in files called _segments_. +- As the source data for queries that run against segments stored only in deep storage and not in Historical processes as determined by your load rules. + +Historical processes cache data segments on +local disk and serve queries from that cache as well as from an in-memory cache. Segments on disk for Historical processes provide the low latency querying performance Druid is known for. You can query directly from deep storage though, which allows you to query segments that exist only in deep storage. This trades some performance to provide you with the ability to query more of your data without necessarily having to scale your Historical processes. + +When determining sizing for your storage, keep the following in mind: + +- Deep storage needs to be able to hold all the data that you ingest into Druid Review Comment: ```suggestion - Deep storage needs to be able to hold all the data that you ingest into Druid. ``` ########## docs/design/deep-storage.md: ########## @@ -25,7 +25,13 @@ title: "Deep storage" Deep storage is where segments are stored. It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented. Review Comment: ```suggestion Deep storage is where segments are stored. 
It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data. As long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented. ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. 
For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported results format is JSON. +- Only the user who submits a query can see the results. + + +### Get query status + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID} +``` + +Returns the same response as the post API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions: Review Comment: ```suggestion Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions: ``` ########## docs/design/deep-storage.md: ########## @@ -55,22 +61,28 @@ druid.storage.storageDirectory=/tmp/druid/localStorage The `druid.storage.storageDirectory` must be set to a different path than `druid.segmentCache.locations` or `druid.segmentCache.infoDir`. -## Amazon S3 or S3-compatible +### Amazon S3 or S3-compatible See [`druid-s3-extensions`](../development/extensions-core/s3.md). -## Google Cloud Storage +### Google Cloud Storage See [`druid-google-extensions`](../development/extensions-core/google.md). -## Azure Blob Storage +### Azure Blob Storage See [`druid-azure-extensions`](../development/extensions-core/azure.md). 
-## HDFS +### HDFS See [druid-hdfs-storage extension documentation](../development/extensions-core/hdfs.md). -## Additional options +### Additional options For additional deep storage options, please see our [extensions list](../configuration/extensions.md). + +## Querying from deep storage + +Although not as performant as querying segments stored on disk for Historicals processes, you can query from deep storage to access segments that you may not need frequently or with the extreme low latency Druid queries traditionally provide. You trade some performance for a total lower storage cost because you can access more of your data without the need to increase the number or capacity of your Historical processes. Review Comment: ```suggestion Although not as performant as querying segments stored on disk for Historical processes, you can query from deep storage to access segments that you may not need frequently or with the extreme low latency Druid queries traditionally provide. You trade some performance for a total lower storage cost because you can access more of your data without the need to increase the number or capacity of your Historical processes. ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. 
+ +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. The currently supported mode is `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported results format is JSON. +- Only the user who submits a query can see the results. + + +### Get query status + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID} +``` + +Returns the same response as the post API if the query is accepted or running. 
The response for a completed query includes the same information as an in-progress query with several additions: + +- A `result` object that summarizes information about your results, such as the total number of rows and a sample record +- A `pages` object that includes the following information for each page of results: + - `numRows`: the number of rows in that page of results + - `sizeInBytes`: the size of the page + - `id`: the page number that you can use to reference a specific page when you get query results + + +### Get query results + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER +``` + +Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`). + +When getting query results, keep the following in mind: + +- JSON is the only supported result format +- If you attempt to get the results for an in-progress query, Druid returns an error. + +### Cancel a query + +``` +DELETE https://ROUTER:8888/druid/v2/sql/statements/{queryID} Review Comment: Do the DELETE and GET requests only work for queries that were POSTed using `/sql/statements`? ########## docs/design/architecture.md: ########## @@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk. -Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between -Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on -local disk and serve queries from that cache as well as from an in-memory cache. 
-This means that Druid never needs to access deep storage -during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space -both in deep storage and across your Historical servers for the data you plan to load. +Druid uses deep storage for the following purposes: + +- As a backup of your data, including those that get loaded onto Historical processes. +- As a way to transfer data in the background between +Druid processes. Druid stores data in files called _segments_. +- As the source data for queries that run against segments stored only in deep storage and not in Historical processes as determined by your load rules. + +Historical processes cache data segments on +local disk and serve queries from that cache as well as from an in-memory cache. Segments on disk for Historical processes provide the low latency querying performance Druid is known for. You can query directly from deep storage though, which allows you to query segments that exist only in deep storage. This trades some performance to provide you with the ability to query more of your data without necessarily having to scale your Historical processes. Review Comment: ```suggestion Historical processes cache data segments on local disk and serve queries from that cache as well as from an in-memory cache. Segments on disk for Historical processes provide the low latency querying performance Druid is known for. You can also query directly from deep storage. When you query segments that exist only in deep storage, you trade some performance in exchange for the ability to query more of your data without necessarily having to scale your Historical processes. ``` ########## docs/design/architecture.md: ########## @@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system storage accessible by every Druid server. 
In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk. -Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between -Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on -local disk and serve queries from that cache as well as from an in-memory cache. -This means that Druid never needs to access deep storage -during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space -both in deep storage and across your Historical servers for the data you plan to load. +Druid uses deep storage for the following purposes: + +- As a backup of your data, including those that get loaded onto Historical processes. +- As a way to transfer data in the background between +Druid processes. Druid stores data in files called _segments_. +- As the source data for queries that run against segments stored only in deep storage and not in Historical processes as determined by your load rules. + +Historical processes cache data segments on +local disk and serve queries from that cache as well as from an in-memory cache. Segments on disk for Historical processes provide the low latency querying performance Druid is known for. You can query directly from deep storage though, which allows you to query segments that exist only in deep storage. This trades some performance to provide you with the ability to query more of your data without necessarily having to scale your Historical processes. 
+ +When determining sizing for your storage, keep the following in mind: + +- Deep storage needs to be able to hold all the data that you ingest into Druid +- On disk storage for Historical processes need to be able to accommodate the data you want to load onto them to run queries on data you access frequently and need low latency for Review Comment: Missing rest of sentence? ########## docs/design/deep-storage.md: ########## @@ -25,7 +25,13 @@ title: "Deep storage" Deep storage is where segments are stored. It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented. -## Local +In addition to being the backing store for segments, you can use [query from deep storage](#querying-from-deep-storage) and run queries against segments stored primarily in deep storage. Whether segments exist primarily in deep storage or in deep storage and on Historical processes, is determined by the [load rules](../operations/rule-configuration.md#load-rules) you configure. Review Comment: ```suggestion In addition to being the backing store for segments, you can use [query from deep storage](#querying-from-deep-storage) and run queries against segments stored primarily in deep storage. The [load rules](../operations/rule-configuration.md#load-rules) you configure determine whether segments exist primarily in deep storage or in a combination of deep storage and Historical processes. 
``` ########## docs/operations/durable-storage.md: ########## @@ -0,0 +1,66 @@ +--- +id: durable-storage +title: "Durable storage for the multi-stage query engine" +sidebar_label: "Durable storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +You can use durable storage to improve querying from deep storage and SQL-based ingestion. + +> Note that only S3 is supported as a durable storage location. + +Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability. + +## Enable durable storage + +To enable durable storage, you need to set the following common service properties: + +``` +druid.msq.intermediate.storage.enable=true +druid.msq.intermediate.storage.type=s3 +druid.msq.intermediate.storage.bucket=YOUR_BUCKET +druid.msq.intermediate.storage.prefix=YOUR_PREFIX +druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir +``` + +For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations). 
+ + +## Use durable storage for SQL-based ingestion queries + +When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`. + +For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. + +## Use durable storage for queries from deep storage + +When you run a query, include the context parameter `selectDestination` and set it to `DURABLE_STORAGE`. This context parameter configures queries from deep storage to write their results to durable storage. + +## Durable storage clean up + +To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic +cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes Review Comment: nit: passive voice. Also, how does one go about scheduling the periodic cleaner? ########## docs/querying/query-from-deep-storage.md: ########## @@ -0,0 +1,187 @@ +--- +id: query-deep-storage +title: "Query from deep storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +> Query from deep storage is an experimental feature. 
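Tying together the context parameters mentioned in the hunks above (`executionMode` and `selectDestination` for `sql/statements`, `durableShuffleStorage` for SQL-based ingestion), a request payload would carry them in a `context` object. A sketch of a deep-storage query that writes its results to durable storage (payload shape assumed from the quoted examples):

```json
{
  "query": "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'",
  "context": {
    "executionMode": "ASYNC",
    "selectDestination": "DURABLE_STORAGE"
  }
}
```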
+ +## Segments in deep storage + +Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. To take advantage of the space savings that querying from deep storage provides though, you need to make sure not all your segments get loaded onto Historical processes. Review Comment: ```suggestion Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the space savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes. ``` ########## docs/design/deep-storage.md: ########## @@ -55,22 +61,28 @@ druid.storage.storageDirectory=/tmp/druid/localStorage The `druid.storage.storageDirectory` must be set to a different path than `druid.segmentCache.locations` or `druid.segmentCache.infoDir`. -## Amazon S3 or S3-compatible +### Amazon S3 or S3-compatible See [`druid-s3-extensions`](../development/extensions-core/s3.md). -## Google Cloud Storage +### Google Cloud Storage See [`druid-google-extensions`](../development/extensions-core/google.md). -## Azure Blob Storage +### Azure Blob Storage See [`druid-azure-extensions`](../development/extensions-core/azure.md). -## HDFS +### HDFS See [druid-hdfs-storage extension documentation](../development/extensions-core/hdfs.md). -## Additional options +### Additional options For additional deep storage options, please see our [extensions list](../configuration/extensions.md). + +## Querying from deep storage + +Although not as performant as querying segments stored on disk for Historicals processes, you can query from deep storage to access segments that you may not need frequently or with the extreme low latency Druid queries traditionally provide. 
You trade some performance for a total lower storage cost because you can access more of your data without the need to increase the number or capacity of your Historical processes. + +For information about how to run queries, see [Query from deep storage](../querying/query-from-deep-storage.md) Review Comment: ```suggestion For information about how to run queries, see [Query from deep storage](../querying/query-from-deep-storage.md). ``` ########## docs/operations/durable-storage.md: ########## @@ -0,0 +1,66 @@ +--- +id: durable-storage +title: "Durable storage for the multi-stage query engine" +sidebar_label: "Durable storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +You can use durable storage to improve querying from deep storage and SQL-based ingestion. + +> Note that only S3 is supported as a durable storage location. + +Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability. 
+ +## Enable durable storage + +To enable durable storage, you need to set the following common service properties: + +``` +druid.msq.intermediate.storage.enable=true +druid.msq.intermediate.storage.type=s3 +druid.msq.intermediate.storage.bucket=YOUR_BUCKET +druid.msq.intermediate.storage.prefix=YOUR_PREFIX +druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir +``` + +For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations). + + +## Use durable storage for SQL-based ingestion queries + +When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`. + +For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. + +## Use durable storage for queries from deep storage + +When you run a query, include the context parameter `selectDestination` and set it to `DURABLE_STORAGE`. This context parameter configures queries from deep storage to write their results to durable storage. + +## Durable storage clean up + +To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic +cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes +the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output +for cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well. + +Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced +by the super sorter. 
Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes) +If the configured limit is too low, `NotEnoughTemporaryStorageFault` may be thrown. Review Comment: ```suggestion If the configured limit is too low, Druid may throw the error, `NotEnoughTemporaryStorageFault`. ``` ########## docs/querying/query-from-deep-storage.md: ########## @@ -0,0 +1,187 @@ +--- +id: query-deep-storage +title: "Query from deep storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +> Query from deep storage is an experimental feature. + +## Segments in deep storage + +Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. To take advantage of the space savings that querying from deep storage provides though, you need to make sure not all your segments get loaded onto Historical processes. + +To do this, configure [load rules](../operations/rule-configuration.md#load-rules) to load only the segments you do want on Historical processes. 
+ +For example, use the `loadByInterval` load rule and set `tieredReplicants.YOUR_TIER` (such as `tieredReplicants._default_tier`) to 0 for a specific interval. If the default tier is the only tier in your cluster, this results in that interval only being available from deep storage. + +For example, the following interval load rule assigns 0 replicants for the specified interval to the tier `_default_tier`: + +``` + { + "interval": "2017-01-19T00:00:00.000Z/2017-09-20T00:00:00.000Z", + "tieredReplicants": { + "_default_tier": 0 + }, + "useDefaultTierForNull": true, + "type": "loadByInterval" + } +``` + +This means that any segments within that interval don't get loaded onto `_default_tier` . Then, create a corresponding drop rule so that Druid drops the segments from Historical tiers if they were previously loaded. Review Comment: Include an example of the corresponding drop rule? ########## docs/querying/query-from-deep-storage.md: ########## @@ -0,0 +1,187 @@ +--- +id: query-deep-storage +title: "Query from deep storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +> Query from deep storage is an experimental feature. 
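For the drop rule requested in the review comment above, a possible sketch uses Druid's `dropByInterval` rule type with the same interval as the load rule (the interval value is assumed from the example, not taken from the PR):

```json
{
  "type": "dropByInterval",
  "interval": "2017-01-19T00:00:00.000Z/2017-09-20T00:00:00.000Z"
}
```

Placed after the load rule in the retention rule chain, this drops any previously loaded segments in that interval from Historical tiers.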
+ +## Segments in deep storage + +Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. To take advantage of the space savings that querying from deep storage provides though, you need to make sure not all your segments get loaded onto Historical processes. + +To do this, configure [load rules](../operations/rule-configuration.md#load-rules) to load only the segments you do want on Historical processes. + +For example, use the `loadByInterval` load rule and set `tieredReplicants.YOUR_TIER` (such as `tieredReplicants._default_tier`) to 0 for a specific interval. If the default tier is the only tier in your cluster, this results in that interval only being available from deep storage. + +For example, the following interval load rule assigns 0 replicants for the specified interval to the tier `_default_tier`: + +``` + { + "interval": "2017-01-19T00:00:00.000Z/2017-09-20T00:00:00.000Z", + "tieredReplicants": { + "_default_tier": 0 + }, + "useDefaultTierForNull": true, + "type": "loadByInterval" + } +``` + +This means that any segments within that interval don't get loaded onto `_default_tier` . Then, create a corresponding drop rule so that Druid drops the segments from Historical tiers if they were previously loaded. + +You can verify that a segment is not loaded on any Historical tiers by querying the Druid metadata table: + +```sql +SELECT "segment_id", "replication_factor" FROM sys."segments" WHERE "replication_factor" = 0 AND "datasource" = YOUR_DATASOURCE +``` + +Segments with a `replication_factor` of `0` are not assigned to any Historical tiers. Queries you run against these segments are run directly against the segment in deep storage. Review Comment: ```suggestion Segments with a `replication_factor` of `0` are not assigned to any Historical tiers. Queries against these segments are run directly against the segment in deep storage. 
``` ########## docs/querying/query-from-deep-storage.md: ########## @@ -0,0 +1,187 @@ +--- +id: query-deep-storage +title: "Query from deep storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +> Query from deep storage is an experimental feature. + +## Segments in deep storage + +Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. To take advantage of the space savings that querying from deep storage provides though, you need to make sure not all your segments get loaded onto Historical processes. + +To do this, configure [load rules](../operations/rule-configuration.md#load-rules) to load only the segments you do want on Historical processes. + +For example, use the `loadByInterval` load rule and set `tieredReplicants.YOUR_TIER` (such as `tieredReplicants._default_tier`) to 0 for a specific interval. If the default tier is the only tier in your cluster, this results in that interval only being available from deep storage. 
+ +For example, the following interval load rule assigns 0 replicants for the specified interval to the tier `_default_tier`: + +``` + { + "interval": "2017-01-19T00:00:00.000Z/2017-09-20T00:00:00.000Z", + "tieredReplicants": { + "_default_tier": 0 + }, + "useDefaultTierForNull": true, + "type": "loadByInterval" + } +``` + +This means that any segments within that interval don't get loaded onto `_default_tier` . Then, create a corresponding drop rule so that Druid drops the segments from Historical tiers if they were previously loaded. Review Comment: ```suggestion This means that any segments within that interval don't get loaded onto `_default_tier`. Then, create a corresponding drop rule so that Druid drops the segments from Historical tiers if they were previously loaded. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
