cryptoe commented on code in PR #14609: URL: https://github.com/apache/druid/pull/14609#discussion_r1275697116
########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. 
Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported value for `resultFormat` is JSON. +- Only the user who submits a query can see the results. Review Comment: The response when execution mode is async is this POJO: https://github.com/apache/druid/blob/master/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/entity/SqlStatementResult.java We might want to mention that in the response payload. ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences.
For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported value for `resultFormat` is JSON. +- Only the user who submits a query can see the results. + + +### Get query status + +``` +GET https://ROUTER:8888/druid/v2/sql/statements/{queryID} +``` + +Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions: Review Comment: The get query status response and the `postReq` response are the same when executionMode=async. ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules.
+
+Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. Review Comment: ```suggestion Note that at least one segment of a datasource must be available on a Historical process so that the Broker can plan your query. A quick way to check this is to confirm that the datasource is visible in the Druid console. ``` ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). 
+
+Keep the following in mind when submitting queries to the `sql/statements` endpoint:
+
+- There are additional context parameters for `sql/statements`:
+
+  - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`.
+  - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md).
+
+- The only supported value for `resultFormat` is JSON.
+- Only the user who submits a query can see the results.
+
+
+### Get query status
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}
+```
+
+Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions:
+
+- A `result` object that summarizes information about your results, such as the total number of rows and a sample record
+- A `pages` object that includes the following information for each page of results:
+  - `numRows`: the number of rows in that page of results
+  - `sizeInBytes`: the size of the page
+  - `id`: the page number that you can use to reference a specific page when you get query results
+
+
+### Get query results
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER
+```
+
+Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`). Review Comment: Should we link the get query status API here? ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect.
- The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported value for `resultFormat` is JSON. 
+- Only the user who submits a query can see the results.
+
+
+### Get query status
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}
+```
+
+Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions:
+
+- A `result` object that summarizes information about your results, such as the total number of rows and a sample record
+- A `pages` object that includes the following information for each page of results:
+  - `numRows`: the number of rows in that page of results
+  - `sizeInBytes`: the size of the page
+  - `id`: the page number that you can use to reference a specific page when you get query results
+
+
+### Get query results
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER
+```
+
+Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`).
+
+When getting query results, keep the following in mind:
+
+- JSON is the only supported result format.
+- If you attempt to get the results for an in-progress query, Druid returns an error.
+
+### Cancel a query
+
+```
+DELETE https://ROUTER:8888/druid/v2/sql/statements/{queryID}
+```
+
+Cancels a running or accepted query.
+
+Druid returns an HTTP 202 response for successful cancellation requests. If the query is already complete or can't be found, Druid returns an HTTP 500 error with an error message describing the issue. Review Comment: If the query is already completed, we return a 200. If the query cannot be found, we return a 404.
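The status codes described in the review comment above (202 for an accepted cancellation, 200 when the query is already complete, 404 when it cannot be found) can be sketched as a small client-side handler. This is an illustrative sketch only; the function name and message strings are hypothetical and not part of any Druid client library.

```python
# Interpret responses from DELETE /druid/v2/sql/statements/{queryID},
# per the status codes stated in the review: 202 = cancellation accepted,
# 200 = query already complete, 404 = query not found.
# Names and messages here are illustrative assumptions.

def interpret_cancel_response(status_code: int) -> str:
    """Map an HTTP status code from the cancel endpoint to an outcome."""
    outcomes = {
        202: "cancellation accepted",
        200: "query already complete; nothing to cancel",
        404: "query not found",
    }
    return outcomes.get(status_code, f"unexpected status: {status_code}")

print(interpret_cancel_response(202))  # cancellation accepted
```

A caller would issue the DELETE request with any HTTP client and pass the resulting status code through this mapping before deciding whether to retry or surface an error.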
########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed. -Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. 
Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md).
+
+- The only supported value for `resultFormat` is JSON.
+- Only the user who submits a query can see the results.
+
+
+### Get query status
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}
+```
+
+Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions:
+
+- A `result` object that summarizes information about your results, such as the total number of rows and a sample record
+- A `pages` object that includes the following information for each page of results:
+  - `numRows`: the number of rows in that page of results
+  - `sizeInBytes`: the size of the page
+  - `id`: the page number that you can use to reference a specific page when you get query results
+
+
+### Get query results
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER
+```
+
+Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`).
+
+When getting query results, keep the following in mind:
+
+- JSON is the only supported result format.
+- If you attempt to get the results for an in-progress query, Druid returns an error.
+ Review Comment: If you attempt to get the results of a failed query, Druid returns a 404. If you attempt to get the results of an ingestion/replace query, Druid returns an empty response. ########## docs/api-reference/sql-api.md: ########## @@ -186,4 +186,79 @@ Druid returns an HTTP 404 response in the following cases: - `sqlQueryId` is incorrect. - The query completes before your cancellation request is processed.
-Druid returns an HTTP 403 response for authorization failure. \ No newline at end of file +Druid returns an HTTP 403 response for authorization failure. + +## Query from deep storage + +> The `/sql/statements` endpoint used to query from deep storage is currently experimental. + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query. + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +### Submit a query + +To run a query from deep storage, send your query to the Router using the POST method: + +``` +POST https://ROUTER:8888/druid/v2/sql/statements +``` + +Submitting a query from deep storage uses the same syntax as any other Druid SQL query where the "query" field in the JSON object within the request payload contains your query. For example: + +```json +{"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar'"} +``` + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- There are additional context parameters for `sql/statements`: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. + - `selectDestination` set to `DURABLE_STORAGE` instructs Druid to write the results from SELECT queries to durable storage. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md). + +- The only supported value for `resultFormat` is JSON. +- Only the user who submits a query can see the results. 
+
+
+### Get query status
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}
+```
+
+Returns information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running. The response for a completed query includes the same information as an in-progress query with several additions:
+
+- A `result` object that summarizes information about your results, such as the total number of rows and a sample record
+- A `pages` object that includes the following information for each page of results:
+  - `numRows`: the number of rows in that page of results
+  - `sizeInBytes`: the size of the page
+  - `id`: the page number that you can use to reference a specific page when you get query results
+
+
+### Get query results
+
+```
+GET https://ROUTER:8888/druid/v2/sql/statements/{queryID}/results?page=PAGENUMBER
+```
+
+Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. When you retrieve the status of a completed query, Druid returns information about the composition of each page and its page number (`id`).
+ Review Comment: If no page number is passed, all data for that query is returned sequentially, in page order, in the same response. Note that if you have large result sets, your request can time out due to `druid.router.http.readTimeout`. ########## docs/design/architecture.md: ########## @@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk. -Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between -Druid processes. Druid stores data in files called _segments_. 
Historical processes cache data segments on -local disk and serve queries from that cache as well as from an in-memory cache. -This means that Druid never needs to access deep storage -during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space -both in deep storage and across your Historical servers for the data you plan to load. +Druid uses deep storage for the following purposes: + +- As a backup of your data, including those that get loaded onto Historical processes. +- As a way to transfer data in the background between Druid processes. Druid stores data in files called _segments_. +- As the source data for queries that run against segments stored only in deep storage and not in Historical processes as determined by your load rules. Review Comment: this line is a little confusing. ########## docs/operations/durable-storage.md: ########## @@ -0,0 +1,66 @@ +--- +id: durable-storage +title: "Durable storage for the multi-stage query engine" +sidebar_label: "Durable storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +You can use durable storage to improve querying from deep storage and SQL-based ingestion. 
+ +> Note that only S3 is supported as a durable storage location. + +Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability. + +## Enable durable storage + +To enable durable storage, you need to set the following common service properties: + +``` +druid.msq.intermediate.storage.enable=true +druid.msq.intermediate.storage.type=s3 +druid.msq.intermediate.storage.bucket=YOUR_BUCKET +druid.msq.intermediate.storage.prefix=YOUR_PREFIX +druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir +``` + +For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations). + + +## Use durable storage for SQL-based ingestion queries + +When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`. + +For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. + +## Use durable storage for queries from deep storage + +When you run a query, include the context parameter `selectDestination` and set it to `DURABLE_STORAGE`. This context parameter configures queries from deep storage to write their results to durable storage. + +## Durable storage clean up + +To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic +cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes +the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output +for cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well. 
+ Review Comment: If we select the destination as `durableStorage` for query results, the results are cleaned up when the task is removed from the metadata store. ########## docs/operations/rule-configuration.md: ########## @@ -167,7 +167,7 @@ Set the following properties: - the segment interval starts any time after the rule interval starts. You can use this property to load segments with future start and end dates, where "future" is relative to the time when the Coordinator evaluates data against the rule. Defaults to `true`. -- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. +- `tieredReplicants`: a map of tier names to the number of segment replicas for that tier. If you set the replicants for a period to 0 on all tiers, you can still [query the data from deep storage](../querying/query-from-deep-storage.md). Review Comment: Another way to query the data from deep storage is to set `tieredReplicants` empty and set `useDefaultTierForNull` to false. I think we should push users toward this approach in the docs. cc @adarshsanjeev ########## docs/operations/durable-storage.md: ########## @@ -0,0 +1,66 @@ +--- +id: durable-storage +title: "Durable storage for the multi-stage query engine" +sidebar_label: "Durable storage" +--- + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. 
See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +You can use durable storage to improve querying from deep storage and SQL-based ingestion. + +> Note that only S3 is supported as a durable storage location. + +Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability. + +## Enable durable storage + +To enable durable storage, you need to set the following common service properties: + +``` +druid.msq.intermediate.storage.enable=true +druid.msq.intermediate.storage.type=s3 +druid.msq.intermediate.storage.bucket=YOUR_BUCKET +druid.msq.intermediate.storage.prefix=YOUR_PREFIX +druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir +``` + +For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations). + + +## Use durable storage for SQL-based ingestion queries + +When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`. + +For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`. + +## Use durable storage for queries from deep storage Review Comment: I think we also need to mention this: https://github.com/apache/druid/pull/14629/files#diff-bb668e1497f66d4430a7e3650bbdc18accaddc4bbcbf111c38298870fd9e7c06R380 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
