gianm commented on code in PR #18252: URL: https://github.com/apache/druid/pull/18252#discussion_r2240528323
########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. Review Comment: This intro should come from a different direction- Dart isn't meant as an alternative to MSQ tasks, it's meant as an alternative to the native engine. 
It's meant for situations where the native engine performs poorly because of insufficient parallelism, such as:

- large joins (which Dart can do with a parallel sort-merge)
- high-cardinality exact group-bys
- high-cardinality exact count distinct

In these situations, Dart can parallelize throughout the entire query, which leads to better performance.

The introduction should also explain how Dart works. Briefly, it's a profile of MSQ that runs `SELECT` queries on Brokers and Historicals, rather than on tasks. Brokers act as controllers and Historicals act as workers.

########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. 
+ +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. Review Comment: This should be stronger than "Ideally"; see the above comment on `druid.msq.dart.controller.concurrentQueries`. 
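To make the concurrency requirement concrete, here is a minimal Python sketch of the check an admin could run by hand. It is not part of Druid: `check_dart_concurrency` and its arguments are hypothetical names, and the numbers are made up for illustration. It compares the sum of `druid.msq.dart.controller.concurrentQueries` across Brokers with the smallest `druid.msq.dart.worker.concurrentQueries` on any Historical. The draft wording is "equal to or larger than", so the non-strict comparison is the default; the `strict` flag is an assumption for anyone who wants a tighter bound.

```python
def check_dart_concurrency(broker_limits, historical_limits, strict=False):
    """Return True if every Historical can host all Dart controllers at once.

    broker_limits: druid.msq.dart.controller.concurrentQueries, one per Broker.
    historical_limits: druid.msq.dart.worker.concurrentQueries, one per Historical.
    strict: if True, require the Broker total to be strictly below the
            smallest Historical limit (an assumption, not confirmed behavior).
    """
    if not broker_limits or not historical_limits:
        raise ValueError("need at least one Broker and one Historical")
    total_controllers = sum(broker_limits)
    smallest_worker_limit = min(historical_limits)
    if strict:
        return total_controllers < smallest_worker_limit
    return total_controllers <= smallest_worker_limit


# Illustrative values only: three Brokers at the default of 1.
print(check_dart_concurrency([1, 1, 1], [4, 4]))  # True: 3 <= 4
print(check_dart_concurrency([1, 1, 1], [4, 2]))  # False: 3 > 2
```

The check uses the minimum across Historicals because a single undersized Historical is enough for queries to get stuck.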
########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. 
+ +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. 
To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + + ... + ... + "context": { + "engine":"msq-dart" + ... + } + }' + ``` + + </TabItem> + </Tabs> + +Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters). + + ## Known issues and limitations + + - If you encounter an issue where Dart can't find a segment, try rerunning your query. + - If your data includes HLL Sketches for realtime data, Dart returns a `NullPointerException`. Review Comment: Does this really happen? If so we should raise a github issue with more details. ########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. 
The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. Review Comment: Important: the total `druid.msq.dart.controller.concurrentQueries` across all Brokers must be less than `druid.msq.dart.worker.concurrentQueries` on any one Historical, or else queries can potentially get stuck waiting for each other. 
The experimental version of Dart does not verify this for you, so it's important for admins to double-check it. ########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. 
+ +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: Review Comment: Let's recommend adding all of these configs to `_common/common.runtime.properties`. Only the Broker and Historical look at them, but it's easier to have them in one place.

########## docs/querying/dart.md: ########## @@ -0,0 +1,97 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. Review Comment: IMO this is a bit too strong, I'd reword as:

> Dart is experimental. For production use cases that require a battle-tested query engine, we recommend the default `native` query engine.

I say this because it's OK to use Dart in production if it's better than native for your use case. You should just be aware that it hasn't received as much testing, and use some caution.

########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. 
You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. Review Comment: suggestion: "number of available threads on workers (`druid.processing.numThreads`)" Mention also that the default is 1, i.e. no multithreading on Historicals. 
########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. 
+ +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. 
To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + + ... + ... + "context": { + "engine":"msq-dart" + ... + } + }' + ``` + + </TabItem> + </Tabs> + +Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters). + + ## Known issues and limitations Review Comment: Some current known issues and limitations that come to mind for me:

- Dart doesn't verify that `druid.msq.dart.controller.concurrentQueries` is set properly, that's up to the admin. If set too high then queries can get stuck on each other.
- Dart does not use the query cache.
- Dart does not implement query prioritization or lanes.
- Dart (like MSQ in general) does not implement `useApproximateTopN`.
- Dart cannot be used with JDBC. The `engine` parameter is ignored.
- https://github.com/apache/druid/pull/18336 can in some cases lead to `NoClassDefFoundError` for `NilStageOutputReader`

########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. 
+--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. 
Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + + ... + ... + "context": { + "engine":"msq-dart" + ... 
+ } + }' + ``` + + </TabItem> + </Tabs> + +Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters). Review Comment: See above comment; we should list them all comprehensively so people don't have to guess. ########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. 
+ +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. 
To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ Review Comment: Same with this example, better if it's valid. ########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. 
+ +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. 
+ +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", Review Comment: It would be better if this example used valid JSON. You can just include `"query"` by itself. ########## docs/querying/dart.md: ########## @@ -0,0 +1,97 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. Review Comment: It's `includeSegmentSource`. It's mentioned in the "Context parameters" section of `docs/multi-stage-query/reference.md`, which should be replicated here with edits that make sense for Dart. 
In particular:

- remove parameters that Dart doesn't use: `maxNumTasks`, `taskAssignment`, `maxParseExceptions`, `durableShuffleStorage`, `faultTolerance`, `selectDestination`, `rowsPerPage`, and anything not labeled `SELECT` (Dart doesn't do `INSERT` or `REPLACE`)
- add Dart-specific parameters `maxConcurrentStages`, `targetPartitionsPerWorker`, `maxNonLeafWorkers`
- update the default for `includeSegmentSource` to `REALTIME`

Here's what the Dart-specific parameters do:

- `maxConcurrentStages` is the number of stages that can run concurrently for a query. Default is 2. Higher numbers can potentially improve pipelining, but also mean less memory is available for each stage.
- `targetPartitionsPerWorker` is the number of partitions we generate for each worker. It controls how much parallelism can be maintained throughout the query. Default is 1.
- `maxNonLeafWorkers` is the number of workers to use for stages beyond the leaf stage. Default is 1, which is scatter-gather style.

########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License.
You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. + +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. 
Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + + ... + ... + "context": { + "engine":"msq-dart" + ... + } + }' + ``` + + </TabItem> + </Tabs> + +Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters). + + ## Known issues and limitations + + - If you encounter an issue where Dart can't find a segment, try rerunning your query. + - If your data includes HLL Sketches for realtime data, Dart returns a `NullPointerException`. + - When a Dart query fails on a Historical with an error about no workers running for a query, it gets stuck retrying the query. 
If the query doesn't get canceled, it can cause other queries to fail. Review Comment: I think this was fixed by #17277, so let's remove it. ########## docs/querying/dart.md: ########## @@ -0,0 +1,116 @@ +--- +id: dart +title: "SQL queries using the Dart query engine" +sidebar_label: "Dart query engine" +description: Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ License); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +:::info[Experimental] + +Dart is experimental. For production use, we recommend using the other available query engines. + +::: + + +Use the Dart query engine for light-weight queries that don't need all the capabilities of the MSQ task engine. For example, use Dart for GROUP BY queries that have intermediate results consisting of hundreds of millions of rows. In this case, the Dart engine's multi-threaded workers perform in-memory shuffles using locally cached data without pulling from deep storage. + +You can query batch or realtime datasources with Dart. 
+ +## Enable Dart + +To enable Dart, add the following line to your `broker/runtime.properties` and `historical/runtime.properties` files: + +``` +druid.msq.dart.enabled = true +``` + +### Configure resource consumption + +You can configure the Broker and the Historical to tune Dart's resource consumption. + +For Brokers, you can set the following configs: + +- `druid.msq.dart.controller.concurrentQueries`: The maximum number of query controllers that can run concurrently on that Broker. Additional controllers are queued. Defaults to 1. +- `druid.msq.dart.query.context.targetPartitionsPerWorker`: The number of partitions per worker to create during a shuffle. Set this to the number of available threads on workers to fully take advantage of multi-threaded processing of shuffled data. + +For Historicals, you can set the following configs: + +- `druid.msq.dart.worker.concurrentQueries`: The maximum number of query workers that can run concurrently on that Historical. Default is equal to the number of merge buffers because each query needs one merge buffer. Ideally, this should be equal to or larger than the sum of the `concurrentQueries` setting on your Brokers. +- `druid.msq.dart.worker.heapFraction`: The maximum amount of heap available for use across all Dart queries as a decimal. The default is 0.35, 35% of heap. + + +## Run a Dart query + +Once enabled, you can use Dart in the Druid console or the SQL query API to issue queries. + +### Druid console + +In the **Query** view, select **Engine: SQL (Dart)** from the engine selector menu. + +### API + +Dart uses the SQL endpoint `/druid/v2/sql`. 
To use Dart, include the query context parameter `engine` and set it to `msq-dart`: + +<Tabs> + <TabItem value="SET" label="SET" default> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SET engine = 'msq-dart';\nSELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + ... + ... +}' + ``` + + </TabItem> + <TabItem value="context_block" label="Context block"> + + ```sql + curl --location 'http://HOST:PORT/druid/v2/sql' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SELECT\n user,\n commentLength,COUNT(*) AS \"COUNT\" FROM wikipedia \nGROUP BY 1, 2 \nORDER BY 2 DESC", + + ... + ... + "context": { + "engine":"msq-dart" + ... + } + }' + ``` + + </TabItem> + </Tabs> + +Dart supports many of the same [query context parameters as the MSQ task engine](../multi-stage-query/reference.md#context-parameters). + + ## Known issues and limitations + + - If you encounter an issue where Dart can't find a segment, try rerunning your query. Review Comment: I think we fixed this one in #18291, so let's remove it.
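Two of the review comments in this thread ask that the API examples use valid JSON, with `"query"` by itself in the first tab. As a rough illustration only (not text proposed by the reviewer), the sketch below builds both request bodies with Python's standard `json` module, so the quoting and newline escapes are generated rather than hand-written. The helper names `set_statement_body` and `context_block_body` are hypothetical and are not part of Druid; the SQL, the `engine` parameter, and the `msq-dart` value are taken from the quoted doc.

```python
import json

# Example query taken from the doc under review, split across lines for readability.
SQL = (
    "SELECT\n"
    "  user,\n"
    "  commentLength,\n"
    '  COUNT(*) AS "COUNT"\n'
    "FROM wikipedia\n"
    "GROUP BY 1, 2\n"
    "ORDER BY 2 DESC"
)


def set_statement_body(sql: str) -> str:
    """Hypothetical helper: body for the SET tab, containing only "query"."""
    return json.dumps({"query": "SET engine = 'msq-dart';\n" + sql}, indent=2)


def context_block_body(sql: str) -> str:
    """Hypothetical helper: body for the context-block tab."""
    return json.dumps({"query": sql, "context": {"engine": "msq-dart"}}, indent=2)


if __name__ == "__main__":
    # Print both bodies so they can be pasted into the docs or saved to a file.
    print(set_statement_body(SQL))
    print(context_block_body(SQL))
```

One practical note, offered as an assumption about how the docs example would be run: because the SET form contains single quotes around `msq-dart`, pasting that body inside a single-quoted `--data '...'` argument would let the shell strip those quotes. Saving the body to a file and passing it with curl's `--data @FILE` form is one way to sidestep that.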
