techdocsmith commented on code in PR #18056: URL: https://github.com/apache/druid/pull/18056#discussion_r2183779964
########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+---
+id: projections
+title: Query projections
+sidebar_label: Projections
+description: .
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements. See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership. The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License. You may obtain a copy of the License at
+  ~
+  ~ http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied. See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection.
+
+## Create a projection
+
+A projection has three components:
+
+- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource.
+- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending.

Review Comment:
```suggestion
- Grouping columns (`spec.projections.groupingColumns`) for the projection. The columns must either already exist in your datasource or be defined in the `virtualColumns` of your ingestion spec. The order in which you define your grouping columns dictates the sort order for data in the projection. Sort order is always ascending.
```

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource.

Review Comment:
```suggestion
- Virtual columns (`spec.projections.virtualColumns`) to compute the projection. The source data for the virtual columns must exist in your datasource.
```
How does the source data relate to the virtual columns?

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`.
+
+The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it.
+
+You can either create a projection at ingestion time or after the datasource is created.
+
+Note that any projection dimension you create becomes part of your datasource. To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query.

Review Comment:
```suggestion
Note that any projection dimension you create becomes part of your datasource. You must reingest your data to remove a projection from your datasource. Alternatively, you can use a query context parameter to avoid using projections for a specific query.
```

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection.

Review Comment:
As a noob, I wonder how they are different from rollup. Why would you want to use this instead of rollup?
```suggestion
Projections are a type of aggregation that Druid computes and stores as part of a segment. The pre-aggregated data reduces the number of rows for the query engine to process. This can speed up queries for query shapes that match a projection.
```
What type of "part" of the segment? Like a column? Or its own thing? What does it mean to "match the projection"? An example might be in order.

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+
+
+
+### At ingestion time
+
+To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections).
+
+To create projections for SQL-based ingestion, you need to also have the [`druid-catalog`](../development/extensions-core/catalog.md) extension loaded.
+
+### After ingestion
+
+You can define a projection after you ingest data. Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension.
+
+The following API call includes a payload with the `properties.projections` block that defines your projections:

Review Comment:
I like this example.

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`.

Review Comment:
```suggestion
- Aggregators (`spec.projections.aggregators`) that define the columns to create projections for and the aggregator to use for that column. The source columns? must either already exist in your datasource or be defined in `virtualColumns`.
```
Is "columns you want to create projections for" the "source data" we were talking about in line 36?

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+
+<details>
+<summary>View the payload</summary>
+
+```json {11,19,39} showLineNumbers
+{
+  "type": "datasource",
+  "columns": [],
+  "properties": {
+    "segmentGranularity": "PT1H",
+    "projections": [
+      {
+        "spec": {
+          "name": "channel_page_hourly_distinct_user_added_deleted",
+          "type": "aggregate",
+          "virtualColumns": [
+            {
+              "type": "expression",
+              "name": "__gran",
+              "expression": "timestamp_floor(__time, 'PT1H')",
+              "outputType": "LONG"
+            }
+          ],
+          "groupingColumns": [
+            {
+              "type": "long",
+              "name": "__gran",
+              "multiValueHandling": "SORTED_ARRAY",
+              "createBitmapIndex": false
+            },
+            {
+              "type": "string",
+              "name": "channel",
+              "multiValueHandling": "SORTED_ARRAY",
+              "createBitmapIndex": true
+            },
+            {
+              "type": "string",
+              "name": "page",
+              "multiValueHandling": "SORTED_ARRAY",
+              "createBitmapIndex": true
+            }
+          ],
+          "aggregators": [
+            {
+              "type": "HLLSketchBuild",
+              "name": "distinct_users",
+              "fieldName": "user",
+              "lgK": 12,
+              "tgtHllType": "HLL_4"
+            },
+            {
+              "type": "longSum",
+              "name": "sum_added",
+              "fieldName": "added"
+            },
+            {
+              "type": "longSum",
+              "name": "sum_deleted",
+              "fieldName": "deleted"
+            }
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+
+</details>
+
+In this example, Druid aggregates data into `distinct_user`, `sum_added`, and `sum_deleted` dimensions based on the aggregator that's specified
and a source dimension. These aggregations are grouped by the columns you define in `groupingColumns`.
+
+## Use a projection
+
+Druid automatically uses a projection if your query matches a projection you've defined. There are some query context parameters that give you some control on how projections are used and Druid's behavior:
+
+- `useProjection`: The name of a projection you defined. The query engine must use that projection and will fail the query if the projection does not match the query.
+- `forceProjections` `true` or `false`. The query engine must use a projection and will fail the query if there isn't a matching projection.
+- `noProjections`: `true` or `false`. The query engine won't use any projections.

Review Comment:
```suggestion
- `noProjections`: Set to `true` to prevent the query engine from using projections altogether. Defaults to `false`.
```
Seriously, this should be `useProjections`, which defaults to `true` and which you could set to `false` to disable. @clintropolis :/

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+ +### After ingestion + +You can define a projection after you ingest data. Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension. + +The following API call includes a payload with the `properties.projections` block that defines your projections: + +<details> +<summary>View the payload</summary> + +```json {11,19,39} showLineNumbers +{ + "type": "datasource", + "columns": [], + "properties": { + "segmentGranularity": "PT1H", + "projections": [ + { + "spec": { + "name": "channel_page_hourly_distinct_user_added_deleted", + "type": "aggregate", + "virtualColumns": [ + { + "type": "expression", + "name": "__gran", + "expression": "timestamp_floor(__time, 'PT1H')", + "outputType": "LONG" + } + ], + "groupingColumns": [ + { + "type": "long", + "name": "__gran", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "channel", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "page", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "aggregators": [ + { + "type": "HLLSketchBuild", + "name": "distinct_users", + "fieldName": "user", + "lgK": 12, + "tgtHllType": "HLL_4" + }, + { + "type": "longSum", + "name": "sum_added", + "fieldName": "added" + }, + { + "type": "longSum", + "name": "sum_deleted", + "fieldName": "deleted" + } + ] + } + } + ] + } +} +``` + +</details> + +In this example, Druid aggregates data into `distinct_user`, `sum_added`, and `sum_deleted` dimensions based on the aggregator that's specified and a source dimension. These aggregations are grouped by the columns you define in `groupingColumns`. + +## Use a projection + +Druid automatically uses a projection if your query matches a projection you've defined. 
There are some query context parameters that give you some control on how projections are used and Druid's behavior:

Review Comment:
```suggestion
Druid automatically uses a projection if your query matches a projection you've defined. You can use the following query context parameters to override the default behavior for using projections:
```

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+### After ingestion

Review Comment:
Should this be framed as "existing table" vs. "new table"?

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+You can define a projection after you ingest data. Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension.

Review Comment:
Switch this around to mention the recommended way first. Then say that if that way doesn't work, you can optionally use the less-preferred method.

########## docs/querying/projections.md: ##########
@@ -0,0 +1,179 @@
+ --> + +Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection. + +## Create a projection + +A projection has three components: + +- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource. +- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending. +- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`. + +The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it. + +You can either create a projection at ingestion time or after the datasource is created. Review Comment: This should be at line 34. It follows directly from the heading. This also doesn't make sense b/c ingestion time and after the datasource is created are not exclusive. Would it make sense to say a new datasource vs/ existing datasource? You can create a projection: * in the ingestion spec/query in an new datasource * in the catalalog for an existing data source * in the compaction spec in an existing datas source. Note that the catalog is preferred over compaction spec for existing data sources. ########## docs/querying/projections.md: ########## @@ -0,0 +1,179 @@ +--- +id: projections +title: Query projections +sidebar_label: Projections +description: . 
+--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection. + +## Create a projection + +A projection has three components: + +- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource. +- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending. +- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`. + +The aggregators are what Druid attempts to match when you run a query. 
If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it. + +You can either create a projection at ingestion time or after the datasource is created. + +Note that any projection dimension you create becomes part of your datasource. To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query. + + + +### At ingestion time + +To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections). Review Comment: is this just native json ingestion? I don't think of MSQ ingestion having an "ingestion spec" I feel like both of these need examples. ########## docs/querying/projections.md: ########## @@ -0,0 +1,179 @@ +--- +id: projections +title: Query projections +sidebar_label: Projections +description: . +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +Projections are a type of aggregation that is computed and stored as part of a segment. 
The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection. + +## Create a projection + +A projection has three components: + +- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource. +- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending. +- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`. + +The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it. + +You can either create a projection at ingestion time or after the datasource is created. + +Note that any projection dimension you create becomes part of your datasource. To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query. + + + +### At ingestion time + +To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections). + +To create projections for SQL-based ingestion, you need to also have the [`druid-catalog`](../development/extensions-core/catalog.md) extension loaded. + +### After ingestion + +You can define a projection after you ingest data. 
Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension. + +The following API call includes a payload with the `properties.projections` block that defines your projections: + +<details> +<summary>View the payload</summary> + +```json {11,19,39} showLineNumbers +{ + "type": "datasource", + "columns": [], + "properties": { + "segmentGranularity": "PT1H", + "projections": [ + { + "spec": { + "name": "channel_page_hourly_distinct_user_added_deleted", + "type": "aggregate", + "virtualColumns": [ + { + "type": "expression", + "name": "__gran", + "expression": "timestamp_floor(__time, 'PT1H')", + "outputType": "LONG" + } + ], + "groupingColumns": [ + { + "type": "long", + "name": "__gran", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "channel", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "page", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "aggregators": [ + { + "type": "HLLSketchBuild", + "name": "distinct_users", + "fieldName": "user", + "lgK": 12, + "tgtHllType": "HLL_4" + }, + { + "type": "longSum", + "name": "sum_added", + "fieldName": "added" + }, + { + "type": "longSum", + "name": "sum_deleted", + "fieldName": "deleted" + } + ] + } + } + ] + } +} +``` + +</details> + +In this example, Druid aggregates data into `distinct_user`, `sum_added`, and `sum_deleted` dimensions based on the aggregator that's specified and a source dimension. These aggregations are grouped by the columns you define in `groupingColumns`. + +## Use a projection + +Druid automatically uses a projection if your query matches a projection you've defined. There are some query context parameters that give you some control on how projections are used and Druid's behavior: + +- `useProjection`: The name of a projection you defined. 
The query engine must use that projection and will fail the query if the projection does not match the query. +- `forceProjections` `true` or `false`. The query engine must use a projection and will fail the query if there isn't a matching projection. +- `noProjections`: `true` or `false`. The query engine won't use any projections. + +## Compaction + +To use compaction on a datasource that includes projections, you need to set the type to catalog: `spec.type: catalog`: Review Comment: ```suggestion To use compaction on a datasource that includes projections, you need to set the spec type to catalog: `spec.type: catalog`: ``` ########## docs/querying/projections.md: ########## @@ -0,0 +1,179 @@ +--- +id: projections +title: Query projections +sidebar_label: Projections +description: . +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection. 
+ +## Create a projection + +A projection has three components: + +- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource. +- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending. +- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`. + +The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it. + +You can either create a projection at ingestion time or after the datasource is created. + +Note that any projection dimension you create becomes part of your datasource. To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query. + + + +### At ingestion time + +To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections). + +To create projections for SQL-based ingestion, you need to also have the [`druid-catalog`](../development/extensions-core/catalog.md) extension loaded. + +### After ingestion + +You can define a projection after you ingest data. Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension. 
+ +The following API call includes a payload with the `properties.projections` block that defines your projections: + +<details> +<summary>View the payload</summary> + +```json {11,19,39} showLineNumbers +{ + "type": "datasource", + "columns": [], + "properties": { + "segmentGranularity": "PT1H", + "projections": [ + { + "spec": { + "name": "channel_page_hourly_distinct_user_added_deleted", + "type": "aggregate", + "virtualColumns": [ + { + "type": "expression", + "name": "__gran", + "expression": "timestamp_floor(__time, 'PT1H')", + "outputType": "LONG" + } + ], + "groupingColumns": [ + { + "type": "long", + "name": "__gran", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "channel", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "page", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "aggregators": [ + { + "type": "HLLSketchBuild", + "name": "distinct_users", + "fieldName": "user", + "lgK": 12, + "tgtHllType": "HLL_4" + }, + { + "type": "longSum", + "name": "sum_added", + "fieldName": "added" + }, + { + "type": "longSum", + "name": "sum_deleted", + "fieldName": "deleted" + } + ] + } + } + ] + } +} +``` + +</details> + +In this example, Druid aggregates data into `distinct_user`, `sum_added`, and `sum_deleted` dimensions based on the aggregator that's specified and a source dimension. These aggregations are grouped by the columns you define in `groupingColumns`. + +## Use a projection + +Druid automatically uses a projection if your query matches a projection you've defined. There are some query context parameters that give you some control on how projections are used and Druid's behavior: + +- `useProjection`: The name of a projection you defined. The query engine must use that projection and will fail the query if the projection does not match the query. +- `forceProjections` `true` or `false`. 
The query engine must use a projection and will fail the query if there isn't a matching projection. Review Comment: ```suggestion - `forceProjections`: Set to `true` to require the query engine to use a projection. Otherwise the query fails when no matching projection exists. Defaults to `false`. ``` ########## docs/ingestion/ingestion-spec.md: ########## @@ -396,6 +396,46 @@ The `filter` conditionally filters input rows during ingestion. Only rows that p ingested. Any of Druid's standard [query filters](../querying/filters.md) can be used. Note that within a `transformSpec`, the `transforms` are applied before the `filter`, so the filter can refer to a transform. +### Projections + +Projections are pre-aggregated segments that can speed up queries by reducing the number of rows that need to be processed. Use the `projectionsSpec` block to define projections for your data during ingestion or [create them afterwards](../querying/projections.md#after-ingestion). Review Comment: ```suggestion Projections are a type of aggregation that Druid computes and stores as part of a segment. The pre-aggregated data reduces the number of rows for the query engine to process. This can speed up queries for query shapes that match a projection. Define projections for a new data source in the `projectionsSpec` block during ingestion. To add projections for an existing data source, see [create them afterwards](../querying/projections.md#after-ingestion). ``` ########## docs/querying/projections.md: ########## @@ -0,0 +1,179 @@ +--- +id: projections +title: Query projections +sidebar_label: Projections +description: . +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. 
The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> + +Projections are a type of aggregation that is computed and stored as part of a segment. The pre-aggregated data can speed up queries by reducing the number of rows that need to be processed for any query shape that matches a projection. + +## Create a projection + +A projection has three components: + +- Virtual columns (`spec.projections.virtualColumns`) that are used to compute a projection. The source data for the virtual columns must exist in your datasource. +- Grouping columns (`spec.projections.groupingColumns`) that are used to group a projection. They must either already exist in your datasource or be defined in `virtualColumns`. The order in which you define your grouping columns equates to the order in which data is sorted in the projection, always ascending. +- Aggregators (`spec.projections.aggregators`) that define the columns you want to create projections for and which aggregator to use for that column. They must either already exist in your datasource or be defined in `virtualColumns`. + +The aggregators are what Druid attempts to match when you run a query. If an aggregator in a query matches an aggregator you defined in your projection, Druid uses it. + +You can either create a projection at ingestion time or after the datasource is created. + +Note that any projection dimension you create becomes part of your datasource. 
To remove a projection from your datasource, you need to reingest the data. Alternatively, you can use a query context parameter to not use projections for a specific query. + + + +### At ingestion time + +To create a projection at ingestion time, use the [`projectionsSpec` block in your ingestion spec](../ingestion/ingestion-spec.md#projections). + +To create projections for SQL-based ingestion, you need to also have the [`druid-catalog`](../development/extensions-core/catalog.md) extension loaded. + +### After ingestion + +You can define a projection after you ingest data. Although you can define the projection in a compaction spec, we recommend using the [`druid-catalog`](../development/extensions-core/catalog.md) extension. + +The following API call includes a payload with the `properties.projections` block that defines your projections: + +<details> +<summary>View the payload</summary> + +```json {11,19,39} showLineNumbers +{ + "type": "datasource", + "columns": [], + "properties": { + "segmentGranularity": "PT1H", + "projections": [ + { + "spec": { + "name": "channel_page_hourly_distinct_user_added_deleted", + "type": "aggregate", + "virtualColumns": [ + { + "type": "expression", + "name": "__gran", + "expression": "timestamp_floor(__time, 'PT1H')", + "outputType": "LONG" + } + ], + "groupingColumns": [ + { + "type": "long", + "name": "__gran", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "channel", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "page", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "aggregators": [ + { + "type": "HLLSketchBuild", + "name": "distinct_users", + "fieldName": "user", + "lgK": 12, + "tgtHllType": "HLL_4" + }, + { + "type": "longSum", + "name": "sum_added", + "fieldName": "added" + }, + { + "type": "longSum", + "name": "sum_deleted", + "fieldName": "deleted" + } + ] + } + } + ] 
+ } +} +``` + +</details> + +In this example, Druid aggregates data into `distinct_user`, `sum_added`, and `sum_deleted` dimensions based on the aggregator that's specified and a source dimension. These aggregations are grouped by the columns you define in `groupingColumns`. + +## Use a projection + +Druid automatically uses a projection if your query matches a projection you've defined. There are some query context parameters that give you some control on how projections are used and Druid's behavior: + +- `useProjection`: The name of a projection you defined. The query engine must use that projection and will fail the query if the projection does not match the query. Review Comment: ```suggestion - `useProjection`: The name of a projection. Set to `true` to require the query engine to use a specific projection. Otherwise the query fails when no matching projection exists. No default. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
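The quoted diff lists three query context parameters (`useProjection`, `forceProjections`, `noProjections`) but never shows a request that uses them. Below is a minimal Python sketch of how a client might attach them to a Druid SQL API request body (`query` plus `context`).

Several details are assumptions, not taken from the PR:
- The endpoint URL, the `wikipedia` datasource, and the SQL query are all hypothetical.
- The conflict check in `build_request` is a client-side choice made for this sketch. It is not documented Druid behavior.
- Whether Druid actually matches this query to the projection is decided by the server, not by this code.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust host and port for your own deployment.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Projection name copied from the catalog payload in the quoted diff.
PROJECTION_NAME = "channel_page_hourly_distinct_user_added_deleted"

# Hypothetical query against a hypothetical "wikipedia" datasource.
# Whether Druid matches it to the projection is decided by Druid, not here.
SQL = (
    "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, channel, page, "
    "SUM(added) AS sum_added, SUM(deleted) AS sum_deleted "
    "FROM wikipedia GROUP BY 1, 2, 3"
)


def build_request(sql, use_projection=None, force_projections=False, no_projections=False):
    """Build a Druid SQL API request body carrying projection context parameters."""
    # Client-side sanity check (an assumption of this sketch, not a Druid rule):
    # disabling projections while also requiring one is contradictory.
    if no_projections and (use_projection or force_projections):
        raise ValueError("noProjections conflicts with useProjection/forceProjections")
    context = {}
    if use_projection:
        context["useProjection"] = use_projection
    if force_projections:
        context["forceProjections"] = True
    if no_projections:
        context["noProjections"] = True
    return {"query": sql, "context": context}


def post_sql(body, url=DRUID_SQL_URL):
    """POST the request body to the SQL endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the context of each request variant instead of sending it.
    for body in (
        build_request(SQL, use_projection=PROJECTION_NAME),
        build_request(SQL, force_projections=True),
        build_request(SQL, no_projections=True),
    ):
        print(json.dumps(body["context"]))
```

Request building is kept separate from sending. This lets you inspect or log the exact context a query carries before calling `post_sql` against a real cluster.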
