techdocsmith commented on code in PR #13486: URL: https://github.com/apache/druid/pull/13486#discussion_r1038651506
########## docs/development/extensions-core/datasketches-tuple.md: ########## @@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"] |name|A String for the output (result) name of the calculation.|yes| |fieldName|A String for the name of the input field.|yes| |nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details. |no, defaults to 16384| -|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1| -|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array| +|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, if not provided `filedName` is assumed to be an arrayOfDoublesSketch| +|numberOfValues|Number of values associated with each distinct key. |no, defaults to the length of `metricColumns` if provided and 1 otherwise| + +The `arrayOfDoublesSketch` aggregator has two modes of useage: + +- built from raw data - `metricColumns` is set to an array +- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and `fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion time) with `numberOfValues` doubles. + +#### Example on top of raw data + +Compute a theta of unique users, for each user store the `added` and `deleted` scoers Review Comment: ```suggestion Compute a theta of unique users. For each user store the `added` and `deleted` scores in a column called `users_theta`. ``` typo? 
########## docs/development/extensions-core/datasketches-tuple.md: ########## @@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"] |name|A String for the output (result) name of the calculation.|yes| |fieldName|A String for the name of the input field.|yes| |nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details. |no, defaults to 16384| -|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1| -|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array| +|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, if not provided `filedName` is assumed to be an arrayOfDoublesSketch| +|numberOfValues|Number of values associated with each distinct key. |no, defaults to the length of `metricColumns` if provided and 1 otherwise| + +The `arrayOfDoublesSketch` aggregator has two modes of useage: + +- built from raw data - `metricColumns` is set to an array +- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and `fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion time) with `numberOfValues` doubles. 
+ +#### Example on top of raw data + +Compute a theta of unique users, for each user store the `added` and `deleted` scoers + +```json +{ + "type": "arrayOfDoublesSketch", + "name": "users_theta", + "fieldName": "user", + "nominalEntries": 16384, + "metricColumns": ["added", "deleted"], +} +``` + +### Example on top of precomputed sketchs + +Ingest a sketch column called `user_sketches` that has two doubles in its array. Review Comment: ```suggestion Ingest a sketch column called `user_sketches` that has a base64-encoded value of two doubles in its array and store it in a column called `users_theta`. ``` ########## docs/development/extensions-core/datasketches-tuple.md: ########## @@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"] |name|A String for the output (result) name of the calculation.|yes| |fieldName|A String for the name of the input field.|yes| |nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details. |no, defaults to 16384| -|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1| -|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array| +|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, if not provided `filedName` is assumed to be an arrayOfDoublesSketch| Review Comment: ```suggestion |metricColumns|When building sketches from raw data, an array of names of the input columns that contain numeric values to associate with each distinct key. 
If not provided, assumes `fieldName` is an `arrayOfDoublesSketch`.|no, if not provided `fieldName` is assumed to be an `arrayOfDoublesSketch`| ``` ########## docs/development/extensions-core/datasketches-tuple.md: ########## @@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"] |name|A String for the output (result) name of the calculation.|yes| |fieldName|A String for the name of the input field.|yes| |nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details. |no, defaults to 16384| -|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1| -|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array| +|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, if not provided `filedName` is assumed to be an arrayOfDoublesSketch| +|numberOfValues|Number of values associated with each distinct key. |no, defaults to the length of `metricColumns` if provided and 1 otherwise| + +The `arrayOfDoublesSketch` aggregator has two modes of useage: + +- built from raw data - `metricColumns` is set to an array +- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and `fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion time) with `numberOfValues` doubles. Review Comment: ```suggestion You can use the `arrayOfDoublesSketch` aggregator to: - Build sketches from raw data. In this case, set `metricColumns` to an array. - Build a sketch from an existing `ArrayOfDoubles` sketch. 
In this case, leave `metricColumns` unset and set `fieldName` to an `ArrayOfDoubles` sketch with `numberOfValues` doubles. You must base64 encode `ArrayOfDoubles` sketches at ingestion time. ``` ########## docs/development/extensions-core/datasketches-tuple.md: ########## @@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"] |name|A String for the output (result) name of the calculation.|yes| Review Comment: ```suggestion |name|String representing the output column to store sketch values.|yes| ``` This is the destination column for the sketch, no?
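
A minimal sketch of the precomputed-sketch mode discussed above, assembled only from names that appear in this review (`user_sketches` as the input column, `users_theta` as the output name). Setting `numberOfValues` to 2 is an assumption based on the "two doubles in its array" wording, and `metricColumns` is left out so that `fieldName` is treated as an existing sketch:

```json
{
  "type": "arrayOfDoublesSketch",
  "name": "users_theta",
  "fieldName": "user_sketches",
  "nominalEntries": 16384,
  "numberOfValues": 2
}
```

Unlike the raw-data example in the diff, this omits the trailing comma after the last field so that it parses as strict JSON.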
