leerho commented on code in PR #35:
URL: https://github.com/apache/datasketches-bigquery/pull/35#discussion_r1757695751

##########
kll/sqlx/kll_sketch_float_get_cdf.sqlx:
##########
@@ -24,20 +24,36 @@ RETURNS ARRAY<FLOAT64>
 LANGUAGE js
 OPTIONS (
   library=["gs://$GCS_BUCKET/kll_sketch.js"],
-  description = '''Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF,
-of the input stream given a set of split points.
-Param sketch: the given sketch in serialized form.
+  description = '''Returns an approximation to the Cumulative Distribution Function (CDF)
+of the input stream as an array of cumulative probabilities defined by the given split_points.
+
+Param sketch: the given sketch as BYTES.
+
 Param split_points: an array of M unique, monotonically increasing values
-that divide the input domain into M+1 consecutive disjoint intervals.
-Param inclusive: if true the rank of a value includes its own weight,
-and therefore if the sketch contains values equal to a slit point,
-then in CDF such values are included into the interval to the left of split point.
-Otherwise they are included into the interval to the right of split point.
-Returns: an array of M+1 values which are a consecutive approximation to the CDF
-of the input stream given the split_points. The value at array position j of the returned
-CDF array is the sum of the returned values in positions 0 through j of the PMF array.
-This can be viewed as array of ranks of the given split points plus one more value that is always 1.
-For more details: https://datasketches.apache.org/docs/KLL/KLLSketch.html'''
+  (of the same type as the input values to the sketch)
+  that divide the input value domain into M+1 overlapping intervals.
+
+  The start of each interval is below the lowest input value retained by the sketch
+  (corresponding to a zero rank or zero probability).
+
+  The end of each interval is the associated split-point except for the top interval
+  where the end is the maximum input value of the stream.
+
+Param inclusive: if true and the upper boundary of an interval equals a value retained by the sketch, the interval will include that value.
+  If the lower boundary of an interval equals a value retained by the sketch, the interval will exclude that value.
+
+  If false and the upper boundary of an interval equals a value retained by the sketch, the interval will exclude that value.
+  If the lower boundary of an interval equals a value retained by the sketch, the interval will include that value.
+
+Returns: the CDF as a monotonically increasing FLOAT64 array of M+1 cumulative probablities on the interval (0.0, 1.0].

Review Comment:
   Yes, will fix.
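
For reference, the contract described in the new docstring (M split points in, M+1 cumulative probabilities out, with `inclusive` deciding which side a value equal to a split point falls on) can be illustrated with an exact computation over raw values. This is only a sketch under stated assumptions: `exactCdf` is a hypothetical helper, not part of datasketches-bigquery, and it does not use the KLL sketch, which returns approximations rather than exact ranks.

```javascript
// Hypothetical helper for illustration only; not part of the library.
// Computes an exact CDF over raw values instead of an approximate one from a sketch.
function exactCdf(values, splitPoints, inclusive) {
  if (values.length === 0) {
    throw new Error("values must not be empty");
  }
  const n = values.length;
  const cdf = splitPoints.map((sp) => {
    // inclusive: values equal to the split point count toward its rank;
    // exclusive: only values strictly below the split point count.
    const count = values.filter((v) => (inclusive ? v <= sp : v < sp)).length;
    return count / n;
  });
  // The top interval ends at the maximum input value, so its cumulative probability is 1.
  cdf.push(1.0);
  return cdf;
}

const values = [1, 2, 2, 3, 4];
console.log(exactCdf(values, [2, 3], true));  // [ 0.6, 0.8, 1 ]
console.log(exactCdf(values, [2, 3], false)); // [ 0.2, 0.6, 1 ]
```

With two split points the result has three entries, and the two stored values equal to 2 move from the first interval (inclusive) to the second (exclusive).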
