gianm commented on a change in pull request #9436:
URL: https://github.com/apache/druid/pull/9436#discussion_r417740120
##########
File path:
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -29,45 +29,68 @@
import org.apache.druid.query.aggregation.PostAggregator;
import org.apache.druid.query.cache.CacheKeyBuilder;
+import javax.annotation.Nullable;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.Set;
public class DoublesSketchToHistogramPostAggregator implements PostAggregator
{
+ static final int DEFAULT_NUM_BINS = 10;
private final String name;
private final PostAggregator field;
private final double[] splitPoints;
+ private final Integer numBins;
@JsonCreator
public DoublesSketchToHistogramPostAggregator(
@JsonProperty("name") final String name,
@JsonProperty("field") final PostAggregator field,
- @JsonProperty("splitPoints") final double[] splitPoints)
+ @JsonProperty("splitPoints") @Nullable final double[] splitPoints,
+ @JsonProperty("numBins") @Nullable final Integer numBins)
{
this.name = Preconditions.checkNotNull(name, "name is null");
this.field = Preconditions.checkNotNull(field, "field is null");
- this.splitPoints = Preconditions.checkNotNull(splitPoints, "array of split
points is null");
+ this.splitPoints = splitPoints;
+ this.numBins = numBins;
}
@Override
public Object compute(final Map<String, Object> combinedAggregators)
{
final DoublesSketch sketch = (DoublesSketch)
field.compute(combinedAggregators);
+ final int numBins = splitPoints != null ? splitPoints.length + 1 :
Review comment:
Since `this.numBins` will be ignored if `splitPoints` is set, it'd be
best to throw an error in the constructor if both are provided. Could you
please add one (an error message like `"Cannot accept both 'numBins' and
'splitPoints'"`).
##########
File path:
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -125,14 +155,26 @@ public boolean equals(final Object o)
if (!Arrays.equals(splitPoints, that.splitPoints)) {
return false;
}
- return field.equals(that.field);
+ if (!field.equals(that.field)) {
+ return false;
+ }
+ if (numBins == null && that.numBins == null) {
Review comment:
Please add an EqualsVerifier test too. We don't have them everywhere yet
but we're trying to add them in patches that adjust equals or hashCode
implementations. Check out InFilterTest for an example. It looks like this.
```
@Test
public void testEquals()
{
EqualsVerifier.forClass(InFilter.class)
.usingGetClass()
.withNonnullFields("dimension", "values")
.withIgnoredFields("longPredicateSupplier",
"floatPredicateSupplier", "doublePredicateSupplier")
.verify();
}
```
##########
File path:
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -87,6 +110,12 @@ public PostAggregator getField()
return splitPoints;
}
+ @JsonProperty
+ public Integer getNumBins()
Review comment:
Suggest doing `@JsonInclude(JsonInclude.Include.NON_NULL)` so the
serialized form is cleaner when numBins is null.
(It would make sense to add this to getSplitPoints too.)
##########
File path: docs/development/extensions-core/datasketches-quantiles.md
##########
@@ -87,14 +87,15 @@ This returns an array of quantiles corresponding to a given
array of fractions
#### Histogram
-This returns an approximation to the histogram given an array of split points
that define the histogram bins. An array of <i>m</i> unique, monotonically
increasing split points divide the real number line into <i>m+1</i> consecutive
disjoint intervals. The definition of an interval is inclusive of the left
split point and exclusive of the right split point.
+This returns an approximation to the histogram given an array of split points
that define the histogram bins or a number of bins. An array of <i>m</i>
unique, monotonically increasing split points divide the real number line into
<i>m+1</i> consecutive disjoint intervals. The definition of an interval is
inclusive of the left split point and exclusive of the right split point. If
the number of bins is specified instead of split points, the interval between
the minimum and maximum values is divided into the given number of
equally-spaced bins.
Review comment:
It's suggested, but not explicitly called out, that you aren't allowed
to specify both `splitPoints` and `numBins`. It'd be better to call it out.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]