gianm commented on a change in pull request #9436:
URL: https://github.com/apache/druid/pull/9436#discussion_r417740120



##########
File path: 
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -29,45 +29,68 @@
 import org.apache.druid.query.aggregation.PostAggregator;
 import org.apache.druid.query.cache.CacheKeyBuilder;
 
+import javax.annotation.Nullable;
 import java.util.Arrays;
 import java.util.Comparator;
 import java.util.Map;
 import java.util.Set;
 
 public class DoublesSketchToHistogramPostAggregator implements PostAggregator
 {
+  static final int DEFAULT_NUM_BINS = 10;
 
   private final String name;
   private final PostAggregator field;
   private final double[] splitPoints;
+  private final Integer numBins;
 
   @JsonCreator
   public DoublesSketchToHistogramPostAggregator(
       @JsonProperty("name") final String name,
       @JsonProperty("field") final PostAggregator field,
-      @JsonProperty("splitPoints") final double[] splitPoints)
+      @JsonProperty("splitPoints") @Nullable final double[] splitPoints,
+      @JsonProperty("numBins") @Nullable final Integer numBins)
   {
     this.name = Preconditions.checkNotNull(name, "name is null");
     this.field = Preconditions.checkNotNull(field, "field is null");
-    this.splitPoints = Preconditions.checkNotNull(splitPoints, "array of split 
points is null");
+    this.splitPoints = splitPoints;
+    this.numBins = numBins;
   }
 
   @Override
   public Object compute(final Map<String, Object> combinedAggregators)
   {
     final DoublesSketch sketch = (DoublesSketch) 
field.compute(combinedAggregators);
+    final int numBins = splitPoints != null ? splitPoints.length + 1 :

Review comment:
       Since `this.numBins` will be ignored if `splitPoints` is set, it'd be 
best to throw an error in the constructor if both are provided.  Could you 
please add one (an error message like `"Cannot accept both 'numBins' and 
'splitPoints'"`).

##########
File path: 
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -125,14 +155,26 @@ public boolean equals(final Object o)
     if (!Arrays.equals(splitPoints, that.splitPoints)) {
       return false;
     }
-    return field.equals(that.field);
+    if (!field.equals(that.field)) {
+      return false;
+    }
+    if (numBins == null && that.numBins == null) {

Review comment:
       Please add an EqualsVerifier test too. We don't have them everywhere yet 
but we're trying to add them in patches that adjust equals or hashCode 
implementations. Check out InFilterTest for an example. It looks like this.
   
   ```
     @Test
     public void testEquals()
     {
       EqualsVerifier.forClass(InFilter.class)
                     .usingGetClass()
                     .withNonnullFields("dimension", "values")
                     .withIgnoredFields("longPredicateSupplier", 
"floatPredicateSupplier", "doublePredicateSupplier")
                     .verify();
     }
   ```

##########
File path: 
extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
##########
@@ -87,6 +110,12 @@ public PostAggregator getField()
     return splitPoints;
   }
 
+  @JsonProperty
+  public Integer getNumBins()

Review comment:
       Suggest doing `@JsonInclude(JsonInclude.Include.NON_NULL)` so the 
serialized form is cleaner when numBins is null.
   
   (It would make sense to add this to getSplitPoints too.)

##########
File path: docs/development/extensions-core/datasketches-quantiles.md
##########
@@ -87,14 +87,15 @@ This returns an array of quantiles corresponding to a given 
array of fractions
 
 #### Histogram
 
-This returns an approximation to the histogram given an array of split points 
that define the histogram bins. An array of <i>m</i> unique, monotonically 
increasing split points divide the real number line into <i>m+1</i> consecutive 
disjoint intervals. The definition of an interval is inclusive of the left 
split point and exclusive of the right split point.
+This returns an approximation to the histogram given an array of split points 
that define the histogram bins or a number of bins. An array of <i>m</i> 
unique, monotonically increasing split points divide the real number line into 
<i>m+1</i> consecutive disjoint intervals. The definition of an interval is 
inclusive of the left split point and exclusive of the right split point. If 
the number of bins is specified instead of split points, the interval between 
the minimum and maximum values is divided into the given number of 
equally-spaced bins.

Review comment:
       It's suggested, but not explicitly called out, that you aren't allowed 
to specify both `splitPoints` and `numBins`. It'd be better to call it out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to