thomasrebele commented on code in PR #603:
URL: https://github.com/apache/hive/pull/603#discussion_r2630431180


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java:
##########
@@ -1524,6 +1531,77 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
       return null;
     }
 
+    /**
+     * If possible, sets the min / max value for the column based on the aggregate function
+     * being calculated and its input.
+     */
+    private static void computeAggregateColumnMinMax(ColStatistics cs, HiveConf conf, AggregationDesc agg, String aggType,
+        Statistics parentStats) throws SemanticException {
+      if (agg.getParameters() != null && agg.getParameters().size() == 1) {
+        ColStatistics parentCS = StatsUtils.getColStatisticsFromExpression(
+            conf, parentStats, agg.getParameters().get(0));
+        if (parentCS != null && parentCS.getRange() != null &&
+            parentCS.getRange().minValue != null && parentCS.getRange().maxValue != null) {
+          long valuesCount = agg.getDistinct() ?
+              parentCS.getCountDistint() :
+              parentStats.getNumRows() - parentCS.getNumNulls();
+          Range range = parentCS.getRange();
+          // Get the aggregate function matching the name in the query.
+          GenericUDAFResolver udaf =
+              FunctionRegistry.getGenericUDAFResolver(agg.getGenericUDAFName());
+          if (udaf instanceof GenericUDAFCount) {
+            cs.setRange(new Range(0, valuesCount));
+          } else if (udaf instanceof GenericUDAFMax || udaf instanceof GenericUDAFMin) {
+            cs.setRange(new Range(range.minValue, range.maxValue));
+          } else if (udaf instanceof GenericUDAFSum) {
+            switch (aggType) {
+            case serdeConstants.TINYINT_TYPE_NAME:
+            case serdeConstants.SMALLINT_TYPE_NAME:
+            case serdeConstants.DATE_TYPE_NAME:
+            case serdeConstants.INT_TYPE_NAME:
+            case serdeConstants.BIGINT_TYPE_NAME:
+              long maxValueLong = range.maxValue.longValue();
+              long minValueLong = range.minValue.longValue();
+              // If min value is less or equal to max value (legal)
+              if (minValueLong <= maxValueLong && minValueLong >= 0) {
+                // min = minValue, max = (minValue + maxValue) * 0.5 * parentNumRows
+                cs.setRange(new Range(
+                    minValueLong,
+                    StatsUtils.safeMult(
+                        StatsUtils.safeMult(StatsUtils.safeAdd(minValueLong, maxValueLong), 0.5),
+                        valuesCount)));
+              }
+              break;
+            case serdeConstants.FLOAT_TYPE_NAME:
+            case serdeConstants.DOUBLE_TYPE_NAME:
+              double maxValueDouble = range.maxValue.doubleValue();
+              double minValueDouble = range.minValue.doubleValue();
+              // If min value is less or equal to max value (legal)
+              if (minValueDouble <= maxValueDouble && minValueDouble >= 0) {
+                // min = minValue, max = (minValue + maxValue) * 0.5 * parentNumRows

Review Comment:
   What's the logic behind this formula? Shouldn't the upper bound be
   `max * parentNumRows`? Take, for example, the values [1, 1000, 1001, 1002,
   1003]. Their sum is 4007, but the formula evaluates to
   (1 + 1003) * 0.5 * 5 = 2510, which is smaller than 4007.
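
To make the arithmetic concrete, here is a minimal standalone sketch in plain Java. It is not Hive code: `SumBoundCheck`, `midpointEstimate`, and `upperBound` are hypothetical names, and the overflow handling of `StatsUtils.safeMult` / `StatsUtils.safeAdd` is not reproduced. It assumes non-negative inputs, matching the `minValue >= 0` guard in the diff.

```java
import java.util.stream.LongStream;

public class SumBoundCheck {
    // Formula from the quoted diff: (min + max) * 0.5 * count.
    // This is the sum only if the values are spread evenly between min and max.
    public static long midpointEstimate(long min, long max, long count) {
        return (long) ((min + max) * 0.5 * count);
    }

    // Conservative bound for non-negative inputs: no value exceeds max,
    // so the sum of count values cannot exceed max * count.
    public static long upperBound(long max, long count) {
        return Math.multiplyExact(max, count);
    }

    public static void main(String[] args) {
        long[] values = {1, 1000, 1001, 1002, 1003};
        long min = LongStream.of(values).min().getAsLong();
        long max = LongStream.of(values).max().getAsLong();
        long sum = LongStream.of(values).sum();

        System.out.println("actual sum        = " + sum);                                        // 4007
        System.out.println("midpoint estimate = " + midpointEstimate(min, max, values.length));  // 2510
        System.out.println("max * count       = " + upperBound(max, values.length));             // 5015
    }
}
```

For this skewed input the midpoint formula lands below the actual sum, while `max * count` stays above it.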
   
   I've checked the [most recent version of the file](https://github.com/apache/hive/blob/ee7138bf7a1a1ee1de07fa2243fe947a627268cb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L1809), and it is still the same.
   
   Wdyt, @jcamachor?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

