asolimando commented on code in PR #3997:
URL: https://github.com/apache/hive/pull/3997#discussion_r1092311907


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java:
##########
@@ -43,64 +46,57 @@ public void merge(ColumnStatisticsObj aggregateColStats, 
ColumnStatisticsObj new
     DateColumnStatsDataInspector aggregateData = 
dateInspectorFromStats(aggregateColStats);
     DateColumnStatsDataInspector newData = dateInspectorFromStats(newColStats);
 
-    setLowValue(aggregateData, newData);
-    setHighValue(aggregateData, newData);
-
-    aggregateData.setNumNulls(aggregateData.getNumNulls() + 
newData.getNumNulls());
-    if (aggregateData.getNdvEstimator() == null || newData.getNdvEstimator() 
== null) {
-      aggregateData.setNumDVs(Math.max(aggregateData.getNumDVs(), 
newData.getNumDVs()));
-    } else {
-      NumDistinctValueEstimator oldEst = aggregateData.getNdvEstimator();
-      NumDistinctValueEstimator newEst = newData.getNdvEstimator();
-      final long ndv;
-      if (oldEst.canMerge(newEst)) {
-        oldEst.mergeEstimators(newEst);
-        ndv = oldEst.estimateNumDistinctValues();
-        aggregateData.setNdvEstimator(oldEst);
-      } else {
-        ndv = Math.max(aggregateData.getNumDVs(), newData.getNumDVs());
-      }
-      LOG.debug("Use bitvector to merge column {}'s ndvs of {} and {} to be 
{}", aggregateColStats.getColName(),
-          aggregateData.getNumDVs(), newData.getNumDVs(), ndv);
-      aggregateData.setNumDVs(ndv);
+    Date lowValue = mergeLowValue(getLowValue(aggregateData), 
getLowValue(newData));
+    if (lowValue != null) {
+      aggregateData.setLowValue(lowValue);
+    }
+    Date highValue = mergeHighValue(getHighValue(aggregateData), 
getHighValue(newData));
+    if (highValue != null) {

Review Comment:
   Thanks @akshat0395 for your comment, you are right, but unfortunately the 
problem lies in the `*ColumnStatsDataInspector` objects, which inherits from 
the corresponding `*ColumnStatsData`, which don't have a common ancestor.
   
   For this reason it's hard to write a generic method, we would need to rely 
on generic for the   data type (say `Double`) and anyway a big switch for 
handling the *ColumnStatsDataInspector objects and casting, it would rapidly 
defeat the purpose.
   
   This lack of class hierarchy for column statistics comes directly from the 
thrift interface objects for the column statistics in the metastore (see 
[here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift)).
   
   HIVE-26313 could be a chance to overcome that, and define a better hierarchy 
for those classes, since we won't anyway be having a 1-1 mapping from a column 
in the metastore for column statistics and a Java class, but we will have to 
parse a (probably Json) blob and do the "mapping" ourselves.
   
   In summary, I'd rather leave this for when we will have a better class 
hierarchy for those objects (hopefully with HIVE-26313). WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to