[ 
https://issues.apache.org/jira/browse/HIVE-27000?focusedWorklogId=842453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842453
 ]

ASF GitHub Bot logged work on HIVE-27000:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Jan/23 08:14
            Start Date: 31/Jan/23 08:14
    Worklog Time Spent: 10m 
      Work Description: akshat0395 commented on code in PR #3997:
URL: https://github.com/apache/hive/pull/3997#discussion_r1091578039


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java:
##########
@@ -43,64 +46,57 @@ public void merge(ColumnStatisticsObj aggregateColStats, 
ColumnStatisticsObj new
     DateColumnStatsDataInspector aggregateData = 
dateInspectorFromStats(aggregateColStats);
     DateColumnStatsDataInspector newData = dateInspectorFromStats(newColStats);
 
-    setLowValue(aggregateData, newData);
-    setHighValue(aggregateData, newData);
-
-    aggregateData.setNumNulls(aggregateData.getNumNulls() + 
newData.getNumNulls());
-    if (aggregateData.getNdvEstimator() == null || newData.getNdvEstimator() 
== null) {
-      aggregateData.setNumDVs(Math.max(aggregateData.getNumDVs(), 
newData.getNumDVs()));
-    } else {
-      NumDistinctValueEstimator oldEst = aggregateData.getNdvEstimator();
-      NumDistinctValueEstimator newEst = newData.getNdvEstimator();
-      final long ndv;
-      if (oldEst.canMerge(newEst)) {
-        oldEst.mergeEstimators(newEst);
-        ndv = oldEst.estimateNumDistinctValues();
-        aggregateData.setNdvEstimator(oldEst);
-      } else {
-        ndv = Math.max(aggregateData.getNumDVs(), newData.getNumDVs());
-      }
-      LOG.debug("Use bitvector to merge column {}'s ndvs of {} and {} to be 
{}", aggregateColStats.getColName(),
-          aggregateData.getNumDVs(), newData.getNumDVs(), ndv);
-      aggregateData.setNumDVs(ndv);
+    Date lowValue = mergeLowValue(getLowValue(aggregateData), 
getLowValue(newData));
+    if (lowValue != null) {
+      aggregateData.setLowValue(lowValue);
+    }
+    Date highValue = mergeHighValue(getHighValue(aggregateData), 
getHighValue(newData));
+    if (highValue != null) {

Review Comment:
   This code block seems to be duplicate for multiple StatsMerger classes, can 
we move this to a shared class to avoid duplication.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 842453)
    Time Spent: 40m  (was: 0.5h)

> Improve the modularity of the *ColumnStatsMerger classes
> --------------------------------------------------------
>
>                 Key: HIVE-27000
>                 URL: https://issues.apache.org/jira/browse/HIVE-27000
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> *ColumnStatsMerger classes contain a lot of duplicate code which is not 
> specific to the data type, and that could therefore be lifted to a common 
> parent class.
> This phenomenon is bound to become even worse if we keep enriching further 
> our supported set of statistics as we did in the context of HIVE-26221.
> The current ticket aims at improving the modularity and code reuse of the 
> *ColumnStatsMerger classes, while improving unit-test coverage to cover all 
> classes and support more use-cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to