asolimando commented on code in PR #3997:
URL: https://github.com/apache/hive/pull/3997#discussion_r1092311907
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java:
##########
@@ -43,64 +46,57 @@ public void merge(ColumnStatisticsObj aggregateColStats,
ColumnStatisticsObj new
DateColumnStatsDataInspector aggregateData =
dateInspectorFromStats(aggregateColStats);
DateColumnStatsDataInspector newData = dateInspectorFromStats(newColStats);
- setLowValue(aggregateData, newData);
- setHighValue(aggregateData, newData);
-
- aggregateData.setNumNulls(aggregateData.getNumNulls() +
newData.getNumNulls());
- if (aggregateData.getNdvEstimator() == null || newData.getNdvEstimator()
== null) {
- aggregateData.setNumDVs(Math.max(aggregateData.getNumDVs(),
newData.getNumDVs()));
- } else {
- NumDistinctValueEstimator oldEst = aggregateData.getNdvEstimator();
- NumDistinctValueEstimator newEst = newData.getNdvEstimator();
- final long ndv;
- if (oldEst.canMerge(newEst)) {
- oldEst.mergeEstimators(newEst);
- ndv = oldEst.estimateNumDistinctValues();
- aggregateData.setNdvEstimator(oldEst);
- } else {
- ndv = Math.max(aggregateData.getNumDVs(), newData.getNumDVs());
- }
- LOG.debug("Use bitvector to merge column {}'s ndvs of {} and {} to be
{}", aggregateColStats.getColName(),
- aggregateData.getNumDVs(), newData.getNumDVs(), ndv);
- aggregateData.setNumDVs(ndv);
+ Date lowValue = mergeLowValue(getLowValue(aggregateData),
getLowValue(newData));
+ if (lowValue != null) {
+ aggregateData.setLowValue(lowValue);
+ }
+ Date highValue = mergeHighValue(getHighValue(aggregateData),
getHighValue(newData));
+ if (highValue != null) {
Review Comment:
Thanks @akshat0395 for your comment, you are right, but unfortunately the
problem lies in the `*ColumnStatsDataInspector` objects, which inherits from
the corresponding `*ColumnStatsData`, which don't have a common ancestor.
For this reason it's hard to write a generic method, we would need to rely
on generic for the data type (say `Double`) and anyway a big switch for
handling the *ColumnStatsDataInspector objects and casting, it would rapidly
defeat the purpose.
This lack of class hierarchy for column statistics comes directly from the
thrift interface objects for the column statistics in the metastore (see
[here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift)).
HIVE-26313 could be a chance to overcome that, and define a better hierarchy
for those classes, since we won't anyway be having a 1-1 mapping from a column
in the metastore for column statistics and a Java class, but we will have to
parse a (probably Json) blob and do the "mapping" ourselves.
In summary, I'd rather leave this for when we will have a better class
hierarchy for those objects (hopefully with HIVE-26313). WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]