ramitg254 commented on code in PR #6089:
URL: https://github.com/apache/hive/pull/6089#discussion_r2583687667
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2254,6 +2292,69 @@ private List<ColumnStatisticsObj> aggrStatsUseDB(String catName, String dbName,
}
}
+ private ColumnStatisticsObj columnStatisticsObjWithAdjustedNDV(List<Object[]> list, int i,
Review Comment:
Done: renamed `list` to `columnBatchesOutput`, and removed it since it was not
needed.
You also need to pass `-Dhive.stats.fetch.bitvector=false` to step into this
method, because it is part of the `aggrStatsUseDB` code path.
If you want to compare results, first run the command without
`-Dhive.metastore.direct.sql.batch.size=1000` and save the output, then run it
again with `-Dhive.metastore.direct.sql.batch.size=1000` and compare the two
outputs.
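A minimal Java sketch of the comparison step, assuming each run's results were saved to a plain text file. The class name `CompareRunOutputs` and its file arguments are hypothetical and not part of Hive or this PR; a plain `diff` of the two files works just as well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: compares output saved from an unbatched run with a batched run. */
public class CompareRunOutputs {

    /**
     * Returns the 1-based line number of the first difference between the two files,
     * or -1 if they are identical line by line.
     */
    static int firstDifference(Path unbatched, Path batched) throws IOException {
        List<String> a = Files.readAllLines(unbatched);
        List<String> b = Files.readAllLines(batched);
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i + 1;
            }
        }
        // One file is a prefix of the other: the first extra line is the difference.
        return a.size() == b.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: output saved from the run WITHOUT -Dhive.metastore.direct.sql.batch.size=1000
        // args[1]: output saved from the run WITH it
        int line = firstDifference(Path.of(args[0]), Path.of(args[1]));
        System.out.println(line < 0 ? "outputs match" : "first difference at line " + line);
    }
}
```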
This is necessary because `aggrStatsUseDB` sometimes produces results that
differ from the expected ones: with bit vectors and KLL sketches disabled, an
exact NDV estimate is not possible in some cases, so `aggrStatsUseDB` produces
approximate results. These changes keep the batched `aggrStatsUseDB` results
consistent with the unbatched `aggrStatsUseDB` results.
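To illustrate why batching can change an approximate NDV, here is a simplified Java sketch. It assumes the interpolation described for `hive.metastore.stats.ndv.tuner` in HiveConf: the aggregate NDV lies between a lower bound (the max of per-partition NDVs) and a higher bound (the sum of per-partition NDVs), chosen by a tuner in [0.0, 1.0]. The class `NdvBatchingSketch` and its methods are hypothetical; this is not the PR's `columnStatisticsObjWithAdjustedNDV`, only a model of the consistency problem it addresses. Merging the raw per-batch bounds and interpolating once reproduces the unbatched value, while interpolating per batch and re-aggregating does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of NDV interpolation with and without batching. */
public class NdvBatchingSketch {

    /** Assumed bounds: lower = max of per-partition NDVs, upper = sum of them. */
    static final class Bounds {
        final long lower;
        final long upper;
        Bounds(long lower, long upper) { this.lower = lower; this.upper = upper; }
    }

    static Bounds bounds(List<Long> partitionNdvs) {
        long max = 0, sum = 0;
        for (long ndv : partitionNdvs) {
            max = Math.max(max, ndv);
            sum += ndv;
        }
        return new Bounds(max, sum);
    }

    /** Linear interpolation between the bounds, truncated to a whole count. */
    static long interpolate(Bounds b, double ndvTuner) {
        return (long) (b.lower + (b.upper - b.lower) * ndvTuner);
    }

    /** Combine raw bounds across batches first, then interpolate once. */
    static long mergeThenAdjust(List<List<Long>> batches, double ndvTuner) {
        long max = 0, sum = 0;
        for (List<Long> batch : batches) {
            Bounds b = bounds(batch);
            max = Math.max(max, b.lower);
            sum += b.upper;
        }
        return interpolate(new Bounds(max, sum), ndvTuner);
    }

    /** Naive variant: interpolate per batch, then treat the estimates as partition NDVs. */
    static long adjustPerBatch(List<List<Long>> batches, double ndvTuner) {
        List<Long> perBatch = new ArrayList<>();
        for (List<Long> batch : batches) {
            perBatch.add(interpolate(bounds(batch), ndvTuner));
        }
        return interpolate(bounds(perBatch), ndvTuner);
    }

    public static void main(String[] args) {
        List<Long> all = List.of(10L, 20L, 30L, 40L);
        List<List<Long>> batches = List.of(List.of(10L, 20L), List.of(30L, 40L));
        double tuner = 0.5;
        System.out.println("unbatched:         " + interpolate(bounds(all), tuner)); // 70
        System.out.println("merge then adjust: " + mergeThenAdjust(batches, tuner)); // 70
        System.out.println("adjust per batch:  " + adjustPerBatch(batches, tuner));  // 67
    }
}
```

With the tuner at 0.0 both variants collapse to the max and agree, so in this model the discrepancy only appears once the tuner moves the estimate toward the sum.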
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]