kasakrisz commented on code in PR #6496:
URL: https://github.com/apache/hive/pull/6496#discussion_r3332966797
##########
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##########
@@ -239,22 +239,33 @@ public static long getNumRows(HiveConf conf, List<ColumnInfo> schema, Table tabl
return aggregateStat.getNumRows();
}
- private static void estimateStatsForMissingCols(List<String> neededColumns, List<ColStatistics> columnStats,
- HiveConf conf, long nr, List<ColumnInfo> schema) {
+ /**
+ * Estimates column statistics for columns specified in {@code neededColumnNames}
+ * that do not already have statistics in the {@code existingColStats} list.
+ *
+ * @return A {@link List} of {@link ColStatistics} objects containing
+ * both the provided existing statistics and the newly estimated ones.
+ */
+ static List<ColStatistics> estimateStatsForMissingCols(
+ List<String> neededColumnNames, List<ColStatistics> existingColStats, HiveConf conf, long nr,
+ List<ColumnInfo> schema) {
- Set<String> neededCols = new HashSet<>(neededColumns);
- Set<String> colsWithStats = new HashSet<>();
+ Set<String> neededCols = new HashSet<>(neededColumnNames);
+ Set<String> columnNamesWithStats = new HashSet<>(existingColStats.size());
Review Comment:
I looked into this: `HashSet.newHashSet(int numElements)` (available since
Java 19) derives the initial capacity from the expected number of elements and
the default load factor, so the set can hold that many elements without
rehashing.
`new HashSet<>(size)`, on the other hand, uses the provided `size` as the
set's initial capacity (rounded up internally to a power of two). Once the
number of elements exceeds 75% of that capacity (the default load factor), a
rehash is triggered. So passing `existingColStats.size()` as the initial
capacity does not reliably avoid a rehash in our case.
Thanks for pointing this out.
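
For reference, the sizing arithmetic can be sketched roughly as follows. This
is only an illustration, assuming the default load factor of 0.75 and
`HashMap`'s power-of-two table sizing; the class and helper names
(`HashSetSizing`, `tableSizeFor`, `resizeThreshold`, `capacityFor`,
`presized`) are hypothetical and not part of this PR or of Hive. The capacity
is computed by hand so the sketch also compiles on JDKs older than 19, where
`HashSet.newHashSet` is not available.

```java
import java.util.HashSet;
import java.util.Set;

final class HashSetSizing {
  // Default load factor used by HashSet/HashMap.
  static final float LOAD_FACTOR = 0.75f;

  // Smallest power of two >= cap, mirroring how HashMap sizes its table.
  static int tableSizeFor(int cap) {
    int n = 1;
    while (n < cap) {
      n <<= 1;
    }
    return n;
  }

  // Number of elements a set created with new HashSet<>(initialCapacity)
  // can hold before the next insert triggers a resize.
  static int resizeThreshold(int initialCapacity) {
    return (int) (tableSizeFor(initialCapacity) * LOAD_FACTOR);
  }

  // Initial capacity needed to hold numElements without resizing.
  static int capacityFor(int numElements) {
    return (int) Math.ceil(numElements / (double) LOAD_FACTOR);
  }

  // Hand-rolled pre-sizing for JDKs without HashSet.newHashSet (assumption:
  // this matches the intent of that factory method, not its exact source).
  static <T> Set<T> presized(int numElements) {
    return new HashSet<>(capacityFor(numElements));
  }

  public static void main(String[] args) {
    int expected = 13;
    System.out.println("new HashSet<>(" + expected + ") resizes after "
        + resizeThreshold(expected) + " elements");
    System.out.println("pre-sized capacity " + capacityFor(expected)
        + " resizes after " + resizeThreshold(capacityFor(expected)) + " elements");
  }
}
```

With 13 expected elements, `new HashSet<>(13)` ends up with a 16-slot table
that resizes on the 13th insert, whereas the computed capacity of 18 yields a
32-slot table that holds up to 24 elements before resizing.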