ramitg254 commented on code in PR #6089:
URL: https://github.com/apache/hive/pull/6089#discussion_r2583687667


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2254,6 +2292,69 @@ private List<ColumnStatisticsObj> aggrStatsUseDB(String catName, String dbName,
     }
   }
 
+  private ColumnStatisticsObj columnStatisticsObjWithAdjustedNDV(List<Object[]> list, int i,

Review Comment:
   Done: renamed `list` to `columnBatchesOutput` and removed it, as it was not
   needed.
   To step into this method you also need to pass
   `-Dhive.stats.fetch.bitvector=false`, since the method is associated with
   `aggrStatsUseDB`.
   To compare results, first run the command without
   `-Dhive.metastore.direct.sql.batch.size=1000` and store the results, then run
   it again with `-Dhive.metastore.direct.sql.batch.size=1000` and compare the
   two sets of results.
   
   This is needed because `aggrStatsUseDB` sometimes produces results that
   differ from the expected ones: with the bit vector and the KLL sketch
   disabled, exact NDV estimation is not possible in some cases, so
   `aggrStatsUseDB` produces approximate results. These changes keep the batched
   `aggrStatsUseDB` results consistent with the unbatched `aggrStatsUseDB`
   results.
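
To illustrate why batching can change an approximate NDV, here is a minimal, self-contained sketch. It is not the code in this PR and not Hive's actual formula: the class `NdvBatchSketch`, its methods, and the midpoint estimator are all hypothetical. The only fact it relies on is that, without bit vectors, per-partition NDVs give just a lower bound (their maximum) and an upper bound (their sum) on the combined NDV. The sketch shows that collapsing each batch to a single estimate before merging can drift from the unbatched value, while carrying the bounds through each batch does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NdvBatchSketch {

    // Returns {lower, upper}: max and sum of the per-partition NDVs.
    static long[] bounds(List<Long> ndvs) {
        long lower = 0;
        long upper = 0;
        for (long n : ndvs) {
            lower = Math.max(lower, n);
            upper += n;
        }
        return new long[] {lower, upper};
    }

    // Hypothetical estimator: the midpoint of the two bounds.
    static double midpoint(double lower, double upper) {
        return (lower + upper) / 2.0;
    }

    // Splits the input into consecutive batches of at most batchSize items.
    static List<List<Long>> split(List<Long> ndvs, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < ndvs.size(); i += batchSize) {
            batches.add(ndvs.subList(i, Math.min(i + batchSize, ndvs.size())));
        }
        return batches;
    }

    // Estimate computed over all partitions at once.
    static double unbatched(List<Long> ndvs) {
        long[] b = bounds(ndvs);
        return midpoint(b[0], b[1]);
    }

    // Naive batching: each batch is collapsed to one estimate, and the
    // per-batch estimates are then merged as if they were partition NDVs.
    static double batchedNaive(List<Long> ndvs, int batchSize) {
        double lower = 0;
        double upper = 0;
        for (List<Long> batch : split(ndvs, batchSize)) {
            long[] b = bounds(batch);
            double estimate = midpoint(b[0], b[1]);
            lower = Math.max(lower, estimate);
            upper += estimate;
        }
        return midpoint(lower, upper);
    }

    // Consistent batching: each batch keeps its bounds, which merge exactly
    // (max of lower bounds, sum of upper bounds) before estimating once.
    static double batchedConsistent(List<Long> ndvs, int batchSize) {
        long lower = 0;
        long upper = 0;
        for (List<Long> batch : split(ndvs, batchSize)) {
            long[] b = bounds(batch);
            lower = Math.max(lower, b[0]);
            upper += b[1];
        }
        return midpoint(lower, upper);
    }

    public static void main(String[] args) {
        List<Long> ndvs = Arrays.asList(10L, 20L, 30L, 40L);
        System.out.println(unbatched(ndvs));            // 70.0
        System.out.println(batchedNaive(ndvs, 2));      // 67.5
        System.out.println(batchedConsistent(ndvs, 2)); // 70.0
    }
}
```

With these made-up inputs the naive batched estimate (67.5) differs from the unbatched one (70.0), which is the kind of mismatch the store-and-compare procedure above is meant to catch.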



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

