jcamachor commented on a change in pull request #787: HIVE-22239
URL: https://github.com/apache/hive/pull/787#discussion_r334016828
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##########
@@ -944,7 +948,7 @@ else if(colTypeLowerCase.equals(serdeConstants.SMALLINT_TYPE_NAME)){
     } else if (colTypeLowerCase.equals(serdeConstants.DATE_TYPE_NAME)) {
       cs.setAvgColLen(JavaDataModel.get().lengthOfDate());
       // epoch, days since epoch
-      cs.setRange(0, 25201);
+      cs.setRange(DATE_RANGE_LOWER_LIMIT, DATE_RANGE_UPPER_LIMIT);

Review comment:
Yeah, this is a heuristic... No matter what range you pick, you will always get it wrong in some cases. I guess the idea is to target the most common case. The solution to overestimation/underestimation is to compute column stats, as you mentioned; we do not want to let users tune this as well.
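To make the over/underestimation point concrete, here is a rough sketch of how a fixed default range skews a range-predicate estimate when no column stats exist. It uses the old bounds from the removed line (0 and 25201, in days since epoch). The class name, the `uniformSelectivity` helper, and the uniform-distribution assumption are hypothetical and only for illustration; this is not Hive's actual estimator, and the values behind DATE_RANGE_LOWER_LIMIT / DATE_RANGE_UPPER_LIMIT are not shown in this hunk.

```java
import java.time.LocalDate;

public class DateRangeHeuristicSketch {
    // Bounds taken from the removed line in the diff: cs.setRange(0, 25201).
    static final long OLD_LOWER = 0L;
    static final long OLD_UPPER = 25201L;

    /**
     * Hypothetical helper: fraction of the assumed [lower, upper] range covered by
     * the predicate range [from, to], assuming values are uniformly distributed.
     * All arguments are epoch days, inclusive on both ends.
     */
    static double uniformSelectivity(long lower, long upper, long from, long to) {
        long lo = Math.max(lower, from);
        long hi = Math.min(upper, to);
        if (hi < lo) {
            return 0.0; // predicate range lies entirely outside the assumed range
        }
        return (double) (hi - lo + 1) / (double) (upper - lower + 1);
    }

    public static void main(String[] args) {
        // Show which calendar dates the old default bounds correspond to.
        System.out.println("lower = " + LocalDate.ofEpochDay(OLD_LOWER));
        System.out.println("upper = " + LocalDate.ofEpochDay(OLD_UPPER));

        // A one-year filter looks very selective against a ~69-year default range,
        // even if the real table only holds that single year of data.
        long from = LocalDate.of(2019, 1, 1).toEpochDay();
        long to = LocalDate.of(2019, 12, 31).toEpochDay();
        System.out.println("estimated selectivity = "
                + uniformSelectivity(OLD_LOWER, OLD_UPPER, from, to));
    }
}
```

If the table really contains only 2019 data, the true selectivity of that filter is 1.0, so any fixed default range underestimates it; a table spread evenly over a wider span would see the opposite error. Either way, computed column stats replace the guess with the actual min/max.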