Kimahriman commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1684338484
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2129,6 +2129,13 @@ object SQLConf {
       .intConf
       .createWithDefault(100)

+  val MIN_VERSIONS_TO_DELETE = buildConf("spark.sql.streaming.minVersionsToDelete")
+    .internal()
+    .doc("The minimum number of stale versions to delete when maintenance is invoked.")
+    .version("2.1.1")
+    .intConf
+    .createWithDefault(30)

Review Comment:
   I agree the default of 30 is in line with the default of 100 batches to retain. I guess the state reader will provide a use case for retaining so many batches; I don't understand what previous use case would have justified storing that many, since you would have to do some manual checkpoint surgery to roll back. I don't have a strong preference either way; I'm just not sure how many other people are in my boat or how unique my use case is (large aggregations and deduping, with batches every few hours). Either way, it would be good to document this in case others are surprised by an average increase in state store size.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
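For context, the two settings the comment compares would sit side by side in a user's configuration. A hedged sketch of a `spark-defaults.conf` fragment follows; the comments reflect my reading of the diff and of the documented `minBatchesToRetain` behavior, not anything confirmed in this thread:

```properties
# Existing conf: keep at least this many committed batches recoverable
# from the checkpoint (Spark default is 100).
spark.sql.streaming.minBatchesToRetain=100

# New internal conf from this diff: defer deletion during maintenance until
# at least this many stale versions have accumulated (default 30). Batching
# deletes this way can raise the average on-disk state store size between
# maintenance passes, which is the surprise the comment asks to document.
spark.sql.streaming.minVersionsToDelete=30
```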