Kimahriman commented on code in PR #47393:
URL: https://github.com/apache/spark/pull/47393#discussion_r1684338484


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2129,6 +2129,13 @@ object SQLConf {
     .intConf
     .createWithDefault(100)
 
+  val MIN_VERSIONS_TO_DELETE = buildConf("spark.sql.streaming.minVersionsToDelete")
+    .internal()
+    .doc("The minimum number of stale versions to delete when maintenance is invoked.")
+    .version("2.1.1")
+    .intConf
+    .createWithDefault(30)

Review Comment:
   I agree the default of 30 is in line with the default of 100 batches to retain. I guess the state reader provides some use case for maintaining that many batches; I don't understand what previous use case would have made storing that many worthwhile, since you would have to do some manual checkpoint surgery to try to roll back.
   
   I don't have a strong preference either way, just not sure how many other people would be in my boat or how unique my use case is (large aggregations and deduping with batches every few hours). Either way it would be good to document this, in case others are surprised by an average increase in state store size.
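   
   For anyone who does run into this, a minimal sketch of setting the new knob alongside the existing retention conf (assuming it lands under the name shown in this diff; `spark.sql.streaming.minBatchesToRetain` is the existing conf with default 100):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Hypothetical tuning example; the minVersionsToDelete name is taken
   // from this PR's diff and may change before it merges.
   val spark = SparkSession.builder()
     .appName("state-store-maintenance-tuning")
     // Existing conf: minimum number of batches whose state is retained,
     // so a query can be restarted from an older checkpointed batch.
     .config("spark.sql.streaming.minBatchesToRetain", "100")
     // Conf added in this PR: maintenance only deletes stale versions
     // once at least this many have accumulated.
     .config("spark.sql.streaming.minVersionsToDelete", "30")
     .getOrCreate()
   ```
   
   If the semantics are that maintenance skips deletion until at least 30 stale versions have accumulated, then up to roughly 30 extra versions can sit on disk at any time beyond what `minBatchesToRetain` requires, which is the average size increase worth calling out in the docs.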



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

