[GitHub] [spark] wankunde commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

via GitHub Sun, 20 Aug 2023 18:43:45 -0700


wankunde commented on code in PR #41782:
URL: https://github.com/apache/spark/pull/41782#discussion_r1299497876



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -487,6 +487,25 @@ object SQLConf {
     .intConf
     .createWithDefault(10000)
 
+  val VECTORIZED_HUGE_VECTOR_RESERVE_RATIO =
+    buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorReserveRatio")
+      .doc("spark will reserve requiredCapacity * this ratio memory next time. This is only " +
+        "effective when spark.sql.inMemoryColumnarStorage.hugeVectorThreshold > 0 and required " +
+        "memory larger than that threshold.")
+      .version("3.5.0")
+      .doubleConf
+      .createWithDefault(1.2)
+
+  val VECTORIZED_HUGE_VECTOR_THRESHOLD =
+    buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorThreshold")
+      .doc("When the in memory column vector is larger than this, spark will reserve " +
+        s"requiredCapacity * ${VECTORIZED_HUGE_VECTOR_RESERVE_RATIO.key} memory next time and " +
+        "free this column vector before reading next batch data. -1 means disabling the " +
+        "optimization.")
+      .version("3.5.0")
+      .bytesConf(ByteUnit.BYTE)
+      .createWithDefault(-1)

Review Comment:
   
![image](https://github.com/apache/spark/assets/3626747/7da2b853-e585-4244-9ddd-e445733d30e7)
   With VECTORIZED_HUGE_VECTOR_THRESHOLD set to 1, two unit tests fail, as expected.
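
   For context, here is a rough sketch of how the two configs in the diff are described as interacting. It is based only on the doc strings, not on the actual implementation in this PR. The object and method names (`HugeVectorReserveSketch`, `capacityToReserve`) are hypothetical, and the doubling in the non-huge path is an assumption about the default growth strategy.

```scala
object HugeVectorReserveSketch {
  /**
   * Hypothetical helper: how much capacity to reserve for a request.
   *
   * @param requiredCapacity       capacity the caller asked for
   * @param hugeVectorThreshold    stand-in for hugeVectorThreshold; values <= 0 disable the optimization
   * @param hugeVectorReserveRatio stand-in for hugeVectorReserveRatio
   */
  def capacityToReserve(
      requiredCapacity: Long,
      hugeVectorThreshold: Long,
      hugeVectorReserveRatio: Double): Long = {
    val isHuge = hugeVectorThreshold > 0 && requiredCapacity > hugeVectorThreshold
    if (isHuge) {
      // Huge vector: reserve only a modest margin above what was requested.
      (requiredCapacity * hugeVectorReserveRatio).toLong
    } else {
      // Assumption: ordinary vectors grow by doubling.
      requiredCapacity * 2L
    }
  }
}
```

   Under this reading, a threshold of 1 sends almost every vector down the huge-vector path, which would explain why it is a useful setting for exercising that path in unit tests.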



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


