Re: [PR] [SPARK-46256][CORE] Parallel Compression Support for ZSTD [spark]

via GitHub Mon, 04 Dec 2023 21:28:48 -0800


dongjoon-hyun commented on code in PR #44172:
URL: https://github.com/apache/spark/pull/44172#discussion_r1414898758



##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -1910,6 +1910,16 @@ package object config {
       .booleanConf
       .createWithDefault(true)
 
+  private[spark] val IO_COMPRESSION_ZSTD_WORKERS =
+    ConfigBuilder("spark.io.compression.zstd.workers")
+      .doc("Thread size spawned to compress in parallel when using Zstd. When value <= 0, " +
+        "no worker is spawned, it works in single-threaded mode. When value > 0, it triggers " +
+        "asynchronous mode, corresponding number of threads are spawned. More workers improve " +
+        "performance, but also increase memory cost.")
+      .version("4.0.0")
+      .intConf
+      .createWithDefault(8)

Review Comment:
   Yes, we should use `0` by default, because this has side effects on both CPU
   cycles and native memory usage. Production jobs could see performance regressions.
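
   For reference, the semantics described in the doc string can be sketched as below. This is only a minimal, self-contained model, not Spark's or zstd-jni's actual code: `ZstdMode`, `SingleThreaded`, `Parallel`, and `modeFor` are hypothetical names introduced purely for illustration.

```scala
// Hypothetical model of the behavior described by the doc string of
// spark.io.compression.zstd.workers. None of these names exist in Spark.
sealed trait ZstdMode
case object SingleThreaded extends ZstdMode
final case class Parallel(workers: Int) extends ZstdMode

object ZstdWorkersSketch {
  // value <= 0: no worker is spawned, compression stays single-threaded.
  // value > 0: asynchronous mode with that many worker threads.
  def modeFor(workers: Int): ZstdMode =
    if (workers <= 0) SingleThreaded else Parallel(workers)

  def main(args: Array[String]): Unit = {
    println(modeFor(0)) // proposed default: no extra threads or native memory
    println(modeFor(8)) // default currently in this PR: eight workers
  }
}
```

   With `0` as the default, existing jobs would stay on the single-threaded path unless a user explicitly opts in by setting a positive value.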



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
