[incubator-celeborn] branch main updated: [CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write

zhouky Mon, 12 Jun 2023 03:46:39 -0700

This is an automated email from the ASF dual-hosted git repository.

zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git



The following commit(s) were added to refs/heads/main by this push:
     new 79806b27c [CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write
79806b27c is described below

commit 79806b27ca488293715c92469064dabd700bb022
Author: Fu Chen <[email protected]>
AuthorDate: Mon Jun 12 18:46:00 2023 +0800

    [CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write
    
    ### What changes were proposed in this pull request?
    
    Per https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex, use a `while` loop instead of `foreach` in performance-sensitive code.
    
    Flame graph and shuffle write time before:
    
    ![Screenshot 2023-06-12 4:18:24 PM](https://github.com/apache/incubator-celeborn/assets/8537877/59d94e05-71b5-4474-bebe-66df554ccc48)
    
    ![Screenshot 2023-06-12 4:19:56 PM](https://github.com/apache/incubator-celeborn/assets/8537877/e24bb8b2-5b16-431b-92ae-cb8216e69d16)
    
    Flame graph and shuffle write time after:
    
    ![Screenshot 2023-06-12 4:18:38 PM](https://github.com/apache/incubator-celeborn/assets/8537877/18a84774-2197-487d-aa51-b33445619210)
    
    ![Screenshot 2023-06-12 4:21:39 PM](https://github.com/apache/incubator-celeborn/assets/8537877/26d95e5a-6e68-46b7-8c8c-49eb2d2e252f)
    
    ### Why are the changes needed?
    
    ### Does this PR introduce _any_ user-facing change?
    
    ### How was this patch tested?
    
    Closes #1577 from cfmcgrady/columnar-perf.
    
    Authored-by: Fu Chen <[email protected]>
    Signed-off-by: zky.zhoukeyong <[email protected]>
---
 .../spark/sql/execution/columnar/RssColumnarBatchBuilder.scala      | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala b/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
index 2db46b60a..9b9637298 100644
--- a/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
+++ b/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
@@ -85,12 +85,16 @@ class RssColumnarBatchBuilder(
     val giantBuffer = new ByteArrayOutputStream
     val rowCntBytes = int2ByteArray(rowCnt)
     giantBuffer.write(rowCntBytes)
-    columnBuilders.foreach { builder =>
+    val builderLen = columnBuilders.length
+    var i = 0
+    while (i < builderLen) {
+      val builder = columnBuilders(i)
       val buffers = builder.build()
       val bytes = JavaUtils.bufferToArray(buffers)
       val columnBuilderBytes = int2ByteArray(bytes.length)
       giantBuffer.write(columnBuilderBytes)
       giantBuffer.write(bytes)
+      i += 1
     }
     giantBuffer.toByteArray
   }
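
The hunk above swaps a `foreach` closure for an index-based `while` loop. The following standalone sketch shows the same two traversal styles side by side. It is not the Celeborn implementation: `WhileLoopSketch`, `serializeForeach`, and `serializeWhile` are hypothetical names, the columns are plain byte arrays rather than column builders, and this `int2ByteArray` is a stand-in whose big-endian byte order is an assumption.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

object WhileLoopSketch {
  // Hypothetical stand-in for the int2ByteArray helper used in the diff;
  // the big-endian encoding here is an assumption, not taken from Celeborn.
  def int2ByteArray(i: Int): Array[Byte] =
    ByteBuffer.allocate(4).putInt(i).array()

  // Closure-based traversal, mirroring the removed `foreach` line.
  def serializeForeach(rowCnt: Int, columns: Array[Array[Byte]]): Array[Byte] = {
    val out = new ByteArrayOutputStream
    out.write(int2ByteArray(rowCnt))
    columns.foreach { bytes =>
      out.write(int2ByteArray(bytes.length)) // length prefix
      out.write(bytes)                       // column payload
    }
    out.toByteArray
  }

  // Index-based traversal, mirroring the added `while` loop.
  def serializeWhile(rowCnt: Int, columns: Array[Array[Byte]]): Array[Byte] = {
    val out = new ByteArrayOutputStream
    out.write(int2ByteArray(rowCnt))
    val len = columns.length
    var i = 0
    while (i < len) {
      val bytes = columns(i)
      out.write(int2ByteArray(bytes.length)) // length prefix
      out.write(bytes)                       // column payload
      i += 1
    }
    out.toByteArray
  }
}
```

Both functions should produce identical bytes for the same input; the only intended difference is that the `while` version avoids invoking a closure per element, which is the motivation the linked style guide gives for hot paths.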
