This is an automated email from the ASF dual-hosted git repository.
zhouky pushed a commit to branch branch-0.3
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/branch-0.3 by this push:
new 0b7060df0 [CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write
0b7060df0 is described below
commit 0b7060df0e1584851a08c1008a9521451902cc23
Author: Fu Chen <[email protected]>
AuthorDate: Mon Jun 12 18:46:00 2023 +0800
[CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write
### What changes were proposed in this pull request?
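Per the Databricks Scala style guide
(https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex),
replace `columnBuilders.foreach` with a `while` loop, since this code is on the
performance-sensitive columnar shuffle write path.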
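
The traversal change can be illustrated with a minimal, self-contained sketch. It is not the Celeborn code: `WhileLoopSketch`, `intToBytes`, `concatWithForeach` and `concatWithWhile` are hypothetical names, and the big-endian 4-byte length prefix is an assumption about what `int2ByteArray` produces. Both methods build the same length-prefixed buffer; the second avoids the per-element closure call.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

object WhileLoopSketch {
  // Hypothetical stand-in for int2ByteArray; assumes a big-endian 4-byte encoding.
  def intToBytes(value: Int): Array[Byte] =
    ByteBuffer.allocate(4).putInt(value).array()

  // Closure-based traversal, as in the code before the change.
  def concatWithForeach(columns: Array[Array[Byte]]): Array[Byte] = {
    val out = new ByteArrayOutputStream
    columns.foreach { bytes =>
      out.write(intToBytes(bytes.length)) // length prefix
      out.write(bytes)                    // payload
    }
    out.toByteArray
  }

  // Index-based while loop, as recommended for hot paths by the style guide.
  def concatWithWhile(columns: Array[Array[Byte]]): Array[Byte] = {
    val out = new ByteArrayOutputStream
    val len = columns.length // read the length once, outside the loop
    var i = 0
    while (i < len) {
      val bytes = columns(i)
      out.write(intToBytes(bytes.length)) // length prefix
      out.write(bytes)                    // payload
      i += 1
    }
    out.toByteArray
  }
}
```

Both variants should return identical bytes for the same input, so the rewrite is intended to change only how the array is traversed, not the serialized layout.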
Flame graph and shuffle write time before: (screenshot not preserved in this email)

Flame graph and shuffle write time after: (screenshot not preserved in this email)

### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1577 from cfmcgrady/columnar-perf.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 79806b27ca488293715c92469064dabd700bb022)
Signed-off-by: zky.zhoukeyong <[email protected]>
---
.../spark/sql/execution/columnar/RssColumnarBatchBuilder.scala | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala b/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
index 2db46b60a..9b9637298 100644
--- a/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
+++ b/client-spark/spark-3/src/main/scala/org/apache/spark/sql/execution/columnar/RssColumnarBatchBuilder.scala
@@ -85,12 +85,16 @@ class RssColumnarBatchBuilder(
     val giantBuffer = new ByteArrayOutputStream
     val rowCntBytes = int2ByteArray(rowCnt)
     giantBuffer.write(rowCntBytes)
-    columnBuilders.foreach { builder =>
+    val builderLen = columnBuilders.length
+    var i = 0
+    while (i < builderLen) {
+      val builder = columnBuilders(i)
       val buffers = builder.build()
       val bytes = JavaUtils.bufferToArray(buffers)
       val columnBuilderBytes = int2ByteArray(bytes.length)
       giantBuffer.write(columnBuilderBytes)
       giantBuffer.write(bytes)
+      i += 1
     }
     giantBuffer.toByteArray
   }