This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 20c9b3dc4fac [SPARK-46328][SQL] Allocate capacity of array list of TColumns by columns size in TRowSet generation

20c9b3dc4fac is described below

commit 20c9b3dc4fac283f895c8d860b4c6e0144697302
Author: liangbowen <liangbo...@gf.com.cn>
AuthorDate: Fri Dec 8 11:24:35 2023 -0800

    [SPARK-46328][SQL] Allocate capacity of array list of TColumns by columns size in TRowSet generation

    ### What changes were proposed in this pull request?

    Allocate sufficient capacity, based on the column count, for the array lists of TColumns assembled during TRowSet generation.

    ### Why are the changes needed?

    In RowSetUtils, ArrayLists are created to hold the TColumn value collections during TRowSet generation. Currently they are created with the JDK's default capacity of 10 rather than being sized by the number of columns, which can trigger repeated backing-array copies while assembling each TColumn collection whenever the column count exceeds the default capacity.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    GA tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44258 from bowenliang123/rowset-cap.
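The change is an instance of a standard pre-sizing idiom for `java.util.ArrayList`. A minimal, self-contained sketch (plain Scala, not Spark code; `columnSize` and the string values are made up for illustration) contrasts a default-capacity list, which grows by allocating a larger backing array and copying the old one, with a list whose capacity is hinted up front:

```scala
import java.util.ArrayList

object PresizeSketch {
  def main(args: Array[String]): Unit = {
    val columnSize = 100 // hypothetical column count, larger than the JDK default capacity of 10

    // Without a capacity hint, the list starts small and must repeatedly
    // allocate a larger backing array and copy the old contents as it grows.
    val grown = new ArrayList[String]()

    // With the capacity hint, the backing array is allocated once up front,
    // so the adds below never trigger a resize-and-copy.
    val presized = new ArrayList[String](columnSize)

    var j = 0
    while (j < columnSize) {
      grown.add(s"col-$j")
      presized.add(s"col-$j")
      j += 1
    }

    // Both lists end up with identical contents; only the allocation
    // pattern along the way differs.
    assert(grown.size == presized.size && grown == presized)
    println(s"both lists hold ${presized.size} values")
  }
}
```

The observable result is identical either way; the saving is purely in avoided intermediate allocations and copies, which is why the patch needs no behavioral tests beyond the existing GA runs.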
    Authored-by: liangbowen <liangbo...@gf.com.cn>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .../org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
index 94046adca0d8..502e29619027 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
@@ -57,15 +57,16 @@ object RowSetUtils {
     val tRows = new java.util.ArrayList[TRow](rowSize)
     while (i < rowSize) {
       val row = rows(i)
-      val tRow = new TRow()
       var j = 0
       val columnSize = row.length
+      val tColumnValues = new java.util.ArrayList[TColumnValue](columnSize)
       while (j < columnSize) {
         val columnValue = toTColumnValue(j, row, schema(j), timeFormatters)
-        tRow.addToColVals(columnValue)
+        tColumnValues.add(columnValue)
         j += 1
       }
       i += 1
+      val tRow = new TRow(tColumnValues)
       tRows.add(tRow)
     }
     new TRowSet(startRowOffSet, tRows)
@@ -80,11 +81,13 @@ object RowSetUtils {
     val tRowSet = new TRowSet(startRowOffSet, new java.util.ArrayList[TRow](rowSize))
     var i = 0
     val columnSize = schema.length
+    val tColumns = new java.util.ArrayList[TColumn](columnSize)
     while (i < columnSize) {
       val tColumn = toTColumn(rows, i, schema(i), timeFormatters)
-      tRowSet.addToColumns(tColumn)
+      tColumns.add(tColumn)
       i += 1
     }
+    tRowSet.setColumns(tColumns)
     tRowSet
   }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org