Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22880#discussion_r231249401
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 ---
    @@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(
     
       override def start(): Unit = {
         var i = 0
    -    while (i < currentRow.numFields) {
    +    while (i < fieldConverters.length) {
           fieldConverters(i).updater.start()
           currentRow.setNullAt(i)
    --- End diff ---
    
    > I'm going to push a new commit keeping the current code but with a brief 
explanatory comment.
    
    On further careful consideration, I believe that separating the calls to 
`currentRow.setNullAt(i)` into their own loop actually won't incur any 
significant performance degradation—if any at all.
    
    The performance of the `start()` method is dominated by the calls to 
`fieldConverters(i).updater.start()` and `currentRow.setNullAt(i)`. Putting the 
latter calls into their own loop won't change the count of those method calls, 
just the order. @viirya LMK if you disagree with my analysis.
    
    I will push a new commit with separate while loops. I won't use the more 
elegant `(0 until currentRow.numFields).foreach(currentRow.setNullAt)` because 
that's a higher-order method call rather than a plain loop, and I doubt either 
the Spark codegen or the HotSpot optimizer can turn it into one.
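
    For illustration, here is a hedged sketch of the two-loop `start()` 
described above. The `ParquetRowConverter` internals are not reproduced here; 
`MockUpdater`, `MockConverter`, and `RowLike` are hypothetical stand-ins that 
only mirror the shapes used by the loops. The key assumption, taken from this 
PR's context, is that `currentRow.numFields` can exceed 
`fieldConverters.length` when the Catalyst schema has more fields than the 
Parquet file provides.

    ```scala
    // Hypothetical stand-ins for the converter/updater/row types (not Spark's).
    class MockUpdater { var started = false; def start(): Unit = { started = true } }
    class MockConverter { val updater = new MockUpdater }

    class RowLike(val numFields: Int) {
      private val nulls = Array.fill(numFields)(false)
      def setNullAt(i: Int): Unit = { nulls(i) = true }
      def isNullAt(i: Int): Boolean = nulls(i)
    }

    // 3 converters for a 5-field row: the schema has more fields than the file.
    val fieldConverters = Array.fill(3)(new MockConverter)
    val currentRow = new RowLike(5)

    def start(): Unit = {
      // First loop: start only the converters that exist for this file.
      var i = 0
      while (i < fieldConverters.length) {
        fieldConverters(i).updater.start()
        i += 1
      }
      // Second loop: null out every Catalyst field, including the missing ones.
      i = 0
      while (i < currentRow.numFields) {
        currentRow.setNullAt(i)
        i += 1
      }
    }

    start()
    ```

    As argued above, the total number of `start()` and `setNullAt(i)` calls is 
unchanged relative to a single fused loop; only the ordering differs.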


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
