[GitHub] [spark] sunchao commented on pull request #31998: [SPARK-34859][SQL] parquet vectorized reader - support column index with rowIndexes

GitBox Fri, 28 May 2021 10:39:07 -0700


sunchao commented on pull request #31998:
URL: https://github.com/apache/spark/pull/31998#issuecomment-850570200



   @lxian I think the extra cost is just incrementing two indexes at 
the same time, so it should be fairly cheap. You can also refer to how 
[SynchronizingColumnReader](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/SynchronizingColumnReader.java#L89)
 does that. 
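   
   As a minimal sketch of the "two indexes" idea (not the actual `SynchronizingColumnReader` code): one cursor tracks the current row index within the row group, the other tracks the current row range selected by the column index. The class name, method name, and the `long[][]` representation of inclusive row ranges below are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep only the values whose row index falls inside
// one of the selected row ranges. Ranges are assumed sorted, non-overlapping,
// and inclusive on both ends: ranges[k] = {startRow, endRow}.
class RowRangeSyncSketch {
    static List<Integer> readSelected(int[] pageValues, long firstRowIndex, long[][] ranges) {
        List<Integer> out = new ArrayList<>();
        int rangeIdx = 0;              // index #1: current row range
        long rowIndex = firstRowIndex; // index #2: current row in the row group
        for (int i = 0; i < pageValues.length; i++, rowIndex++) {
            // Move the range cursor past ranges that end before this row.
            while (rangeIdx < ranges.length && ranges[rangeIdx][1] < rowIndex) {
                rangeIdx++;
            }
            if (rangeIdx == ranges.length) {
                break; // no more selected rows; the rest of the page is skipped
            }
            if (rowIndex >= ranges[rangeIdx][0]) {
                out.add(pageValues[i]); // row is inside the current range
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] page = {10, 11, 12, 13, 14, 15, 16, 17};
        long[][] ranges = {{101, 102}, {105, 106}};
        System.out.println(readSelected(page, 100L, ranges));
    }
}
```

   Both cursors only ever move forward, so the extra work per value is a comparison and an increment.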
   
   Porting that logic to Spark is a bit tricky though, especially when it comes 
to handling the RLE-encoded definition levels. Let me try experimenting with 
this idea on my side too.
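   
   One way to picture the RLE difficulty: a single run of definition levels can straddle a row-range boundary, so the run has to be split into "skip" and "read" pieces instead of being consumed whole. The sketch below assumes a flat (non-nested) column, where each definition level corresponds to exactly one row; the class, method, and segment representation are hypothetical and are not taken from Spark or parquet-mr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one RLE run covering rows
// [runStart, runStart + runLength) against sorted, inclusive row ranges
// (ranges[k] = {startRow, endRow}) into "skip n" / "read n" segments.
class RleRunSplitSketch {
    static List<String> splitRun(long runStart, int runLength, long[][] ranges) {
        List<String> segments = new ArrayList<>();
        long row = runStart;
        long runEnd = runStart + runLength; // exclusive
        int r = 0;
        while (row < runEnd) {
            // Drop ranges that end before the current row.
            while (r < ranges.length && ranges[r][1] < row) {
                r++;
            }
            if (r == ranges.length) {
                segments.add("skip " + (runEnd - row)); // nothing left to read
                break;
            }
            long n;
            if (row < ranges[r][0]) {
                // Gap before the next range: skip up to its start (or run end).
                n = Math.min(ranges[r][0], runEnd) - row;
                segments.add("skip " + n);
            } else {
                // Inside a range: read up to its end (or run end).
                n = Math.min(ranges[r][1] + 1, runEnd) - row;
                segments.add("read " + n);
            }
            row += n;
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(splitRun(0L, 10, new long[][]{{2, 4}, {8, 20}}));
    }
}
```

   Each segment handles a whole stretch of the run at once, so the repeated value is never expanded row by row.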
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



