Re: [I] Implement vectored IO in parquet file format [parquet-java]

via GitHub Thu, 16 Apr 2026 05:42:48 -0700


peter-toth commented on issue #2703:
URL: https://github.com/apache/parquet-java/issues/2703#issuecomment-4260121705


   @steveloughran, @mukund-thakur we noticed that when vectored IO is enabled 
the `BytesRead` metrics of Spark tasks are not correct.
   Spark fetches that metric via `FileSystem.getAllStatistics`, see
   - https://github.com/apache/spark/blob/5d491f62748b4b9c34bc3b5bd7390f7b5ca75053/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L98-L109
 and
   - https://github.com/apache/spark/blob/5d491f62748b4b9c34bc3b5bd7390f7b5ca75053/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L164-L170
   
   I wonder if this is a known issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


