pvary commented on PR #15328: URL: https://github.com/apache/iceberg/pull/15328#issuecomment-3910880300
> can we run the benchmarks for spark to see how the benchmarks turns out to be post these : https://github.com/apache/iceberg/tree/main/spark/v4.1/spark/src/jmh/java/org/apache/iceberg/spark ?

Added some new benchmarks for Parquet to both the flat and nested data benchmark classes (`readUsingRegistryReader`, `readWithProjectionUsingRegistryReader`, `writeUsingRegistryWriter`):

```
Benchmark                                                                          Mode  Cnt  Score   Error  Units
SparkParquetReadersFlatDataBenchmark.readUsingIcebergReader                          ss    5  0.311 ± 0.005   s/op
SparkParquetReadersFlatDataBenchmark.readUsingIcebergReaderUnsafe                    ss    5  0.396 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readUsingRegistryReader                         ss    5  0.326 ± 0.049   s/op
SparkParquetReadersFlatDataBenchmark.readUsingSparkReader                            ss    5  0.408 ± 0.008   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingIcebergReader            ss    5  0.185 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingIcebergReaderUnsafe      ss    5  0.363 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingRegistryReader           ss    5  0.213 ± 0.026   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingSparkReader              ss    5  0.273 ± 0.019   s/op
SparkParquetReadersNestedDataBenchmark.readUsingIcebergReader                        ss    5  0.184 ± 0.018   s/op
SparkParquetReadersNestedDataBenchmark.readUsingIcebergReaderUnsafe                  ss    5  0.219 ± 0.026   s/op
SparkParquetReadersNestedDataBenchmark.readUsingRegistryReader                       ss    5  0.179 ± 0.035   s/op
SparkParquetReadersNestedDataBenchmark.readUsingSparkReader                          ss    5  0.223 ± 0.015   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingIcebergReader          ss    5  0.077 ± 0.010   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingIcebergReaderUnsafe    ss    5  0.137 ± 0.007   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingRegistryReader         ss    5  0.080 ± 0.006   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingSparkReader            ss    5  0.103 ± 0.003   s/op

SparkParquetWritersFlatDataBenchmark.writeUsingIcebergWriter                         ss    5  2.602 ± 0.064   s/op
SparkParquetWritersFlatDataBenchmark.writeUsingRegistryWriter                        ss    5  2.593 ± 0.074   s/op
SparkParquetWritersFlatDataBenchmark.writeUsingSparkWriter                           ss    5  2.594 ± 0.054   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingIcebergWriter                       ss    5  1.559 ± 0.022   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingRegistryWriter                      ss    5  1.569 ± 0.043   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingSparkWriter                         ss    5  1.595 ± 0.046   s/op
```

The differences between the Iceberg and registry variants are barely noticeable in either direction, and each pair's score ± error intervals overlap. There should not be any real difference, as the resulting readers and writers use the same code.
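As a rough sanity check on "barely noticeable", the sketch below tests whether two score ± error intervals from the table overlap. The class `IntervalOverlap` and its `overlaps` helper are hypothetical names introduced only for this illustration; they are not part of Iceberg or JMH. Overlap is a quick heuristic, not a formal significance test.

```java
public class IntervalOverlap {

    // Hypothetical helper: true when [scoreA - errA, scoreA + errA] and
    // [scoreB - errB, scoreB + errB] share at least one point.
    public static boolean overlaps(double scoreA, double errA, double scoreB, double errB) {
        return Math.abs(scoreA - scoreB) <= errA + errB;
    }

    public static void main(String[] args) {
        // Each row: Iceberg score, Iceberg error, registry score, registry error,
        // copied from the benchmark output (units: s/op).
        double[][] pairs = {
            {0.311, 0.005, 0.326, 0.049}, // flat read
            {0.185, 0.018, 0.213, 0.026}, // flat read with projection
            {0.184, 0.018, 0.179, 0.035}, // nested read
            {0.077, 0.010, 0.080, 0.006}, // nested read with projection
            {2.602, 0.064, 2.593, 0.074}, // flat write
            {1.559, 0.022, 1.569, 0.043}, // nested write
        };
        for (double[] p : pairs) {
            System.out.printf("%.3f ± %.3f vs %.3f ± %.3f -> overlap: %b%n",
                p[0], p[1], p[2], p[3], overlaps(p[0], p[1], p[2], p[3]));
        }
    }
}
```

With only five single-shot iterations per benchmark, a check like this is best read as a coarse filter for regressions rather than evidence that two implementations perform identically.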
