aokolnychyi commented on pull request #2945:
URL: https://github.com/apache/iceberg/pull/2945#issuecomment-894524869
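Here are some benchmark numbers for writing 2.5 million records (flat schema, 7 columns). For partitioned writes, I am using a bucket transform with 32 buckets on an int column. All scores are JMH single-shot times (`ss`) in seconds per operation, so lower is better.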
```
Benchmark                                                                       Mode  Cnt   Score   Error  Units
TaskWriterParquetBenchmark.writePartitionedDataNewFanoutWriter                    ss    5  10.432 ± 0.382   s/op
TaskWriterParquetBenchmark.writePartitionedDataOldFanoutWriter                    ss    5  11.315 ± 0.345   s/op
TaskWriterParquetBenchmark.writePartitionedDataNewWriter                          ss    5  11.416 ± 0.994   s/op
TaskWriterParquetBenchmark.writePartitionedDataOldWriter                          ss    5  11.331 ± 0.238   s/op
TaskWriterParquetBenchmark.writePartitionedEqualityDeleteNewWriter                ss    5  11.795 ± 1.553   s/op
TaskWriterParquetBenchmark.writeUnpartitionedDataNewWriter                        ss    5  10.736 ± 1.058   s/op
TaskWriterParquetBenchmark.writeUnpartitionedDataOldWriter                        ss    5  10.501 ± 2.084   s/op
TaskWriterParquetBenchmark.writeUnpartitionedEqualityDeleteNewWriter              ss    5   9.935 ± 0.166   s/op
TaskWriterParquetBenchmark.writeUnpartitionedPositionDeleteWithoutRowNewWriter    ss    5   8.833 ± 0.791   s/op
```
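The partitioned runs bucket an int column into 32 buckets. For reference, here is a minimal standalone sketch of how that transform assigns a value to a bucket, written from the bucket transform definition in the Iceberg table spec: a 32-bit Murmur3 hash (x86 variant, seed 0) over the value widened to a long, with the sign bit cleared, modulo the bucket count. This is an illustration, not Iceberg's own implementation; `BucketSketch`, `murmur3HashLong`, and `bucket` are hypothetical names, and the sequential ids in `main` are made up (the benchmark's data generator is not shown here).

```java
/**
 * Standalone sketch of the Iceberg spec's bucket transform:
 * bucket_N(v) = (murmur3_x86_32(v) & Integer.MAX_VALUE) % N.
 * Ints are widened to long and hashed as 8 little-endian bytes, seed 0.
 * Names are hypothetical, not Iceberg APIs.
 */
public final class BucketSketch {

  private BucketSketch() {}

  // Murmur3 x86_32 block scrambling step.
  private static int mixK(int k) {
    k *= 0xcc9e2d51;
    k = Integer.rotateLeft(k, 15);
    return k * 0x1b873593;
  }

  // Folds a scrambled block into the running hash state.
  private static int mixH(int h, int k) {
    h ^= k;
    h = Integer.rotateLeft(h, 13);
    return h * 5 + 0xe6546b64;
  }

  // Final avalanche step; len is the number of input bytes hashed.
  private static int fmix(int h, int len) {
    h ^= len;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /** Murmur3 x86_32 (seed 0) over the 8 little-endian bytes of a long. */
  static int murmur3HashLong(long value) {
    int h = 0;                                // seed
    h = mixH(h, mixK((int) value));           // low 4 bytes
    h = mixH(h, mixK((int) (value >>> 32)));  // high 4 bytes
    return fmix(h, 8);
  }

  /** Bucket of an int value; the int is sign-extended to long before hashing. */
  static int bucket(int value, int numBuckets) {
    return (murmur3HashLong(value) & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    System.out.println(murmur3HashLong(34)); // prints 2017239379 (spec test vector)
    System.out.println(bucket(34, 32));      // prints 19

    // Spread of 2.5 million hypothetical sequential ids over 32 buckets.
    int[] counts = new int[32];
    for (int id = 0; id < 2_500_000; id++) {
      counts[bucket(id, 32)]++;
    }
    int min = Integer.MAX_VALUE;
    int max = 0;
    for (int c : counts) {
      min = Math.min(min, c);
      max = Math.max(max, c);
    }
    System.out.println("min/max records per bucket: " + min + "/" + max);
  }
}
```

Hashing ints as longs is what lets an int column be promoted to long without changing any row's bucket.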
Memory usage is very similar as well. Here is an example comparing the new and old writers for partitioned data writes:
```
Benchmark                                                                         Mode  Cnt    Score    Error   Units
TaskWriterParquetBenchmark.writePartitionedDataNewWriter:·gc.alloc.rate             ss    5  177.302 ± 17.914  MB/sec
TaskWriterParquetBenchmark.writePartitionedDataNewWriter:·gc.churn.G1_Eden_Space    ss    5  136.865 ± 12.818  MB/sec
TaskWriterParquetBenchmark.writePartitionedDataNewWriter:·gc.churn.G1_Old_Gen       ss    5    5.411 ±  0.646  MB/sec
TaskWriterParquetBenchmark.writePartitionedDataOldWriter:·gc.alloc.rate             ss    5  177.730 ± 11.985  MB/sec
TaskWriterParquetBenchmark.writePartitionedDataOldWriter:·gc.churn.G1_Eden_Space    ss    5  137.768 ± 21.407  MB/sec
TaskWriterParquetBenchmark.writePartitionedDataOldWriter:·gc.churn.G1_Old_Gen       ss    5    5.420 ±  0.892  MB/sec
```
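One way to check "very similar" is to compare each pair of GC rows: the relative difference between the scores, and whether the `score ± error` intervals overlap (by default, JMH reports the error as the half-width of a 99.9% confidence interval). Below is a minimal sketch in Java. `ScoreComparison` and `Result` are hypothetical names, and interval overlap is a rough heuristic, not a formal significance test.

```java
import java.util.Locale;

/** Hypothetical helper for comparing JMH "score ± error" rows. */
public final class ScoreComparison {

  /** One JMH result: score and error half-width, in the same units. */
  record Result(double score, double error) {
    double low() { return score - error; }
    double high() { return score + error; }
  }

  private ScoreComparison() {}

  /** Relative difference of candidate vs. baseline, in percent. */
  static double relativeDiffPercent(Result baseline, Result candidate) {
    return (candidate.score() - baseline.score()) / baseline.score() * 100.0;
  }

  /** True if the two score ± error intervals share at least one point. */
  static boolean intervalsOverlap(Result a, Result b) {
    return a.low() <= b.high() && b.low() <= a.high();
  }

  public static void main(String[] args) {
    // writePartitionedData{Old,New}Writer GC metrics in MB/sec, copied from the output above.
    String[] metrics = {"gc.alloc.rate", "gc.churn.G1_Eden_Space", "gc.churn.G1_Old_Gen"};
    Result[] oldWriter = {
        new Result(177.730, 11.985), new Result(137.768, 21.407), new Result(5.420, 0.892)};
    Result[] newWriter = {
        new Result(177.302, 17.914), new Result(136.865, 12.818), new Result(5.411, 0.646)};

    for (int i = 0; i < metrics.length; i++) {
      System.out.printf(Locale.ROOT, "%-24s new vs old: %+.2f%%, intervals overlap: %b%n",
          metrics[i], relativeDiffPercent(oldWriter[i], newWriter[i]),
          intervalsOverlap(oldWriter[i], newWriter[i]));
    }
  }
}
```

With the numbers above, all three differences are under 1% and every pair of intervals overlaps.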