jinyangli34 commented on PR #11258:
URL: https://github.com/apache/iceberg/pull/11258#issuecomment-2403412692
Ran the benchmark again with `NUM_RECORDS` increased from 1M to 5M.
Tested 4 groups:
**main**: main branch without the changes in this PR
**PR**: this PR
**PR+2**: this PR plus two extra `getBufferedSize` calls per `add` call
```
  @Override
  public void add(T value) {
    recordCount += 1;
+   long size1 = writeStore.getBufferedSize();
+   long size2 = writeStore.getBufferedSize();
+   if (size1 != size2) {
+     throw new RuntimeException("Buffered size changed after adding a record");
+   }
    long sizeBeforeWrite = writeStore.getBufferedSize();
    model.write(0, value);
    this.currentRawBufferedSize += writeStore.getBufferedSize() - sizeBeforeWrite;
```
**PR+4**: this PR plus four extra `getBufferedSize` calls per `add` call
```
  @Override
  public void add(T value) {
    recordCount += 1;
+   long size1 = writeStore.getBufferedSize();
+   long size2 = writeStore.getBufferedSize();
+   long size3 = writeStore.getBufferedSize();
+   long size4 = writeStore.getBufferedSize();
+   if (size1 != size2 || size3 != size4) {
+     throw new RuntimeException("Buffered size changed after adding a record");
+   }
    long sizeBeforeWrite = writeStore.getBufferedSize();
    model.write(0, value);
    this.currentRawBufferedSize += writeStore.getBufferedSize() - sizeBeforeWrite;
```
Avg numbers:
```
Flat Benchmark Avg         Main     PR       PR+2     PR+4
writeUsingIcebergWriter    15.773   15.976   16.672   17.133
writeUsingSparkWriter      16.056   15.826   15.830   15.891

Nested Benchmark Avg       Main     PR       PR+2     PR+4
writeUsingIcebergWriter     9.683    9.775    9.978   10.199
writeUsingSparkWriter      10.156    9.676    9.698    9.683
```
Comparing this PR against the main branch:
The Iceberg writer is 1.3% slower for flat data and 0.95% slower for nested data.
The Spark writer is 1.4% faster for flat data and 4.7% faster for nested data.
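
For anyone who wants to re-derive these percentages, here is a minimal sketch of the arithmetic, assuming each figure is the relative change of the PR average against the main average (a larger average meaning slower). The class and method names (`RelativeChange`, `percentChange`) are hypothetical and exist only for this illustration; they are not part of Iceberg or of this PR.

```java
import java.util.Locale;

// Illustrative only: RelativeChange and percentChange are hypothetical names,
// not part of Iceberg or of this PR.
final class RelativeChange {

  // Percent change of `candidate` relative to `baseline`.
  // Positive means the candidate average is larger (slower); negative means faster.
  static double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Labels and {main, PR} averages copied from the "Avg numbers" table.
    String[] labels = {
      "Flat   writeUsingIcebergWriter",
      "Flat   writeUsingSparkWriter",
      "Nested writeUsingIcebergWriter",
      "Nested writeUsingSparkWriter"
    };
    double[][] mainVsPr = {
      {15.773, 15.976},
      {16.056, 15.826},
      {9.683, 9.775},
      {10.156, 9.676}
    };

    for (int i = 0; i < labels.length; i++) {
      double change = percentChange(mainVsPr[i][0], mainVsPr[i][1]);
      // Locale.ROOT keeps the decimal separator as '.' on every machine.
      System.out.printf(Locale.ROOT, "%-32s %+.2f%%%n", labels[i], change);
    }
  }
}
```

The same helper can be pointed at the PR+2 and PR+4 columns (with PR as the baseline) to see how much the extra `getBufferedSize` calls add on their own.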
[iceberg-pr-11258-perf-test.csv](https://github.com/user-attachments/files/17318460/iceberg-pr-11258-perf-test.csv)