ranxianglei commented on PR #4608:
URL: https://github.com/apache/paimon/pull/4608#issuecomment-2514015603
Thanks for the testing and the PR. I downloaded and tested it right away.
Compared with the previous version, this PR makes Parquet nearly 10 times
faster, which is a huge improvement!
I also compared it against the ORC implementation, and accounted for JIT
warm-up in the test.
My results show that the current Parquet implementation still takes at least
twice as long as ORC.
On my computer, the Parquet result is 8.4s and the ORC result is 4.1s.
@Aiden-Dong @JingsongLi @leaves12138
```java
Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable
PredicateBuilder builder = new PredicateBuilder(
        RowType.of(DataTypes.INT(),
                DataTypes.STRING(),
                DataTypes.STRING()));
int[] projection = new int[] {0, 1, 2};
ReadBuilder readBuilder = table.newReadBuilder()
        .withProjection(projection);
Random random = new Random();

// Warm-up: 30 untimed point lookups so the JIT can compile the read path.
for (int i = 0; i < 30; i++) {
    InnerTableRead read = (InnerTableRead) readBuilder.newRead();
    int key = random.nextInt(4000000);
    Predicate keyFilter = builder.equal(0, key);
    InnerTableScan tableScan = (InnerTableScan) readBuilder
            .withFilter(keyFilter)
            .newScan();
    InnerTableScan innerTableScan = tableScan.withFilter(keyFilter);
    TableScan.Plan plan = innerTableScan.plan();
    List<Split> splits = plan.splits();
    read.withFilter(keyFilter); //.executeFilter();
    RecordReader<InternalRow> reader = read.createReader(splits);
    reader.forEachRemaining(internalRow -> {
        int f0 = internalRow.getInt(0);
        String f1 = internalRow.getString(1).toString();
        String f2 = internalRow.getString(2).toString();
        System.out.println(String.format("%d - {%d, %s, %s}", key,
                f0, f1, f2));
    });
}

// Measurement: 1000 timed point lookups on random keys, same steps as the warm-up.
long startTime = System.currentTimeMillis();
for (int i = 0; i < 1000; i++) {
    InnerTableRead read = (InnerTableRead) readBuilder.newRead();
    int key = random.nextInt(4000000);
    Predicate keyFilter = builder.equal(0, key);
    InnerTableScan tableScan = (InnerTableScan) readBuilder
            .withFilter(keyFilter)
            .newScan();
    InnerTableScan innerTableScan = tableScan.withFilter(keyFilter);
    TableScan.Plan plan = innerTableScan.plan();
    List<Split> splits = plan.splits();
    read.withFilter(keyFilter); //.executeFilter();
    RecordReader<InternalRow> reader = read.createReader(splits);
    reader.forEachRemaining(internalRow -> {
        int f0 = internalRow.getInt(0);
        String f1 = internalRow.getString(1).toString();
        String f2 = internalRow.getString(2).toString();
        System.out.println(String.format("%d - {%d, %s, %s}", key,
                f0, f1, f2));
    });
}
long stopTime = System.currentTimeMillis();
// Elapsed time of the measured loop, in milliseconds.
System.out.println("time : " + (stopTime - startTime));
```
For the writer code, see https://github.com/apache/paimon/issues/4586
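
For anyone who wants to repeat the comparison, the warm-up-then-measure pattern
used in the test could be factored into a small helper, so the same harness
runs against both the Parquet and the ORC table. This is only a sketch:
`WarmupBenchmark`, `ThrowingTask`, `measureMillis` and `slowdown` are
hypothetical names and not part of Paimon, and the task in `main` is a dummy
stand-in for one point lookup.

```java
/** Hypothetical helper (not part of Paimon): run a task untimed first, then time it. */
final class WarmupBenchmark {

    /** A task that may throw a checked exception, e.g. a read that raises IOException. */
    @FunctionalInterface
    interface ThrowingTask {
        void run() throws Exception;
    }

    /**
     * Runs the task warmupIterations times without timing (so the JIT can compile
     * the hot path), then returns the wall-clock milliseconds spent on
     * measuredIterations further runs.
     */
    static long measureMillis(ThrowingTask task, int warmupIterations, int measuredIterations)
            throws Exception {
        for (int i = 0; i < warmupIterations; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Ratio of two timings: how many times longer the candidate takes than the baseline. */
    static double slowdown(double candidateSeconds, double baselineSeconds) {
        return candidateSeconds / baselineSeconds;
    }

    public static void main(String[] args) throws Exception {
        // Dummy workload; in the real test this would be one filtered read of the table.
        long[] sink = {0};
        long elapsed = measureMillis(() -> sink[0] += Long.bitCount(System.nanoTime()), 30, 1000);
        System.out.println("time : " + elapsed + " ms (sink=" + sink[0] + ")");
        // Timings reported above: Parquet 8.4s, ORC 4.1s.
        System.out.printf("Parquet / ORC: %.2fx%n", slowdown(8.4, 4.1));
    }
}
```

With 30 warm-up and 1000 measured iterations this mirrors the two loops in the
test. Note that printing every row inside the timed section, as the test does,
adds console I/O to both the Parquet and the ORC timings.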