mapleFU opened a new issue, #40948:
URL: https://github.com/apache/arrow/issues/40948
### Describe the enhancement requested
The Parquet format defines an `is_sorted` flag for dictionary pages [1].
However, no implementation seems to set it. `is_sorted` is useful when the
input data is ordered and a filter is evaluated against the dictionary: a
"dictionary filter" can then run fast (for example with a binary search)
without building a hash table.
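To illustrate why the flag helps a reader, here is a minimal sketch of a dictionary membership check. `DictionaryMayContain` is a hypothetical helper, not an Arrow or Parquet API, and it assumes the dictionary has already been decoded into a `std::vector<std::string>` whose order matches `std::string` comparison.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow/Parquet API): returns true if `value`
// appears in the decoded dictionary `dict`.
// If the writer set `is_sorted`, a binary search is enough; otherwise we
// fall back to a linear scan (a real reader might build a hash table here).
bool DictionaryMayContain(const std::vector<std::string>& dict, bool is_sorted,
                          const std::string& value) {
  if (is_sorted) {
    // O(log n), no auxiliary structure needed.
    return std::binary_search(dict.begin(), dict.end(), value);
  }
  // O(n) fallback for dictionaries with no ordering guarantee.
  return std::find(dict.begin(), dict.end(), value) != dict.end();
}
```

For example, with the dictionary `{"apple", "banana", "cherry"}` and `is_sorted == true`, a lookup of `"banana"` succeeds and a lookup of `"blueberry"` fails, each after only a few comparisons.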
This requires setting the `is_sorted` flag when writing the dictionary page.
We can cheaply check whether the dictionary is sorted while writing, like:
```
WriteBatchToDict(RecordBatch) {
  // Once the dictionary is known to be unsorted, skip further checks.
  if (is_sorted) {
    is_sorted = checkDict(RecordBatch)
  }
  // do write
}
```
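The pseudocode above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `SortedDictTracker` is a hypothetical class, not the Arrow dictionary encoder, and it assumes `int64_t` dictionary values that are appended one at a time in dictionary order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch (not the Arrow DictEncoder): records dictionary values
// in insertion order and tracks whether they are still in ascending order.
class SortedDictTracker {
 public:
  // Call once for each new distinct value added to the dictionary.
  void Append(int64_t value) {
    // Compare only against the previous entry, and only while the
    // dictionary is still believed to be sorted.
    if (is_sorted_ && !values_.empty() && value < values_.back()) {
      is_sorted_ = false;  // sticky: never becomes true again
    }
    values_.push_back(value);
  }

  // Value that would be written as the dictionary page's `is_sorted` flag.
  bool is_sorted() const { return is_sorted_; }

  const std::vector<int64_t>& values() const { return values_; }

 private:
  std::vector<int64_t> values_;
  bool is_sorted_ = true;
};
```

Comparing each new entry with the previous one also covers the boundary between batches, and the check costs one comparison per new dictionary entry until the first out-of-order value is seen.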
[1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L613-L614
### Component(s)
C++, Parquet