mapleFU opened a new issue, #40948:
URL: https://github.com/apache/arrow/issues/40948

   ### Describe the enhancement requested
   
   The Parquet format defines an `is_sorted` flag on the dictionary page 
header [1]. However, it seems that no implementation sets it. The flag is 
useful when the input data is ordered and a reader builds a filter on the 
dictionary: it could make the "dictionary filter" fast (without building a 
hash table).
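
To illustrate the reader side, here is a minimal sketch of how a filter could probe a dictionary that is known to be sorted ascending, using binary search instead of a hash table. The function names `FindInSortedDict` and `DictMayMatchRange` are hypothetical and not part of the Arrow or Parquet C++ API; the dictionary is modeled as a plain `std::vector<int64_t>` for simplicity.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not an Arrow/Parquet API.
// Precondition: `dict` is sorted in ascending order (what `is_sorted` promises).
// Returns the dictionary index of `value`, or std::nullopt if it is absent.
std::optional<int32_t> FindInSortedDict(const std::vector<int64_t>& dict,
                                        int64_t value) {
  // Binary search: O(log n) per probe, no hash table needed.
  auto it = std::lower_bound(dict.begin(), dict.end(), value);
  if (it == dict.end() || *it != value) {
    return std::nullopt;
  }
  return static_cast<int32_t>(it - dict.begin());
}

// Hypothetical helper: can any dictionary entry satisfy lo <= x <= hi?
// If not, a reader could skip the data pages that use this dictionary.
bool DictMayMatchRange(const std::vector<int64_t>& dict, int64_t lo,
                       int64_t hi) {
  // First entry that is >= lo; it matches only if it is also <= hi.
  auto it = std::lower_bound(dict.begin(), dict.end(), lo);
  return it != dict.end() && *it <= hi;
}
```

An equality predicate would call `FindInSortedDict` once, while a range predicate only needs the single `std::lower_bound` probe in `DictMayMatchRange`. Both rely on the writer having set the flag truthfully.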
   
   This requires setting the `is_sorted` flag when writing the dictionary 
page. We can cheaply check whether the dictionary is still sorted as each 
batch is written, like:
   
   ```
   WriteBatchToDict(RecordBatch) {
     // Only check while the dictionary is still believed to be sorted.
     if (is_sorted) {
       is_sorted = checkDict(RecordBatch)
     }
     // do write
   }
   ```
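
The pseudocode above could be fleshed out roughly as follows. This is a sketch under assumptions, not the Arrow writer: it assumes the dictionary is built in first-occurrence order, so it stays sorted exactly when each newly added distinct value is greater than the previous one. `SortedDictTracker` and its methods are hypothetical names, and values are modeled as `int64_t`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch, not part of the Arrow/Parquet C++ API.
// Tracks whether dictionary entries have been appended in ascending order.
class SortedDictTracker {
 public:
  // Call once for each value newly appended to the dictionary,
  // in dictionary-index order.
  void OnNewEntry(int64_t value) {
    // Dictionary entries are distinct, so "not less than the previous
    // entry" is the same as strictly ascending here.
    if (is_sorted_ && last_.has_value() && value < *last_) {
      is_sorted_ = false;
    }
    last_ = value;
  }

  // Batch variant: stops checking as soon as the order is broken,
  // mirroring the `if (is_sorted)` guard in the pseudocode.
  void OnNewEntries(const std::vector<int64_t>& values) {
    for (int64_t v : values) {
      if (!is_sorted_) break;
      OnNewEntry(v);
    }
  }

  // Value to write into the dictionary page header's `is_sorted` field.
  bool is_sorted() const { return is_sorted_; }

 private:
  bool is_sorted_ = true;
  std::optional<int64_t> last_;
};
```

The check costs one comparison per new dictionary entry and becomes a no-op once the order is broken, so it should add little overhead to the write path.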
   
   [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L613-L614
   
   ### Component(s)
   
   C++, Parquet

