Hello,

Does Apache Arrow have a standard way to embed min/max values of fields
on a per-record-batch basis?
It looks like FieldNode supports neither a dedicated min/max attribute nor
custom metadata.
https://github.com/apache/arrow/blob/master/format/Message.fbs#L28

If we embed an array of min/max values, one pair per record batch, into the
custom metadata of the Field node, we may be able to implement this.
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L344
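
To make the idea concrete, here is a rough sketch of how the per-batch
min/max values could be packed into string key/value pairs, since the custom
metadata is a list of key/value strings. This is plain Python, not an Arrow
API. The key names "min_values" and "max_values" and both helper functions
are hypothetical, and it assumes integer columns and non-empty batches.

```python
# Hypothetical sketch: pack per-record-batch min/max into string key/value
# pairs of the kind that custom metadata can hold. Not an Arrow API.

def encode_stats(batches):
    """batches: list of non-empty lists of ints, one list per record batch."""
    mins = ",".join(str(min(b)) for b in batches)
    maxs = ",".join(str(max(b)) for b in batches)
    # Key names are assumptions made for this illustration only.
    return {"min_values": mins, "max_values": maxs}

def decode_stats(metadata):
    """Return a list of (min, max) tuples, one per record batch."""
    mins = [int(v) for v in metadata["min_values"].split(",")]
    maxs = [int(v) for v in metadata["max_values"].split(",")]
    return list(zip(mins, maxs))

meta = encode_stats([[3, 1, 2], [10, 20]])
stats = decode_stats(meta)
```

A real encoding would also need to handle non-integer types, nulls, and
empty batches, which this sketch ignores.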

What I would like to implement is something like the BRIN index in PostgreSQL.
http://heterodb.github.io/pg-strom/brin/

This index contains only the min/max values for particular block ranges, so
the query executor can skip blocks that obviously don't contain the target data.
If we can skip 9990 of 10000 record batches by checking metadata on a query
that fetches items in a very narrow timestamp range, that is a great
acceleration compared with a full file scan.
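
The skipping logic I have in mind is roughly the following. It is only a
sketch in plain Python. The function name batches_to_scan and the list of
(min, max) tuples are hypothetical and stand in for whatever the per-batch
metadata would provide.

```python
# Hypothetical sketch of BRIN-like pruning over per-record-batch min/max.

def batches_to_scan(stats, lo, hi):
    """Return indexes of record batches whose [min, max] range overlaps
    the query range [lo, hi]. All other batches can be skipped."""
    return [
        i for i, (batch_min, batch_max) in enumerate(stats)
        if batch_max >= lo and batch_min <= hi
    ]

# Three record batches with (min, max) of a timestamp-like column.
example_stats = [(0, 99), (100, 199), (200, 299)]
selected = batches_to_scan(example_stats, 150, 160)
```

With a query range of 150 to 160, only the second batch overlaps, so the
other two never need to be read. A selected batch can still contain no
matching rows, so the executor must apply the filter to it as usual.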

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>
