Hello,

Does Apache Arrow have any standard way to embed the min/max values of each field on a per-record-batch basis? It looks like FieldNode supports neither a dedicated min/max attribute nor custom metadata.
https://github.com/apache/arrow/blob/master/format/Message.fbs#L28
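For illustration, here is a minimal sketch of the idea in plain Python, with no Arrow library involved. Arrow's custom_metadata only holds string key/value pairs, so the sketch serializes one [min, max] pair per record batch into a single string value. A range query then consults those pairs to decide which record batches to read. The metadata key `brin.min_max`, the JSON encoding, and the helper names are hypothetical. A plain dict stands in for the KeyValue list, and values are assumed to be non-null integers (e.g. timestamps stored as epoch ticks).

```python
import json
from typing import Dict, List, Optional, Sequence

# Hypothetical metadata key; Arrow defines no standard key for this.
MIN_MAX_KEY = "brin.min_max"


def encode_min_max(batches: Sequence[Sequence[int]]) -> Dict[str, str]:
    """Build string key/value metadata holding one [min, max] pair per batch.

    An empty batch gets null so list positions still line up with batch indices.
    """
    ranges: List[Optional[List[int]]] = []
    for values in batches:
        ranges.append([min(values), max(values)] if values else None)
    return {MIN_MAX_KEY: json.dumps(ranges)}


def batches_to_scan(metadata: Dict[str, str], lo: int, hi: int) -> List[int]:
    """Return indices of batches whose [min, max] overlaps the query range [lo, hi]."""
    ranges = json.loads(metadata[MIN_MAX_KEY])
    selected = []
    for index, entry in enumerate(ranges):
        if entry is None:
            continue  # empty batch: nothing to read
        batch_min, batch_max = entry
        # Skip the batch unless its value range intersects [lo, hi].
        if batch_max >= lo and batch_min <= hi:
            selected.append(index)
    return selected


if __name__ == "__main__":
    batches = [[1, 5, 3], [10, 12], [20, 25], []]
    meta = encode_min_max(batches)
    print(meta[MIN_MAX_KEY])  # [[1, 5], [10, 12], [20, 25], null]
    print(batches_to_scan(meta, 11, 11))  # [1]
```

A query for the single value 11 reads only batch 1 and skips the rest. Because the pairs sit in schema-level metadata in this sketch, a writer can only fill them in completely after it has seen every batch.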
If we embed an array of min/max values (one entry per record batch) into the custom_metadata of the Field, we may be able to implement this.
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L344

What I'd like to implement is something like the BRIN index in PostgreSQL.
http://heterodb.github.io/pg-strom/brin/

This index contains only the min/max values for particular block ranges, so the query executor can skip blocks that cannot contain the target data. If we can skip 9990 of 10000 record batches by checking the metadata for a query that fetches items within a very narrow timestamp range, that is a much greater acceleration than a full file scan.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei
