There is key-value metadata available on Message, which might work in the short term (with the statistics stored as some sort of encoded message). I think standardizing how we store statistics per batch does make sense.
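
To make the short-term idea concrete, here is a rough sketch in plain Python. It does not use any Arrow API. It only models per-batch metadata as a dict of strings, since KeyValue holds string values. The key name "minmax:ts", the little-endian int64 layout, and the helper names are all assumptions made up for illustration; nothing in the Arrow format defines them.

```python
import base64
import struct

# Hypothetical metadata key; not defined anywhere in the Arrow format.
STATS_KEY = "minmax:ts"


def encode_minmax(values):
    """Pack min/max of int64 values into a base64 string.

    Base64 is used only because the modeled metadata values are strings.
    """
    packed = struct.pack("<qq", min(values), max(values))
    return base64.b64encode(packed).decode("ascii")


def decode_minmax(text):
    """Inverse of encode_minmax: return (min, max) as Python ints."""
    return struct.unpack("<qq", base64.b64decode(text))


def may_contain(metadata, lo, hi):
    """Return False only when the stats prove no value lies in [lo, hi]."""
    encoded = metadata.get(STATS_KEY)
    if encoded is None:
        return True  # no statistics: the batch has to be scanned
    batch_min, batch_max = decode_minmax(encoded)
    return batch_max >= lo and batch_min <= hi


# Ten toy "record batches" of 100 increasing timestamps each.
batches = [list(range(start, start + 100)) for start in range(0, 1000, 100)]
metadata_per_batch = [{STATS_KEY: encode_minmax(b)} for b in batches]

# Only batches whose [min, max] range overlaps the query range need a scan.
to_scan = [i for i, md in enumerate(metadata_per_batch)
           if may_contain(md, 250, 260)]
```

With a narrow range such as 250..260, a reader checks ten small metadata entries and scans a single batch. A batch with no statistics is always scanned, so files written without the extra key still give correct results.
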
We unfortunately can't add anything to field-node without breaking compatibility. But another option would be to add a new structure as a parallel list on RecordBatch itself. If we do add a new structure or arbitrary key-value pair we should not use KeyValue but should have something where the values can be bytes.

On Wed, Feb 17, 2021 at 7:17 PM Kohei KaiGai <kai...@heterodb.com> wrote:

> Hello,
>
> Does Apache Arrow have any standard way to embed min/max values of the
> fields per record-batch basis?
> It looks FieldNode supports neither dedicated min/max attribute nor
> custom-metadata.
> https://github.com/apache/arrow/blob/master/format/Message.fbs#L28
>
> If we embed an array of min/max values into the custom-metadata of the
> Field-node, we may be able to implement.
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L344
>
> What I like to implement is something like BRIN index at PostgreSQL.
> http://heterodb.github.io/pg-strom/brin/
>
> This index contains only min/max values for a particular block ranges,
> and query executor can skip blocks that obviously don't contain the
> target data.
> If we can skip 9990 of 10000 record batch by checking metadata on a
> query that tries to fetch items in very narrow timestamps, it is a great
> acceleration more than full file scans.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kai...@heterodb.com>