We have been discussing something like this as well, either an arbitrary partitioning scheme or just a more extensive and customizable transform.
An example I’m interested in is a geo hash index where we store offsets on a large grid to denote partitions. The total offset file for the whole planet still ends up only in the low megabytes, even while accounting for high density in cities and low density over oceans.

Sent from my iPhone

> On Jul 4, 2023, at 8:08 AM, Joseph Allemandou <jalleman...@wikimedia.org> wrote:
>
> Hi Iceberg team,
>
> I'm working at the Wikimedia Foundation, and we started using Iceberg for some
> of our big-data tables - we love it :)
>
> One of the needs we'll have in the future would be to partition data using a
> specific bucketing function.
> How complex would it be to add a new function to the ones already present
> in the Iceberg partitioning mechanism? Are there any docs on doing that?
> Bonus points: Are there any plans to make it possible for users to reference
> their own bucketing functions at table definition?
>
> Many thanks for the awesome project <3
>
> --
> Joseph Allemandou (joal) (he / him)
> Staff Data Engineer
> Wikimedia Foundation
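
As a rough illustration of the geo hash grid idea from the reply above, here is a minimal Python sketch. It is not an Iceberg API and not an existing transform. The cell size, the `splits` table, and the names `build_offsets` and `partition_id` are all hypothetical, chosen only to show how a per-cell offset table can give dense areas more partitions than sparse ones.

```python
from itertools import accumulate

CELL_DEG = 10.0              # coarse cell size in degrees (assumed for illustration)
COLS = int(360 / CELL_DEG)   # number of coarse columns around the globe
ROWS = int(180 / CELL_DEG)   # number of coarse rows pole to pole


def build_offsets(splits):
    """Return (offsets, total) for a per-cell subdivision table.

    splits[i] is how many times coarse cell i is split along each axis,
    so cell i owns splits[i] ** 2 partitions. offsets[i] is the first
    partition id owned by cell i.
    """
    sizes = [s * s for s in splits]
    cumulative = list(accumulate(sizes))
    return [0] + cumulative[:-1], cumulative[-1]


def partition_id(lat, lon, splits, offsets):
    """Map a point (lat in [-90, 90], lon in [-180, 180]) to a partition id."""
    x = (lon + 180.0) / CELL_DEG
    y = (lat + 90.0) / CELL_DEG
    # Clamp so that lat == 90 and lon == 180 land in the last cell.
    col = min(int(x), COLS - 1)
    row = min(int(y), ROWS - 1)
    cell = row * COLS + col
    s = splits[cell]
    # Position inside the coarse cell, scaled to its own sub-grid.
    sub_col = min(int((x - col) * s), s - 1)
    sub_row = min(int((y - row) * s), s - 1)
    return offsets[cell] + sub_row * s + sub_col


# Hypothetical density table: every cell is one partition, except the
# cell containing Paris, which is split 4 x 4.
splits = [1] * (ROWS * COLS)
splits[13 * COLS + 18] = 4
offsets, total = build_offsets(splits)

print(total)
print(partition_id(48.85, 2.35, splits, offsets))   # Paris, dense cell
print(partition_id(0.0, -150.0, splits, offsets))   # mid-Pacific, sparse cell
```

The only state that has to be stored is the `splits` table (one small integer per coarse cell), since `offsets` can be rebuilt from it. A finer base grid would make the table larger but, under these assumptions, still small compared with the data it partitions.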