Hi JB, I have two questions on this scope:
1. Is there any hope it will be extensible, so that a user can plug in their own metadata?
2. Will scanning be made streaming friendly (I assume phase 0 is batch)? The idea would be to use a Kappa-like architecture to get real-time capabilities.

Thanks,
Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog <https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)

On Fri, Jun 5, 2026 at 02:20, Yufei Gu <[email protected]> wrote:

> Great to see the progress here. Thanks a lot JB! I will take a look at the
> PR.
>
> Yufei
>
>
> On Thu, Jun 4, 2026 at 2:58 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Hi everyone,
> >
> > After several months of discussion (involving Directories, Table Sources,
> > etc), I would like to propose Polaris Directories.
> >
> > I drafted a PR:
> > https://github.com/apache/polaris/pull/4613
> >
> > The proposal is documented as part of the PR:
> >
> > https://github.com/jbonofre/polaris/blob/12dfea48570d076d4012143e66f02e8b503c4f99/site/content/in-dev/unreleased/directories.md
> >
> > In a nutshell, Polaris Directories make objects (including unstructured
> > data like images, videos, and documents) discoverable alongside structured
> > Iceberg tables within a Polaris catalog. A directory points to a base
> > location/prefix on an object store and automatically tracks the objects it
> > contains by maintaining an Iceberg table with object-level metadata such as
> > URI, size, content type, checksum, ...
> >
> > This means query engines and tools that already know how to read Iceberg
> > tables can discover and access unstructured data with little or no extra
> > work (accessing the object itself).
> >
> > A directory has two main parts:
> > - Directory configuration, stored by the Polaris server. It describes where
> >   the data lives, how to authenticate, which objects to include, and how
> >   often to re-scan. The configuration "lives" in a namespace.
> > - Directory table, an Iceberg table serving as the inventory of all objects
> >   contained in the directory, with one row per object discovered during a
> >   scan. The directory table uses the configuration name.
> > The Polaris server itself does not perform scans. Instead, external
> > services (e.g. directory table scanning service) read the directory
> > configuration through the REST API, walk the object store, and write the
> > results into the directory table.
> >
> > I propose we discuss this both on the mailing list (this thread) and on the
> > PR. If needed, I'm happy to schedule a dedicated meeting.
> >
> > I'm looking forward to your thoughts!
> >
> > Thanks!
> >
> > Regards
> > JB
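
To make the batch scan in the quoted proposal concrete, here is a rough sketch of what a single scan pass might look like. It is only an illustration under assumptions: a local folder stands in for the object store, the function name scan_directory and the include glob parameter are hypothetical, and the row fields simply mirror the metadata listed in the proposal (URI, size, content type, checksum). It does not use the Polaris REST API or write to an Iceberg table.

```python
import hashlib
import mimetypes
from pathlib import Path


def scan_directory(base: Path, include: str = "*"):
    """Walk `base` recursively and return one inventory row per file.

    Hypothetical stand-in for a directory scanning service: a local
    folder plays the role of the object store prefix.
    """
    rows = []
    for path in sorted(base.rglob(include)):
        if not path.is_file():
            continue  # skip sub-folders, only objects get a row
        data = path.read_bytes()
        content_type, _ = mimetypes.guess_type(path.name)
        rows.append({
            "uri": path.resolve().as_uri(),
            "size": len(data),
            # Fall back to a generic type when the extension is unknown.
            "content_type": content_type or "application/octet-stream",
            "checksum": hashlib.sha256(data).hexdigest(),
        })
    return rows
```

A user-supplied metadata hook (question 1) could presumably be an extra callable that adds fields to each row, and a streaming variant (question 2) would emit one such row per object-store event instead of walking the whole prefix.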
