Hi JB,

I have two questions on this scope:

1. Is there any hope it will be extensible, so that a user can plug in their
own metadata?
2. Will scanning be made streaming-friendly (I assume phase 0 is batch)? The
idea would be to use a Kappa-like architecture to get real-time
capabilities.

Thanks,
Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog
<https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old
Blog <http://rmannibucau.wordpress.com> | Github
<https://github.com/rmannibucau> | LinkedIn
<https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)


On Fri, Jun 5, 2026 at 02:20, Yufei Gu <[email protected]> wrote:

> Great to see the progress here. Thanks a lot JB! I will take a look at the
> PR.
>
> Yufei
>
>
> On Thu, Jun 4, 2026 at 2:58 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Hi everyone,
> >
> > After several months of discussion (involving Directories, Table Sources,
> > etc), I would like to propose Polaris Directories.
> >
> > I drafted a PR:
> > https://github.com/apache/polaris/pull/4613
> >
> > The proposal is documented as part of the PR:
> >
> > https://github.com/jbonofre/polaris/blob/12dfea48570d076d4012143e66f02e8b503c4f99/site/content/in-dev/unreleased/directories.md
> >
> > In a nutshell, Polaris Directories make objects (including unstructured
> > data like images, videos, and documents) discoverable alongside structured
> > Iceberg tables within a Polaris catalog. A directory points to a base
> > location/prefix on an object store and automatically tracks the objects it
> > contains by maintaining an Iceberg table with object-level metadata such as
> > URI, size, content type, checksum, ...
> >
> > This means query engines and tools that already know how to read Iceberg
> > tables can discover and access unstructured data with little or no extra
> > work (accessing the object itself).
> >
> > A directory has two main parts:
> > - Directory configuration, stored by the Polaris server. It describes where
> > the data lives, how to authenticate, which objects to include, and how
> > often to re-scan. The configuration "lives" in a namespace.
> > - Directory table, an Iceberg table serving as the inventory of all objects
> > contained in the directory, with one row per object discovered during a
> > scan. The directory table uses the configuration name.
> >
> > The Polaris server itself does not perform scans. Instead, external
> > services (e.g. directory table scanning service) read the directory
> > configuration through the REST API, walk the object store, and write the
> > results into the directory table.
> >
> > I propose we discuss this both on the mailing list (this thread) and on the
> > PR. If needed, I'm happy to schedule a dedicated meeting.
> >
> > I'm looking forward to your thoughts!
> >
> > Thanks!
> >
> > Regards
> > JB
> >
>
